[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Qemu-devel] Re: [STAGING]: Block migration segfaults
From: |
Jan Kiszka |
Subject: |
[Qemu-devel] Re: [STAGING]: Block migration segfaults |
Date: |
Thu, 03 Dec 2009 20:10:19 +0100 |
User-agent: |
Mozilla/5.0 (X11; U; Linux i686 (x86_64); de; rv:1.8.1.12) Gecko/20080226 SUSE/2.0.0.12-1.1 Thunderbird/2.0.0.12 Mnenhy/0.7.5.666 |
Luiz Capitulino wrote:
> On Thu, 03 Dec 2009 19:36:15 +0100
> Jan Kiszka <address@hidden> wrote:
>
>> Luiz Capitulino wrote:
>>> Hi there,
>>>
>>> Got this while testing block migration in staging:
>>>
>>> """
>>> Program terminated with signal 11, Segmentation fault.
>>> #0 0x0000000000410cf9 in monitor_vprintf (mon=0x0, fmt=0x5ae5e7 "Start
>>> full migration for %s\n",
>>> ap=0x7fff1f830a40) at /home/lcapitulino/src/aliguori-queue/monitor.c:192
>>> 192 if (mon->mc && !mon->mc->print_enabled) {
>>> """
>>>
>>> The problem here is that init_blk_migration() calls monitor_printf() with
>>> a NULL 'mon' and the backtrace shows that this is true for the entire call
>>> chain.
>> What is the backtrace? And how did you start the migration?
>
> Started the source VM with:
>
> # qemu -hda disks/fedora-11-kratos-i386.img -enable-kvm -snapshot \
> -balloon virtio -m 1G -S
>
> and the destination one with:
>
> # qemu -hda disks/fedora-11-kratos-i386.img -enable-kvm -snapshot \
> -balloon virtio -m 1G -S -incoming tcp:0:4444
>
> Migration command issued:
>
> (QEMU) migrate -d -b tcp:0:4444
>
Ah, forgot '-d'! Yes, that was precisely the use case for suppressing
monitor output I was talking about. This depends on the monitor services
catching NULL properly, so please push the corresponding patch.
>
> I have no idea if this correct and wondered if specifying the same image
> for the destination VM would be catastrophic. :)
>
> The backtrace follows and it's the source VM which segfaults:
>
> """
> #0 0x0000000000410c11 in monitor_vprintf (mon=0x0, fmt=0x5ada87 "Start full
> migration for %s\n",
> #1 0x0000000000410d59 in monitor_printf (mon=0x0, fmt=0x5ada87 "Start full
> migration for %s\n")
> #2 0x00000000004e584b in init_blk_migration (mon=0x0, f=0x2864130) at
> block-migration.c:254
> #3 0x00000000004e5d8a in block_save_live (mon=0x0, f=0x2864130, stage=1,
> opaque=0xbea960)
> #4 0x00000000004db61a in qemu_savevm_state_begin (mon=0x0, f=0x2864130,
> blk_enable=1, shared=0)
> #5 0x00000000004d2448 in migrate_fd_connect (s=0x20bf470) at migration.c:279
> #6 0x00000000004d2849 in tcp_wait_for_connect (opaque=0x20bf470) at
> migration-tcp.c:72
> #7 0x000000000040c3f8 in main_loop_wait (timeout=5000) at
> /home/lcapitulino/src/aliguori-queue/vl.c:3875
> #8 0x000000000040ca17 in main_loop () at
> /home/lcapitulino/src/aliguori-queue/vl.c:4095
> #9 0x00000000004104ce in main (argc=10, argv=0x7fff6f838bf8,
> envp=0x7fff6f838c50)
> """
>
> [...]
>
>>> A patch from me (4a29a in staging) changes a higher level monitor
>>> function to touch 'mon' before passing it down and here's the segfault.
>>>
>>> Now, the point is: I could give the old behavior back but I think we're
>>> hiding a bug there. Why would you call monitor_printf() with a NULL 'mon'?
>> If there is no monitor associated with the current context, it can be
>> NULL. This is mostly the case during early startup.
>
> Why would one call monitor_printf() before monitor initialization?
Via shared code e.g.
>
>> One may also use
>> this to suppress output (though I don't recall any real case ATM).
>
> I would prefer functions like monitor_disable()/monitor_enable()
> for this...
They have different meanings.
>
>>> Anyways, the following patch adds the old behavior back just in case
>>> you want to see it working...
>> Yes, better restore the check. Still, your call stack would be
>> interesting. Maybe there is actual another bug behind it.
>
> Ok, I'll send the fix because it's not that important right now,
> but I'm not convinced this is the right thing to do.
Thanks,
Jan
signature.asc
Description: OpenPGP digital signature