From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] [Migration Bug? ] Occasionally, the content of VM's memory is inconsistent between Source and Destination of migration
Date: Mon, 30 Mar 2015 08:59:35 +0100
User-agent: Mutt/1.5.23 (2014-03-12)
* zhanghailiang (address@hidden) wrote:
> On 2015/3/27 18:18, Dr. David Alan Gilbert wrote:
> >* zhanghailiang (address@hidden) wrote:
> >>On 2015/3/26 11:52, Li Zhijian wrote:
> >>>On 03/26/2015 11:12 AM, Wen Congyang wrote:
> >>>>On 03/25/2015 05:50 PM, Juan Quintela wrote:
> >>>>>zhanghailiang<address@hidden> wrote:
> >>>>>>Hi all,
> >>>>>>
> >>>>>>We found that, sometimes, the content of VM's memory is inconsistent
> >>>>>>between Source side and Destination side
> >>>>>>when we check it just after migration finishes but before the VM
> >>>>>>continues to run.
> >>>>>>
> >>>>>>We used a patch like the one below to find this issue (you can find
> >>>>>>it in the attachment).
> >>>>>>Steps to reproduce:
> >>>>>>
> >>>>>>(1) Compile QEMU:
> >>>>>> ./configure --target-list=x86_64-softmmu --extra-ldflags="-lssl" &&
> >>>>>> make
> >>>>>>
> >>>>>>(2) Command and output:
> >>>>>>SRC: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu
> >>>>>>qemu64,-kvmclock -netdev tap,id=hn0 -device
> >>>>>>virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive
> >>>>>>file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe
> >>>>>> -device
> >>>>>>virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
> >>>>>> -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet
> >>>>>>-monitor stdio
> >>>>>Could you try to reproduce:
> >>>>>- without vhost
> >>>>>- without virtio-net
> >>>>>- cache=unsafe is going to give you trouble, but trouble should only
> >>>>> happen after migration of pages has finished.
> >>>>If I use ide disk, it doesn't happen.
> >>>>Even if I use virtio-net with vhost=on, it still doesn't happen. I guess
> >>>>it is because I migrate the guest when it is booting. The virtio net
> >>>>device is not used in this case.
> >>>Er...
> >>>It reproduces with my IDE disk;
> >>>there is no virtio device at all. My command line is below:
> >>>
> >>>x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -net
> >>>none
> >>>-boot c -drive file=/home/lizj/ubuntu.raw -vnc :7 -m 2048 -smp 2 -machine
> >>>usb=off -no-user-config -nodefaults -monitor stdio -vga std
> >>>
> >>>It seems easy to reproduce this issue with the following steps in an
> >>>_ubuntu_ guest:
> >>>1. in source side, choose memtest in grub
> >>>2. do live migration
> >>>3. exit memtest (press Esc while the memory test is running)
> >>>4. wait migration complete
> >>>
> >>
> >>Yes, it is a thorny problem. It is indeed easy to reproduce with
> >>your steps above.
> >>
> >>This is my test result (I also tested accel=tcg; it can be reproduced there too):
> >>Source side:
> >># x86_64-softmmu/qemu-system-x86_64 -machine
> >>pc-i440fx-2.3,accel=kvm,usb=off -no-user-config -nodefaults -cpu
> >>qemu64,-kvmclock -boot c -drive
> >>file=/mnt/sdb/pure_IMG/ubuntu/ubuntu_14.04_server_64_2U_raw -device
> >>cirrus-vga,id=video0,vgamem_mb=8 -vnc :7 -m 2048 -smp 2 -monitor stdio
> >>(qemu) ACPI_BUILD: init ACPI tables
> >>ACPI_BUILD: init ACPI tables
> >>migrate tcp:9.61.1.8:3004
> >>ACPI_BUILD: init ACPI tables
> >>before cpu_synchronize_all_states
> >>5a8f72d66732cac80d6a0d5713654c0e
> >>md_host : before saving ram complete
> >>5a8f72d66732cac80d6a0d5713654c0e
> >>md_host : after saving ram complete
> >>5a8f72d66732cac80d6a0d5713654c0e
> >>(qemu)
> >>
> >>Destination side:
> >># x86_64-softmmu/qemu-system-x86_64 -machine
> >>pc-i440fx-2.3,accel=kvm,usb=off -no-user-config -nodefaults -cpu
> >>qemu64,-kvmclock -boot c -drive
> >>file=/mnt/sdb/pure_IMG/ubuntu/ubuntu_14.04_server_64_2U_raw -device
> >>cirrus-vga,id=video0,vgamem_mb=8 -vnc :7 -m 2048 -smp 2 -monitor stdio
> >>-incoming tcp:0:3004
> >>(qemu) QEMU_VM_SECTION_END, after loading ram
> >>d7cb0d8a4bdd1557fb0e78baee50c986
> >>md_host : after loading all vmstate
> >>d7cb0d8a4bdd1557fb0e78baee50c986
> >>md_host : after cpu_synchronize_all_post_init
> >>d7cb0d8a4bdd1557fb0e78baee50c986
> >
> >Hmm, that's not good. I suggest you md5 each of the RAMBlocks individually;
> >to see if it's main RAM that's different or something more subtle like
> >video RAM.
> >
>
> Er, all my previous tests computed the md5 of the 'pc.ram' block only.
>
> >But then maybe it's easier just to dump the whole of RAM to file
> >and byte compare it (hexdump the two dumps and diff ?)
>
> Hmm, we also used memcmp to compare every page, but the addresses of the
> differing pages seem to be random.
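The dump comparison suggested above, and the per-page memcmp check, can be sketched in Python as follows. This is a minimal sketch assuming two raw dumps of guest RAM taken while the guest was stopped (how they are produced, for example by a debug patch like the one used in this thread, is left open) and 4 KiB pages; the function names are hypothetical. Reporting the offset of every differing page, rather than one md5 per dump, shows whether the mismatches cluster in a particular region such as video RAM.

```python
import hashlib
import io
from typing import BinaryIO, Iterator, List

# Assumption: 4 KiB target pages, as on x86_64 guests.
PAGE_SIZE = 4096


def iter_pages(f: BinaryIO, page_size: int = PAGE_SIZE) -> Iterator[bytes]:
    """Yield successive page-sized chunks; the final chunk may be short."""
    while True:
        chunk = f.read(page_size)
        if not chunk:
            return
        yield chunk


def md5_of(f: BinaryIO, page_size: int = PAGE_SIZE) -> str:
    """Whole-dump md5, comparable to the single checksum printed in the thread."""
    h = hashlib.md5()
    for page in iter_pages(f, page_size):
        h.update(page)
    return h.hexdigest()


def diff_pages(src: BinaryIO, dst: BinaryIO, page_size: int = PAGE_SIZE) -> List[int]:
    """Return the byte offsets of every page that differs between two dumps.

    If one dump is longer, its extra pages are reported as differing too.
    """
    src_pages = iter_pages(src, page_size)
    dst_pages = iter_pages(dst, page_size)
    offsets: List[int] = []
    offset = 0
    while True:
        a = next(src_pages, b"")
        b = next(dst_pages, b"")
        if not a and not b:
            break
        if a != b:
            offsets.append(offset)
        offset += page_size
    return offsets


if __name__ == "__main__":
    # Synthetic 3-page dumps standing in for real files opened with open(path, "rb").
    src = io.BytesIO(bytes(3 * PAGE_SIZE))
    corrupted = bytearray(3 * PAGE_SIZE)
    corrupted[PAGE_SIZE + 10] = 0xFF  # one byte differs in the second page
    dst = io.BytesIO(bytes(corrupted))
    for off in diff_pages(src, dst):
        print(f"page at offset {off:#x} differs")
```

Run as a script, the synthetic example prints `page at offset 0x1000 differs`; with real dumps, the list of offsets can then be mapped back to RAMBlocks.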
>
> Besides, in our previous tests, we found it seems to be easier to reproduce
> when migration occurs during the VM's start-up or reboot process.
>
> Is it possible that some devices have special handling during VM start-up
> which may miss setting the dirty bitmap?
I don't think there should be, but the code paths used during startup are
probably much less tested with migration. I'm sure the startup code
uses different parts of device emulation. I do know we have some bugs
filed against migration during Windows boot; I'd not considered that it might
be devices not updating the bitmap.
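The failure mode being discussed, a guest page written without its bit being set in migration_bitmap, can be illustrated with a toy model of pre-copy migration. This is a hypothetical sketch: the class and function names are invented for illustration and do not correspond to QEMU's actual dirty-tracking code, which is considerably more involved. It only shows why a single write path that skips the bitmap leaves the destination with a stale page even though every pass "succeeds".

```python
from typing import List, Set


class ToyGuest:
    """Hypothetical toy model of guest RAM plus a migration dirty bitmap.

    Each page's contents are represented by a single int. This is not
    QEMU code; it only illustrates the invariant that every write to guest
    RAM must also mark the page dirty.
    """

    def __init__(self, num_pages: int) -> None:
        self.pages: List[int] = [0] * num_pages
        # Pre-copy starts with every page considered dirty.
        self.dirty: Set[int] = set(range(num_pages))

    def tracked_write(self, page: int, value: int) -> None:
        """A write path that correctly marks the page dirty."""
        self.pages[page] = value
        self.dirty.add(page)

    def untracked_write(self, page: int, value: int) -> None:
        """A buggy write path (e.g. a device model) that forgets the bitmap."""
        self.pages[page] = value


def send_dirty_pages(guest: ToyGuest, dest: List[int]) -> int:
    """One migration pass: copy every dirty page, clear the bitmap, return count."""
    sent = sorted(guest.dirty)
    for page in sent:
        dest[page] = guest.pages[page]
    guest.dirty.clear()
    return len(sent)


def mismatched_pages(guest: ToyGuest, dest: List[int]) -> List[int]:
    """Pages whose destination copy differs from the source."""
    return [p for p, v in enumerate(guest.pages) if dest[p] != v]


if __name__ == "__main__":
    guest = ToyGuest(8)
    dest = [-1] * 8
    send_dirty_pages(guest, dest)   # bulk pass: all 8 pages
    guest.tracked_write(2, 42)      # seen by the next pass
    guest.untracked_write(5, 99)    # invisible to the bitmap
    send_dirty_pages(guest, dest)   # final pass with the guest "stopped"
    print("mismatched pages:", mismatched_pages(guest, dest))
```

Run as a script it prints `mismatched pages: [5]`: the tracked write to page 2 is resent in the final pass, while the untracked write to page 5 never reaches the destination, which is the kind of discrepancy the md5 checks in this thread would detect.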
Dave
>
>
> Thanks,
> zhanghailiang
>
>
> >>>>
> >>>>>What kind of load were you having when reproducing this issue?
> >>>>>Just to confirm, you have been able to reproduce this without COLO
> >>>>>patches, right?
> >>>>>
> >>>>>>(qemu) migrate tcp:192.168.3.8:3004
> >>>>>>before saving ram complete
> >>>>>>ff703f6889ab8701e4e040872d079a28
> >>>>>>md_host : after saving ram complete
> >>>>>>ff703f6889ab8701e4e040872d079a28
> >>>>>>
> >>>>>>DST: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu
> >>>>>>qemu64,-kvmclock -netdev tap,id=hn0,vhost=on -device
> >>>>>>virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive
> >>>>>>file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe
> >>>>>> -device
> >>>>>>virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
> >>>>>> -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet
> >>>>>>-monitor stdio -incoming tcp:0:3004
> >>>>>>(qemu) QEMU_VM_SECTION_END, after loading ram
> >>>>>>230e1e68ece9cd4e769630e1bcb5ddfb
> >>>>>>md_host : after loading all vmstate
> >>>>>>230e1e68ece9cd4e769630e1bcb5ddfb
> >>>>>>md_host : after cpu_synchronize_all_post_init
> >>>>>>230e1e68ece9cd4e769630e1bcb5ddfb
> >>>>>>
> >>>>>>This happens occasionally, and it is easier to reproduce when the
> >>>>>>migration command is issued during the VM's startup.
> >>>>>OK, a couple of things. Memory doesn't have to be exactly identical.
> >>>>>Virtio devices in particular do funny things on "post-load". There
> >>>>>aren't guarantees for that as far as I know; we should end up with an
> >>>>>equivalent device state in memory.
> >>>>>
> >>>>>>We have done further tests and found that some pages have been dirtied
> >>>>>>but their corresponding bits in migration_bitmap are not set.
> >>>>>>We can't figure out which QEMU module misses setting the bitmap
> >>>>>>when it dirties a VM page;
> >>>>>>it is very difficult for us to trace all the actions that dirty the
> >>>>>>VM's pages.
> >>>>>This seems to point to a bug in one of the devices.
> >>>>>
> >>>>>>Actually, we first found this problem during COLO FT
> >>>>>>development, where it triggered some strange issues in the
> >>>>>>VM which all pointed to inconsistency of the VM's memory. (We
> >>>>>>have tried saving all of the VM's memory to the slave side at every
> >>>>>>checkpoint in COLO FT, and then everything is OK.)
> >>>>>>
> >>>>>>Is it OK for some pages not to be transferred to the destination during
> >>>>>>migration? Or is it a bug?
> >>>>>Pages transferred should be the same; it is after device state
> >>>>>transmission that things could change.
> >>>>>
> >>>>>>This issue has blocked our COLO development... :(
> >>>>>>
> >>>>>>Any help will be greatly appreciated!
> >>>>>Later, Juan.
> >>>>>
> >>>>.
> >>>>
> >>>
> >>>
> >>
> >>
> >--
> >Dr. David Alan Gilbert / address@hidden / Manchester, UK
> >
> >.
> >
>
>
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK