Re: [Qemu-devel] [Migration Bug? ] Occasionally, the content of VM's mem
From: Wen Congyang
Subject: Re: [Qemu-devel] [Migration Bug? ] Occasionally, the content of VM's memory is inconsistent between Source and Destination of migration
Date: Wed, 25 Mar 2015 18:21:17 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.4.0
On 03/25/2015 05:50 PM, Juan Quintela wrote:
> zhanghailiang <address@hidden> wrote:
>> Hi all,
>>
>> We found that, sometimes, the content of the VM's memory is inconsistent
>> between the source side and the destination side when we check it just after
>> migration finishes but before the VM continues to run.
>>
>> We used a patch like the one below to find this issue (you can find it in the
>> attachment). Steps to reproduce:
>>
>> (1) Compile QEMU:
>> ./configure --target-list=x86_64-softmmu --extra-ldflags="-lssl" && make
>>
>> (2) Command and output:
>> SRC: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock \
>>   -netdev tap,id=hn0 -device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c \
>>   -drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe \
>>   -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 \
>>   -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio
>
> Could you try to reproduce:
> - without vhost
> - without virtio-net
> - cache=unsafe is going to give you trouble, but trouble should only
> happen after migration of pages has finished.
I can use e1000 to reproduce this problem.
>
> What kind of load were you having when reproducing this issue?
> Just to confirm, you have been able to reproduce this without COLO
> patches, right?
I can reproduce it without COLO patches. The newest commit is:
commit 054903a832b865eb5432d79b5c9d1e1ff31b58d7
Author: Peter Maydell <address@hidden>
Date: Tue Mar 24 16:34:16 2015 +0000
Update version for v2.3.0-rc1 release
Signed-off-by: Peter Maydell <address@hidden>
>
>> (qemu) migrate tcp:192.168.3.8:3004
>> before saving ram complete
>> ff703f6889ab8701e4e040872d079a28
>> md_host : after saving ram complete
>> ff703f6889ab8701e4e040872d079a28
>>
>> DST: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock \
>>   -netdev tap,id=hn0,vhost=on -device virtio-net-pci,id=net-pci0,netdev=hn0 \
>>   -boot c -drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe \
>>   -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 \
>>   -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio \
>>   -incoming tcp:0:3004
>> (qemu) QEMU_VM_SECTION_END, after loading ram
>> 230e1e68ece9cd4e769630e1bcb5ddfb
>> md_host : after loading all vmstate
>> 230e1e68ece9cd4e769630e1bcb5ddfb
>> md_host : after cpu_synchronize_all_post_init
>> 230e1e68ece9cd4e769630e1bcb5ddfb
>>
>> This happens occasionally, and it is easier to reproduce when the migration
>> command is issued while the VM is starting up.
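The md_host output above is a single MD5 over all of guest RAM, so it shows that source and destination differ but not where. A minimal sketch of a finer-grained check, assuming each side can dump guest RAM into a byte buffer and assuming 4 KiB pages (neither the dump mechanism nor the helper names `page_digests` and `differing_pages` come from the patch mentioned above; they are hypothetical): hash every page separately and report the indices whose digests differ.

```python
import hashlib

PAGE_SIZE = 4096  # assumption: 4 KiB target pages


def page_digests(ram: bytes, page_size: int = PAGE_SIZE) -> list[str]:
    """Hypothetical helper: one MD5 hex digest per page of a RAM snapshot."""
    if len(ram) % page_size:
        raise ValueError("RAM size must be a multiple of the page size")
    return [
        hashlib.md5(ram[off:off + page_size]).hexdigest()
        for off in range(0, len(ram), page_size)
    ]


def differing_pages(src: bytes, dst: bytes, page_size: int = PAGE_SIZE) -> list[int]:
    """Hypothetical helper: indices of pages whose digests differ."""
    if len(src) != len(dst):
        raise ValueError("snapshots must have the same size")
    return [
        i
        for i, (a, b) in enumerate(zip(page_digests(src, page_size),
                                       page_digests(dst, page_size)))
        if a != b
    ]


if __name__ == "__main__":
    src = bytes(4 * PAGE_SIZE)
    dst = bytearray(src)
    dst[2 * PAGE_SIZE + 10] = 0xFF  # simulate one differing byte inside page 2
    print(differing_pages(src, bytes(dst)))  # prints [2]
```

In a real setup each host would compute its own digest list locally and only the lists would be compared, so guest RAM does not have to be shipped a second time just for the check.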
>
> OK, a couple of things. Memory doesn't have to be exactly identical.
> Virtio devices in particular do funny things on "post-load". There
> aren't guarantees for that as far as I know; we should end up with an
> equivalent device state in memory.
>
>> We have done further tests and found that some pages have been dirtied but
>> their corresponding bits in migration_bitmap are not set.
>> We can't figure out which QEMU module fails to set the bitmap when it
>> dirties a VM page; it is very difficult for us to trace every action that
>> dirties the VM's pages.
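Why a missing bitmap bit produces exactly this symptom can be shown with a toy model of iterative pre-copy. This is a minimal sketch, not QEMU's implementation: the class `GuestRAM`, its `write_untracked` method (standing in for whatever code path modifies guest RAM without marking it dirty) and `migration_pass` are all hypothetical. Every page starts dirty, each pass sends the dirty pages and clears their bits, and a write that skips the bitmap is never resent.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


class GuestRAM:
    """Toy guest RAM with a migration dirty bitmap (one flag per page)."""

    def __init__(self, num_pages: int) -> None:
        self.pages = [bytes(PAGE_SIZE) for _ in range(num_pages)]
        # Every page starts dirty so the first (bulk) pass sends all of RAM.
        self.dirty = [True] * num_pages

    def write(self, page: int, data: bytes) -> None:
        """Tracked write: update the page and mark it dirty."""
        self.pages[page] = data
        self.dirty[page] = True

    def write_untracked(self, page: int, data: bytes) -> None:
        """Hypothetical buggy path: update the page but never set its bit."""
        self.pages[page] = data


def migration_pass(src: GuestRAM, dst_pages: list[bytes]) -> int:
    """Send every dirty page to the destination, clear its bit, return count."""
    sent = 0
    for i, is_dirty in enumerate(src.dirty):
        if is_dirty:
            src.dirty[i] = False
            dst_pages[i] = src.pages[i]
            sent += 1
    return sent


if __name__ == "__main__":
    src = GuestRAM(4)
    dst = [bytes(PAGE_SIZE) for _ in range(4)]
    migration_pass(src, dst)                     # bulk pass: all 4 pages sent
    src.write(1, b"\x01" * PAGE_SIZE)            # tracked: resent next pass
    src.write_untracked(3, b"\x02" * PAGE_SIZE)  # dirty bit missed
    migration_pass(src, dst)                     # final pass sends only page 1
    print([i for i in range(4) if src.pages[i] != dst[i]])  # prints [3]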
>
> This seems to point to a bug in one of the devices.
>
>> Actually, we first found this problem during COLO FT development, where it
>> triggered some strange issues in the VM that all pointed to inconsistent VM
>> memory. (We tried saving all of the VM's memory to the slave side at every
>> COLO FT checkpoint, and then everything was OK.)
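The COLO observation fits the same explanation: a full copy at each checkpoint does not consult the dirty bitmap, so it hides a missed bit, while an incremental checkpoint inherits the bug. A minimal sketch under toy assumptions (pages are short byte strings, the dirty bitmap is a Python set, and `checkpoint` is a hypothetical helper, not COLO's actual code):

```python
def checkpoint(primary: list[bytes], dirty: set[int],
               secondary: list[bytes], full: bool) -> None:
    """Bring the secondary's RAM copy in line with the primary.

    full=True copies every page; full=False copies only pages recorded
    in `dirty`. The dirty set is cleared either way.
    """
    pages = range(len(primary)) if full else sorted(dirty)
    for i in pages:
        secondary[i] = primary[i]
    dirty.clear()


if __name__ == "__main__":
    for full in (False, True):
        primary = [b"a", b"b", b"c"]
        secondary = list(primary)  # consistent after the initial sync
        dirty: set[int] = set()
        primary[0] = b"A"
        dirty.add(0)       # tracked write
        primary[2] = b"C"  # write whose dirty bit was missed
        checkpoint(primary, dirty, secondary, full)
        print("full" if full else "incremental", primary == secondary)
    # prints "incremental False", then "full True"
```

Copying everything trades bandwidth for robustness, which is why it works as a diagnostic but is not a fix for the missing bit itself.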
>>
>> Is it OK for some pages not to be transferred to the destination during
>> migration? Or is it a bug?
>
> Pages transferred should be the same; it is after device state transmission
> that things could change.
>
>> This issue has blocked our COLO development... :(
>>
>> Any help will be greatly appreciated!
>
> Later, Juan.
>