From: Chris Friesen
Subject: Re: [Qemu-devel] hitting intermittent issue with live migration from qemu-kvm-ev 2.3.0 to qemu-kvm-ev 2.6.0
Date: Tue, 4 Apr 2017 08:28:51 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0

On 04/04/2017 07:56 AM, Ladi Prosek wrote:
> On Mon, Apr 3, 2017 at 9:11 PM, Stefan Hajnoczi <address@hidden> wrote:
>> On Fri, Mar 31, 2017 at 02:12:36PM -0600, Chris Friesen wrote:
>>>
>>> Initially we have a bunch of guests running on compute-2 (which is running
>>> qemu-kvm-ev 2.3.0).  We then started live-migrating them one at a time to
>>> compute-0 (which is running qemu-kvm-ev 2.6.0).  Three of them migrated
>>> successfully.  The fourth (which was essentially identical in configuration
>>> to the first three) failed, as per the following logs in
>>> /var/log/libvirt/qemu/instance-0000000e.log:
>>>
>>> 2017-03-29T06:38:37.886940Z qemu-kvm: VQ 2 size 0x80 < last_avail_idx 0x47b - used_idx 0x47c
>>> 2017-03-29T06:38:37.886974Z qemu-kvm: error while loading state for instance 0x0 of device '0000:00:07.0/virtio-balloon'
>>> 2017-03-29T06:38:37.888684Z qemu-kvm: load of migration failed: Operation not permitted
>>> 2017-03-29 06:38:37.896+0000: shutting down
>>>
>>> Does anyone know of an existing bug report covering this issue?  (I took a
>>> look and didn't see anything obviously related.)
>>
>> This is the virtio-balloon device.  If you remove the device the live
>> migration should work reliably.
>>
>> Alternatively, you can temporarily rmmod virtio_balloon inside the guest
>> for live migration.  After migration you can modprobe virtio_balloon
>> again.
>>
>> last_avail_idx 0x47b with used_idx 0x47c is an invalid device state.
>> I've diffed qemu-kvm-ev 2.6.0-27.1 hw/virtio/virtio-balloon.c against
>> qemu.git/master and do not see an obvious bug.  I also compared
>> qemu-kvm-ev 2.3.0-31 with qemu-kvm-ev 2.6.0-27.1.
>
> The device likely got into the invalid state as part of a previous
> migration to an unfixed QEMU. I second Stefan's suggestion to
> temporarily remove the device or unload the driver.
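
For anyone following along, the reason the values in the quoted log count as an invalid state can be shown with plain arithmetic. The following is a minimal Python sketch, not QEMU's code. It assumes that both indices are free-running 16-bit counters and that the incoming side rejects a queue when (last_avail_idx - used_idx) modulo 2^16 exceeds the queue size, which is what the wording of the error message suggests. The function names in_flight and state_is_valid are hypothetical.

```python
# Sketch only: mirrors the arithmetic implied by the log message
# "VQ 2 size 0x80 < last_avail_idx 0x47b - used_idx 0x47c".
# It is not taken from the QEMU source.

def in_flight(last_avail_idx: int, used_idx: int) -> int:
    """Buffers the device has taken but not yet returned, modulo 2**16."""
    return (last_avail_idx - used_idx) & 0xFFFF


def state_is_valid(queue_size: int, last_avail_idx: int, used_idx: int) -> bool:
    """A queue cannot have more buffers in flight than it has entries."""
    return in_flight(last_avail_idx, used_idx) <= queue_size


# Values from the failing migration: used_idx is one ahead of
# last_avail_idx, so the difference wraps around to 0xffff.
print(hex(in_flight(0x47B, 0x47C)))        # 0xffff
print(state_is_valid(0x80, 0x47B, 0x47C))  # False

# With the two indices swapped, one buffer is in flight and the check passes.
print(state_is_valid(0x80, 0x47C, 0x47B))  # True
```

Under these assumptions, a used_idx that is ahead of last_avail_idx means the device has returned more buffers than it ever took, so the wrapped difference is far larger than the queue size of 0x80 and the load is refused.
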

I'll give that a try (been busy with a separate issue).

If I have a guest already running, can I unilaterally hot-remove the device from the host side or does the guest need to be involved as well? (I'm just trying to figure out how to deal with existing guests.)

Thanks,
Chris


