Re: [Qemu-devel] [PATCH 2.5 v5 0/11] dataplane snapshot fixes
From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [PATCH 2.5 v5 0/11] dataplane snapshot fixes
Date: Fri, 6 Nov 2015 17:29:59 +0000

On Fri, Nov 6, 2015 at 4:19 PM, Denis V. Lunev <address@hidden> wrote:
> On 11/06/2015 07:05 PM, Eric Blake wrote:
>>
>> On 11/06/2015 08:54 AM, Stefan Hajnoczi wrote:
>>>
>>> On Wed, Nov 04, 2015 at 08:19:31PM +0300, Denis V. Lunev wrote:
>>>>
>>>> with test
>>>> while /bin/true ; do
>>>> virsh snapshot-create rhel7
>>>> sleep 10
>>>> virsh snapshot-delete rhel7 --current
>>>> done
>>>> with iothreads enabled on a running VM leads to a lot of trouble:
>>>> hangs, asserts, errors.
>>
>> That is a case of using libvirt to trigger internal snapshots...
>>
>>> The HMP monitor is legacy and also not used by modern libvirt.
>>
>> ...and libvirt is forced to use HMP for internal snapshots, since we
>> _still_ haven't exposed internal snapshots as a QMP command.
>>
>>> I think the affected use cases are restricted to savevm+dataplane and
>>> HMP+dataplane.
>>
>> The fact that the commit message calls out a libvirt method of
>> triggering the bug does mean that it is user-visible, and so it would
>> qualify as a bug fix even during hard freeze. But I also understand
>> that taking a large complex series late in the game is not without risk;
>> and it is not like this is a regression (rather, something that has
>> never worked bulletproof), right?
>>
> yes, this was not working in the past and this is not a regression.
>
> The problem is that it seems NOBODY uses iothreads in
> production, or even in complex real-life production tests. There
> is another recently merged example of this (100% reproducible,
> happens on both migration and snapshot). We hit it during a
> suspend operation.
>
> commit 10a06fd65f667a972848ebbbcac11bdba931b544
> Author: Pavel Butsykin <address@hidden>
> Date: Mon Oct 26 14:42:57 2015 +0300
>
> virtio: sync the dataplane vring state to the virtqueue before
> virtio_save
>
> I initially started this as a set of small changes in the savevm code
> and was asked to move the code from savevm.c to the block layer.
> This has been done and, yes, the series became complex after
> that; it was obvious that it would be complex once the task
> was to move a bunch of code from one place to another.
>
> Anyway, from my point of view the series is not that complex.
> It is just large, does simple things that are close to copy/paste,
> and there is a month to catch bugs here.
>
> Can we still consider this for merge?

Absolutely, they are still bugs and we can fix them for 2.5.

I just wanted to reflect on the scope of the bugs and it occurred to
me that these code paths haven't been exercised/tested as often.

Stefan