Re: [Qemu-devel] [PATCH 0/7] migration: pause-before-device


From: Daniel P. Berrange
Subject: Re: [Qemu-devel] [PATCH 0/7] migration: pause-before-device
Date: Thu, 12 Oct 2017 11:02:44 +0100
User-agent: Mutt/1.9.0 (2017-09-02)

On Wed, Oct 11, 2017 at 08:13:10PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <address@hidden>
> 
> Hi,
>   This set attempts to make a race condition between migration and
> drive-mirror (and other block users) soluble by allowing the migration
> to be paused after the source qemu releases the block devices but
> before the serialisation of the device state.
> 
> The symptom of this failure, as reported by Wangjie, is a:
>    _co_do_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed
> 
> and the source qemu dying; so the problem is pretty nasty.
> This has only been seen on 2.9 onwards, but the theory is that
> prior to 2.9 it might have been happening anyway and we were
> perhaps getting unreported corruptions (lost writes); so this
> really needs fixing.
> 
> This flow came from discussions between Kevin and me, and we can't
> see a way of fixing it without exposing a new state to the management
> layer.
> 
> The flow is now:
> 
> (qemu) migrate_set_capability pause-before-device on

How about 'switchover-cleanup'?


> (qemu) migrate -d ...
> (qemu) info migrate
> ...
> Migration status: pause-before-device

and 'switchover'

> ...
> << issue commands to clean up any block jobs>>
> 
> (qemu) migrate_continue pause-before-device
> (qemu) info migrate
> ...
> Migration status: completed
> 
> This set has been _very_ lightly tested just at the normal migration
> code, without the addition of the drive mirror; so this is a first
> cut.  I'd appreciate some feedback from libvirt whether the interface
> is OK and ideally a hack to test it in a full libvirt setup to see
> if we hit any other issues.
> 
> The precopy flow is:
> active->pause-before-device->completed
> 
> The postcopy flow is:
> active->pause-before-device->postcopy-active->completed
> 
> Although the behaviour with postcopy only gets interesting when
> we add something like Max's active-sync.
> 
> Please argue about the command and state naming.

Argued above :-)
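
For anyone modelling this on the management side, the two flows quoted above can be sketched as a small transition table. This is only an illustration, not QEMU code: the state names and their order come from the cover letter (using its current 'pause-before-device' name), while `TRANSITIONS`, `next_state` and `walk` are hypothetical names made up for the sketch. It assumes the migration stays in the paused state until the management layer issues the continue.

```python
# Hypothetical model of the migration states named in the quoted cover letter.
# Not QEMU code: only the state names and their ordering come from the thread.
PAUSE = "pause-before-device"

TRANSITIONS = {
    "precopy": {
        "active": PAUSE,
        PAUSE: "completed",
    },
    "postcopy": {
        "active": PAUSE,
        PAUSE: "postcopy-active",
        "postcopy-active": "completed",
    },
}


def next_state(mode, state, continued=False):
    """Return the state following `state`.

    The paused state is sticky: it only advances once `continued` is true,
    standing in for the management layer issuing the continue command.
    """
    if state == PAUSE and not continued:
        return state
    return TRANSITIONS[mode][state]


def walk(mode):
    """List the states visited when the continue is issued at the pause."""
    path = ["active"]
    while path[-1] != "completed":
        path.append(next_state(mode, path[-1], continued=True))
    return path


if __name__ == "__main__":
    for mode in ("precopy", "postcopy"):
        print(mode, "->".join(walk(mode)))
```

The sticky pause is the point of the new state: block jobs get cleaned up while `next_state` keeps returning the paused state, and only the explicit continue moves things on.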

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


