qemu-devel


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] migration: qemu-coroutine-lock.c:141: qemu_co_mutex_unlock: Assertion `mutex->locked == 1' failed
Date: Wed, 17 Sep 2014 10:06:15 +0100
User-agent: Mutt/1.5.23 (2014-03-12)

On Tue, Sep 16, 2014 at 02:10:39PM +0200, Paolo Bonzini wrote:
> On 16/09/2014 14:02, Alexey Kardashevskiy wrote:
> > I am having problems when I migrate a guest via libvirt like this:
> > 
> > virsh migrate --live --persistent --undefinesource --copy-storage-all --verbose --desturi qemu+ssh://legkvm/system --domain chig1
> > 
> > The XML used to create the guest is at the end of this mail.
> > 
> > I see an NBD FLUSH command after the destination QEMU has received EOF
> > on the migration stream, and this produces a crash in
> > qcow2_co_flush_to_os() because s->lock is unlocked or
> > s->l2_table_cache is NULL.
> > 
> 
> Max, Kevin, could the fix be something like this?
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 0daf25c..e7459ea 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -1442,6 +1442,7 @@ static void qcow2_invalidate_cache(BlockDriverState *bs, Error **errp)
>          memcpy(&aes_decrypt_key, &s->aes_decrypt_key, sizeof(aes_decrypt_key));
>      }
>  
> +    qemu_co_mutex_lock(&s->lock);
>      qcow2_close(bs);
>  
>      bdrv_invalidate_cache(bs->file, &local_err);
> @@ -1455,6 +1456,7 @@ static void qcow2_invalidate_cache(BlockDriverState *bs, Error **errp)
>  
>      ret = qcow2_open(bs, options, flags, &local_err);
>      QDECREF(options);
> +    qemu_co_mutex_unlock(&s->lock);
>      if (local_err) {
>          error_setg(errp, "Could not reopen qcow2 layer: %s",
>                     error_get_pretty(local_err));
> 
> On top of this, *_invalidate_cache needs to be marked as coroutine_fn.
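The assertion in the subject line fires when qemu_co_mutex_unlock() finds that the mutex it is releasing is not marked as held. What follows is a toy, single-threaded model of that invariant, not QEMU code: `struct toy_co_mutex`, `struct toy_image_state` and all `toy_*` functions are hypothetical, and `toy_reopen()` only assumes, for illustration, that a close-and-reopen path reinitialises state that embeds the lock. It shows how a request that took the lock before such a reset would trip the check when it unlocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a coroutine mutex. QEMU's CoMutex also queues
 * waiting coroutines; only the flag named in the failing assertion
 * (`mutex->locked == 1`) is modelled here. */
struct toy_co_mutex {
    bool locked;
};

/* Hypothetical per-image state that embeds the lock and a metadata cache. */
struct toy_image_state {
    struct toy_co_mutex lock;
    int *l2_cache;              /* NULL while the image is "closed" */
};

static bool toy_trylock(struct toy_co_mutex *m)
{
    if (m->locked) {
        return false;           /* a real coroutine would yield and wait */
    }
    m->locked = true;
    return true;
}

/* Returns false exactly where QEMU's assertion would abort: releasing a
 * mutex that is not marked as held. */
static bool toy_unlock_checked(struct toy_co_mutex *m)
{
    if (!m->locked) {
        return false;
    }
    m->locked = false;
    return true;
}

/* Hypothetical close-and-reopen: wipes the whole state, including the
 * embedded mutex, then installs a fresh cache. */
static void toy_reopen(struct toy_image_state *s, int *new_cache)
{
    assert(s != NULL);
    memset(s, 0, sizeof(*s));
    s->l2_cache = new_cache;
}

/* A request (e.g. a flush) takes the lock, the state is reset underneath
 * it, and the request then unlocks. Returns true if that unlock hit the
 * broken invariant. */
static bool toy_demo_reset_while_held(void)
{
    static int cache;
    struct toy_image_state s = { .lock = { false }, .l2_cache = &cache };

    bool taken = toy_trylock(&s.lock);
    toy_reopen(&s, &cache);
    return taken && !toy_unlock_checked(&s.lock);
}
```

In this model it is the reset, not the unlocking request, that leaves `locked` false; the request itself followed the lock/unlock protocol correctly.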

I think the fundamental problem here is that the mirror block job on the
source host does not synchronize with live migration.

Remember, the mirror block job iterates over the dirty bitmap whenever it
feels like it.
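To make "whenever it feels like it" concrete: a mirror job repeatedly copies blocks marked in a dirty bitmap and clears their bits, while guest writes on the source keep setting new ones. The toy model below is hypothetical (`struct toy_dirty_bitmap` packs 64 blocks into one word and is not QEMU's dirty bitmap implementation); it only shows that a clean bitmap after a pass is a snapshot, not a stable state.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy dirty bitmap: one bit per block, 64 blocks. */
struct toy_dirty_bitmap {
    uint64_t bits;
};

/* Guest write on the source: mark the block dirty so it is copied again. */
static void toy_guest_write(struct toy_dirty_bitmap *bm, unsigned block)
{
    assert(block < 64);
    bm->bits |= UINT64_C(1) << block;
}

/* Count dirty blocks by clearing the lowest set bit on each iteration. */
static unsigned toy_dirty_count(const struct toy_dirty_bitmap *bm)
{
    unsigned n = 0;
    for (uint64_t b = bm->bits; b != 0; b &= b - 1) {
        n++;
    }
    return n;
}

/* One mirror pass: "copy" every dirty block to the destination and clear
 * its bit. Returns the number of blocks copied. Any write that lands after
 * the pass re-dirties a block, so the job has to run again. */
static unsigned toy_mirror_pass(struct toy_dirty_bitmap *bm)
{
    unsigned copied = toy_dirty_count(bm);
    bm->bits = 0;
    return copied;
}
```

For example, two writes followed by a pass leave the bitmap clean, but a single further write to block 3 makes it dirty again.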

There is no guarantee that the mirror block job has quiesced before
migration handover takes place, right?

IMO that's the fundamental problem, and trying to protect
qcow2_invalidate_cache() seems wrong.  We must make sure that the mirror
block job on the source has quiesced before we hand over, otherwise the
destination could see an outdated copy of the disk!

Please let me know if I missed something which makes the operation safe
on the source, but I didn't spot any guards.

Stefan
