qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] migration: qemu-coroutine-lock.c:141: qemu_co_mutex_unl


From: Alexey Kardashevskiy
Subject: Re: [Qemu-devel] migration: qemu-coroutine-lock.c:141: qemu_co_mutex_unlock: Assertion `mutex->locked == 1' failed
Date: Wed, 17 Sep 2014 16:46:34 +1000
User-agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.1.1

On 09/16/2014 10:34 PM, Kevin Wolf wrote:
> Am 16.09.2014 um 14:10 hat Paolo Bonzini geschrieben:
>> Il 16/09/2014 14:02, Alexey Kardashevskiy ha scritto:
>>> I am having problems when migrate a guest via libvirt like this:
>>>
>>> virsh migrate --live --persistent --undefinesource --copy-storage-all
>>> --verbose --desturi qemu+ssh://legkvm/system --domain chig1
>>>
>>> The XML used to create the guest is at the end of this mail.
>>>
>>> I see NBD FLUSH command after the destination QEMU received EOF for
>>> migration stream and this produces a crash in qcow2_co_flush_to_os() as
>>> s->lock is false or s->l2_table_cache is NULL.
>>>
>>
>> Max, Kevin, could the fix be something like this?
>>
>> diff --git a/block/qcow2.c b/block/qcow2.c
>> index 0daf25c..e7459ea 100644
>> --- a/block/qcow2.c
>> +++ b/block/qcow2.c
>> @@ -1442,6 +1442,7 @@ static void qcow2_invalidate_cache(BlockDriverState 
>> *bs, Error **errp)
>>          memcpy(&aes_decrypt_key, &s->aes_decrypt_key, 
>> sizeof(aes_decrypt_key));
>>      }
>>  
>> +    qemu_co_mutex_lock(&s->lock);
>>      qcow2_close(bs);
>>  
>>      bdrv_invalidate_cache(bs->file, &local_err);
>> @@ -1455,6 +1456,7 @@ static void qcow2_invalidate_cache(BlockDriverState 
>> *bs, Error **errp)
>>  
>>      ret = qcow2_open(bs, options, flags, &local_err);
>>      QDECREF(options);
>> +    qemu_co_mutex_unlock(&s->lock);
>>      if (local_err) {
>>          error_setg(errp, "Could not reopen qcow2 layer: %s",
>>                     error_get_pretty(local_err));
>>
>> On top of this, *_invalidate_cache needs to be marked as coroutine_fn.
> 
> I think bdrv_invalidate_cache() really needs to call bdrv_drain_all()
> before starting to reopen stuff. There could be requests in flight
> without holding the lock and if you can indeed reopen their BDS under
> their feet without breaking things (I doubt it), that would be pure
> luck.


I tried the patch below and it did not help. So I assume I did it wrong,
could you please explain more? Thanks!


diff --git a/block.c b/block.c
index 2df600e..ecc876d 100644
--- a/block.c
+++ b/block.c
@@ -5038,11 +5038,16 @@ void bdrv_invalidate_cache(BlockDriverState *bs,
Error **errp)
         return;
     }

+    bdrv_drain_all();
+
     if (bs->drv->bdrv_invalidate_cache) {
         bs->drv->bdrv_invalidate_cache(bs, &local_err);
     } else if (bs->file) {
         bdrv_invalidate_cache(bs->file, &local_err);
     }
+
+    bdrv_drain_all();
+
     if (local_err) {
         error_propagate(errp, local_err);
         return;



-- 
Alexey



reply via email to

[Prev in Thread] Current Thread [Next in Thread]