qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC][PATCH 11/12] qcow2: Convert qcow2 to use coroutin


From: Kevin Wolf
Subject: Re: [Qemu-devel] [RFC][PATCH 11/12] qcow2: Convert qcow2 to use coroutines for async I/O
Date: Thu, 27 Jan 2011 10:27:36 +0100
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.15) Gecko/20101027 Fedora/3.0.10-1.fc12 Thunderbird/3.0.10

Am 26.01.2011 18:12, schrieb Anthony Liguori:
> On 01/26/2011 10:38 AM, Avi Kivity wrote:
>> On 01/26/2011 06:28 PM, Anthony Liguori wrote:
>>> On 01/26/2011 10:13 AM, Avi Kivity wrote:
>>>> Serializing against a global mutex has the advantage that it can be 
>>>> treated as a global lock that is decomposed into fine-grained locks.
>>>>
>>>> For example, we can start the code conversion from an explict async 
>>>> model to a threaded sync model by converting the mutex into a 
>>>> shared/exclusive lock.  Operations like read and write take the lock 
>>>> for shared access (and take a fine-grained mutex on the metadata 
>>>> cache entry), while operation like creating a snapshot take the lock 
>>>> for exclusive access.  That doesn't work with freeze/thaw.
>>>
>>> The trouble with this is that you increase the amount of re-entrance 
>>> whereas freeze/thaw doesn't.
>>>
>>> The code from the beginning of the request to where the mutex is 
>>> acquired will be executed for every single request even while 
>>> requests are blocked at the mutex acquisition.
>>
>> It's just a few instructions.
>>
>>>
>>> With freeze/thaw, you freeze the queue and prevent any request from 
>>> starting until you thaw.  You only thaw and return control to allow 
>>> another request to execute when you begin executing an asynchronous 
>>> I/O callback.
>>>
>>
>> What do you actually save?  The longjmp() to the coroutine code, 
>> linking in to the mutex wait queue, and another switch back to the 
>> main coroutine?  Given that we don't expect to block often, it seems 
>> hardly a cost worth optimizing.

And if you cared, you could probably optimize it in the mutex
implementation.

> It's a matter of correctness, not optimization.
> 
> Consider the following example:
> 
> coroutine {
>     l2 = find_l2(offset);
>     // allocates but does not increment max cluster offset
>     l2[l2_offset(offset)] = alloc_cluster();
> 
>     co_mutex_lock(&lock);
>     write_l2(l2);
>     co_mutex_unlock(&lock);
> 
>     l1[l1_offset(offset)] = l2;
> 
>     co_mutex_lock(&lock);
>     write_l1(l1);
>     co_mutex_unlock(&lock);
> 
>     commit_cluster(l2[l2_offset(offset)]);
> }
> 
> This code is incorrect.  The code before the first co_mutex_lock() may 
> be called a dozen times before before anyone finishes this routine.  
> That means the same cluster is reused multiple times.

But this is not a new problem. In qcow2 we have this today:

cluster_offset = alloc_cluster();
bdrv_aio_write()
...
update_l2()

There is already code to handle this case, even though it could probably
be simplified for coroutines. I don't think we want to make the code
worse during the conversion by basically serializing everything.

> This code was correct before we introduced coroutines because we 
> effectively had one big lock around the whole thing.

So what's the lesson to learn? You need to take care when you convert a
big lock into smaller ones?

>>> I think my previous example was wrong, you really want to do:
>>>
>>> qcow2_aio_writev() {
>>>    coroutine {
>>>        freeze();
>>>        sync_io(); // existing qcow2 code
>>>        thaw();
>>>        // existing non I/O code
>>>        bdrv_aio_writev(callback); // no explicit freeze/thaw needed
>>>    }
>>> }
>>>
>>> This is equivalent to our existing code because no new re-entrance is 
>>> introduced.  The only re-entrancy points are in the 
>>> bdrv_aio_{readv,writev} calls.
>>
>> This requires you to know which code is sync, and which code is 
>> async.  My conversion allows you to wrap the code blindly with a 
>> mutex, and have it do the right thing automatically.  This is most 
>> useful where the code can be sync or async depending on data (which is 
>> the case for qcow2).

Well, but in the case of qcow2, you don't want to have a big mutex
around everything. We perfectly know which parts are asynchronous and
which are synchronous, so we'd want to do it finer grained from the
beginning.

Kevin



reply via email to

[Prev in Thread] Current Thread [Next in Thread]