Re: [Qemu-devel] [RFC][PATCH 11/12] qcow2: Convert qcow2 to use coroutines for async I/O


From: Anthony Liguori
Subject: Re: [Qemu-devel] [RFC][PATCH 11/12] qcow2: Convert qcow2 to use coroutines for async I/O
Date: Wed, 26 Jan 2011 11:12:27 -0600
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.15) Gecko/20101027 Lightning/1.0b1 Thunderbird/3.0.10

On 01/26/2011 10:38 AM, Avi Kivity wrote:
> On 01/26/2011 06:28 PM, Anthony Liguori wrote:
>> On 01/26/2011 10:13 AM, Avi Kivity wrote:
>>> Serializing against a global mutex has the advantage that it can be treated as a global lock that is decomposed into fine-grained locks.
>>>
>>> For example, we can start the code conversion from an explicit async model to a threaded sync model by converting the mutex into a shared/exclusive lock. Operations like read and write take the lock for shared access (and take a fine-grained mutex on the metadata cache entry), while operations like creating a snapshot take the lock for exclusive access. That doesn't work with freeze/thaw.
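
Roughly, that decomposition might look like the following (sketch only; co_rwlock_*, bs_lock and the per-entry lock are hypothetical names, not code from this series):

read_write_request(offset) {
   co_rwlock_read_lock(&bs_lock);    // shared: many requests in flight
   entry = l2_cache_lookup(offset);
   co_mutex_lock(&entry->lock);      // fine-grained: one metadata cache entry
   // read or update the L2 entry
   co_mutex_unlock(&entry->lock);
   co_rwlock_read_unlock(&bs_lock);
}

create_snapshot() {
   co_rwlock_write_lock(&bs_lock);   // exclusive: drains all other requests
   // copy L1, bump refcounts, write the snapshot header
   co_rwlock_write_unlock(&bs_lock);
}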

>> The trouble with this is that you increase the amount of re-entrancy, whereas freeze/thaw doesn't.
>>
>> The code from the beginning of the request to where the mutex is acquired will be executed for every single request, even while requests are blocked at the mutex acquisition.

> It's just a few instructions.


>> With freeze/thaw, you freeze the queue and prevent any request from starting until you thaw. You only thaw and return control to allow another request to execute when you begin executing an asynchronous I/O callback.


> What do you actually save? The longjmp() to the coroutine code, linking into the mutex wait queue, and another switch back to the main coroutine? Given that we don't expect to block often, it hardly seems a cost worth optimizing.

It's a matter of correctness, not optimization.

Consider the following example:

coroutine {
   l2 = find_l2(offset);
   // allocates but does not increment max cluster offset
   l2[l2_offset(offset)] = alloc_cluster();

   co_mutex_lock(&lock);
   write_l2(l2);
   co_mutex_unlock(&lock);

   l1[l1_offset(offset)] = l2;

   co_mutex_lock(&lock);
   write_l1(l1);
   co_mutex_unlock(&lock);

   commit_cluster(l2[l2_offset(offset)]);
}

This code is incorrect. The code before the first co_mutex_lock() may be called a dozen times before anyone finishes this routine. That means the same cluster is reused multiple times.

This code was correct before we introduced coroutines because we effectively had one big lock around the whole thing.
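
For comparison, a version that keeps the old "one big lock" semantics would hold the mutex across the whole sequence (sketch only, reusing the names from the example above), so the cluster allocation and the metadata updates that publish it are serialized together:

coroutine {
   co_mutex_lock(&lock);
   l2 = find_l2(offset);
   // allocation and publication now happen under the same lock,
   // so no other request can be handed the same cluster in between
   l2[l2_offset(offset)] = alloc_cluster();
   write_l2(l2);
   l1[l1_offset(offset)] = l2;
   write_l1(l1);
   commit_cluster(l2[l2_offset(offset)]);
   co_mutex_unlock(&lock);
}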

Regards,

Anthony Liguori


>> I think my previous example was wrong; you really want to do:

>> qcow2_aio_writev() {
>>    coroutine {
>>        freeze();
>>        sync_io(); // existing qcow2 code
>>        thaw();
>>        // existing non I/O code
>>        bdrv_aio_writev(callback); // no explicit freeze/thaw needed
>>    }
>> }

>> This is equivalent to our existing code because no new re-entrancy is introduced. The only re-entrancy points are in the bdrv_aio_{readv,writev} calls.
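
For what it's worth, one way to picture what freeze()/thaw() would have to do internally is the following sketch (the 'blocked' queue, enqueue()/dequeue() and start_request() are made-up names for illustration):

freeze() {
   frozen = true;                      // no new request may start
}

thaw() {
   frozen = false;
   while ((req = dequeue(&blocked)))   // restart requests parked meanwhile
       start_request(req);
}

start_request(req) {
   if (frozen) {
       enqueue(&blocked, req);         // parked until thaw()
       return;
   }
   // run the request
}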

> This requires you to know which code is sync and which code is async. My conversion allows you to wrap the code blindly with a mutex and have it do the right thing automatically. This is most useful where the code can be sync or async depending on the data (which is the case for qcow2).
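
In other words, the whole request path is simply bracketed by the lock (sketch only; qcow2_co_writev() and s->lock are placeholder names): whether the body completes synchronously out of the metadata cache or has to issue I/O and yield, the caller sees the same thing, and the lock only puts the coroutine to sleep when it is actually contended.

qcow2_co_writev() {
   co_mutex_lock(&s->lock);   // yields this coroutine only if contended
   handle_request();          // may complete from the metadata cache (sync)
                              // or submit I/O and yield (async)
   co_mutex_unlock(&s->lock);
}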




