qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC] block: Removed coroutine ownership assumption


From: Peter Crosthwaite
Subject: Re: [Qemu-devel] [RFC] block: Removed coroutine ownership assumption
Date: Wed, 27 Jun 2012 13:05:07 +1000

Ping!

Any Further thoughts Kevin?

This thread flew off on a tangent over whether or not coroutines
should be depracated. Seems to be the answer there was no on that
front - coroutines are here to stay.

I still think this thread points out a major flaw in block+coroutines,
regardless of the fact im using it from a machine model. This bug is
going to happen in any coroutine code that touches the block layer
(E.G. what happens if the next developer wants to implement a device
using coroutines?). Yes, without my full series there is no bug today,
but im just trying to save the next developer who decides to use
corourites (whether that be in tree or out of tree) the potentially
several hours of debugging around "why did my coroutine get yielded
randomly". That and of course minimisation of my own mainline diff.

Regards,
Peter

On Fri, Jun 22, 2012 at 8:59 PM, Peter Crosthwaite
<address@hidden> wrote:
> On Fri, Jun 22, 2012 at 6:53 PM, Kevin Wolf <address@hidden> wrote:
>> Am 22.06.2012 10:20, schrieb Peter Crosthwaite:
>>> On Fri, Jun 22, 2012 at 5:49 PM, Kevin Wolf <address@hidden> wrote:
>>>> Am 22.06.2012 08:44, schrieb Peter A. G. Crosthwaite:
>>>>> The block layer assumes that it is the only user of coroutines -
>>>>> The qemu_in_coroutine() is used to determine if a function is in one of 
>>>>> the
>>>>> block layers coroutines, which is flawed. I.E. If a client (e.g. a device 
>>>>> or
>>>>> a machine model) of the block layer uses couroutine itself, the block 
>>>>> layer
>>>>> will identify the callers coroutines as its own, and may falsely yield the
>>>>> calling coroutine (instead of creating its own to yield).
>>>>>
>>>>> AFAICT, there are no conflicts in the QEMU master here yet, but its kind 
>>>>> of an
>>>>> issue, as anyone who comes along and used coroutines and the block layer
>>>>> together is going to run into some very obscure and hard to debug race
>>>>> conditions.
>>>>>
>>>>> Signed-off-by: Peter A. G. Crosthwaite <address@hidden>
>>>>
>>>> What does your coroutine caller look like that this is a problem?
>>>
>>> Its a machine model that instantiated some block devices concurrently.
>>> Theres some chicken-and-egg issues with the instantiation such that
>>> they have the happen concurrently. One device instantiates a block
>>> device (pflash_cfi_01) from coroutine context. block then check
>>> qemu_in_coroutine() which returns true so it uses my coroutine for its
>>> inner workings, whereas if it were a normal machine model it would
>>> kick of its own coroutine to do its block stuff.
>>>
>>>  Does
>>>> it make assumptions about the number of yields or anything like that?
>>>
>>> Yes it does, but I thought that was the point of coroutines vs
>>> threads? coroutines you control yield behaviour explicitly whearas
>>> thread you cant?
>>
>> Well, I can see your point, although today no code in qemu makes use of
>> the property that the caller could in theory know where the coroutine
>> yields. I think it's also dangerous because there is a non-obvious
>> dependency of the caller on the internals of the coroutine. A simple
>> innocent looking refactoring of the coroutine might break assumptions
>> that are made here.
>>
>> Other code in qemu that uses coroutines only makes use of the fact that
>> coroutines can only be interrupted at known points,
>
> So within the block layer, will the block coroutines yield anywhere or
> only at a qemu_coroutine_yield() site? Would the block layer break if
> a couroutine could be pre-empted by the OS anywhere?
>
> So lets continue this to an example of multiple clients of
> qemu-coroutine. The block layer is one client. It defines three
> possible yields points (qemu_co_yield() sites) in block.c. Lets call
> them A,B and C. The co-routine policy is that your thread of control
> will not be pre-empted until locations  A, B or C are reached (I.E.
> coroutines can only be interrupted at known points). Alls fair and it
> works.
>
> Now here is where it breaks. I have a device or machine model or
> whatever (lets call it foo) that uses co-routines. It decides that it
> wants its coroutines to stop at only at points D,E and F for ease of
> synchronization. If those coroutines however make calls into to the
> block layer, the block layer will think its in its own co-routine and
> potentially yield at any of points A,B and C. Thing is, my foo system
> doesn't care about the block layer implementation, so it shouldnt have
> awareness of the fact it used co-routines too. But because it talks to
> block, it can get yielded as any call into the block API which is
> awkward considering the point of coroutines is they can only be
> interrupted at known points. You have essentially defeated the purpose
> or coroutines in the first place. Foo's coroutines are behaving like
> preemptible threads (while blocks are behaving like coroutines).
>
> The problem is solved with a simple piece of policy - dont yield
> someone elses co-routine. Thats this patch. Note that this is an RFC
> intended to start discussion - it needs a real implementation.
> Something along the lines of keeping track of the block layer
> coroutines and checking if in one of those specifically, rather then a
> blanket call to qemu_in_coroutine(). But I have this patch in my
> workarounds branch until we figure out a real solution.
>
> Regards,
> Peter
>
>  so synchronisation
>> between multiple coroutines becomes easier.
>>
>> Kevin



reply via email to

[Prev in Thread] Current Thread [Next in Thread]