
Re: [Qemu-devel] [RFC PATCH 3/3] qemu-coroutine: use a ring per thread for the pool


From: Peter Lieven
Subject: Re: [Qemu-devel] [RFC PATCH 3/3] qemu-coroutine: use a ring per thread for the pool
Date: Fri, 28 Nov 2014 13:49:53 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0

On 28.11.2014 at 13:45, Paolo Bonzini wrote:
>
> On 28/11/2014 13:39, Peter Lieven wrote:
>> On 28.11.2014 at 13:26, Paolo Bonzini wrote:
>>> On 28/11/2014 12:46, Peter Lieven wrote:
>>>>> I get:
>>>>> Run operation 40000000 iterations 9.883958 s, 4046K operations/s,
>>>>> 247ns per coroutine
>>>> Ok, understood, it "steals" the whole pool, right? Isn't that bad if
>>>> we have more than one thread in need of a lot of coroutines?
>>> Overall the algorithm is expected to adapt.  The N threads contribute to
>>> the global release pool, so the pool will fill up N times faster than if
>>> you had only one thread.  There can be some variance, which is why the
>>> maximum size of the pool is twice the threshold (and probably could be
>>> tuned better).
>>>
>>> Benchmarks are needed on real I/O too, of course, especially with high
>>> queue depth.
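
For reference, the allocation side of the scheme described above looks
roughly like the sketch below. It is a condensed illustration rather than
the literal patch; it assumes the release_pool/alloc_pool naming and the
lock-free list helpers (QSLIST_MOVE_ATOMIC, QSLIST_REMOVE_HEAD) from this
series and qemu/queue.h.

enum { POOL_BATCH_SIZE = 64 };

/* One lock-free global release pool shared by all threads, plus a
 * per-thread alloc pool that can be popped without any atomics. */
static QSLIST_HEAD(, Coroutine) release_pool = QSLIST_HEAD_INITIALIZER(release_pool);
static unsigned int release_pool_size;       /* approximate by design */
static __thread QSLIST_HEAD(, Coroutine) alloc_pool = QSLIST_HEAD_INITIALIZER(alloc_pool);

Coroutine *qemu_coroutine_create(CoroutineEntry *entry)
{
    Coroutine *co = QSLIST_FIRST(&alloc_pool);

    if (!co && release_pool_size > POOL_BATCH_SIZE) {
        /* "Steal" the whole global pool in one atomic list move.  With
         * N threads releasing coroutines it also refills N times as
         * fast, which is why the algorithm adapts. */
        release_pool_size = 0;   /* plain store; see the atomics Q&A below */
        QSLIST_MOVE_ATOMIC(&alloc_pool, &release_pool);
        co = QSLIST_FIRST(&alloc_pool);
    }
    if (co) {
        QSLIST_REMOVE_HEAD(&alloc_pool, pool_next);
    } else {
        co = qemu_coroutine_new();   /* slow path: really allocate one */
    }
    co->entry = entry;
    return co;
}

The release side would be the mirror image: push onto release_pool with an
atomic list insert and atomically increment the counter, up to the
POOL_BATCH_SIZE * 2 ceiling ("twice the threshold" above).
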
>> Yes, cool. The atomic operations are a bit tricky at first glance ;-)
>>
>> Question:
>>  Why is the pool_size increment atomic and the set to zero not?
> Because the set to zero is not a read-modify-write operation, it is
> always atomic.  It's just not sequentially consistent (see
> docs/atomics.txt for some info on what that means).
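
A standalone C11 illustration of that distinction (not QEMU code; QEMU's
own wrappers live in qemu/atomic.h):

#include <stdatomic.h>
#include <stdio.h>

static atomic_uint pool_size;

int main(void)
{
    /* Increment is a read-modify-write: two threads doing pool_size++
     * concurrently could both read the same old value and lose an
     * update, so it must be a single atomic operation. */
    atomic_fetch_add(&pool_size, 1);

    /* Setting to zero is a single write: no update to this variable can
     * be lost, so a plain (relaxed) store suffices.  It is merely not
     * sequentially consistent: other threads may observe it reordered
     * against surrounding accesses, which the pool heuristic tolerates. */
    atomic_store_explicit(&pool_size, 0, memory_order_relaxed);

    printf("pool_size = %u\n", atomic_load(&pool_size));
    return 0;
}
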
>
>> Idea:
>> If the release_pool is full, why not put the coroutine in the thread's
>> alloc_pool instead of throwing it away? :-)
> Because you can waste at most 64 coroutines per thread.  But those
> numbers are nothing to sneeze at, so it's worth doing as a separate patch.
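
A sketch of that idea, under the same assumptions as the allocation sketch
above; alloc_pool_size is an illustrative per-thread counter (the create
side would decrement it when popping):

static __thread unsigned int alloc_pool_size;

static void coroutine_delete(Coroutine *co)
{
    if (release_pool_size < POOL_BATCH_SIZE * 2) {
        QSLIST_INSERT_HEAD_ATOMIC(&release_pool, co, pool_next);
        atomic_inc(&release_pool_size);   /* read-modify-write: atomic */
        return;
    }
    if (alloc_pool_size < POOL_BATCH_SIZE) {
        /* Global pool full: recycle into the thread-local pool instead
         * of destroying.  At worst this parks 64 coroutines per thread. */
        QSLIST_INSERT_HEAD(&alloc_pool, co, pool_next);
        alloc_pool_size++;
        return;
    }
    qemu_coroutine_delete(co);   /* both pools full: really free it */
}
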

What do you mean by that? If I use dataplane, I fill the global pool but
never consume from it; I end up using thread-local storage only. So I get
the same numbers as with my thread-local-storage-only version.

Maybe it is an idea to tweak the POOL_BATCH_SIZE * 2 limit according to
what is actually attached. If we have only dataplane or ioeventfd, it
could be POOL_BATCH_SIZE * 0, and we would not even waste those
coroutines going stale in the global pool.
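
As a sketch of what that could look like; the retune hook and the
non_dataplane_users count are hypothetical, not existing QEMU APIs:

static unsigned int pool_max = POOL_BATCH_SIZE * 2;   /* would replace the
                                                         constant ceiling in
                                                         the release path */

/* Hypothetical hook, called when devices attach or detach. */
static void coroutine_pool_retune(unsigned int non_dataplane_users)
{
    /* Dataplane/ioeventfd threads are served by their private pools, so
     * with no other users the global pool can shrink to nothing. */
    pool_max = non_dataplane_users ? POOL_BATCH_SIZE * 2 : 0;
}
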

Peter



