


From: Peter Lieven
Subject: Re: [Qemu-devel] [RFC PATCH 3/3] qemu-coroutine: use a ring per thread for the pool
Date: Fri, 28 Nov 2014 13:26:06 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0

On 28.11.2014 at 13:21, Paolo Bonzini wrote:
>
> On 28/11/2014 12:32, Peter Lieven wrote:
>> On 28.11.2014 at 12:23, Paolo Bonzini wrote:
>>> On 28/11/2014 12:21, Peter Lieven wrote:
>>>> On 28.11.2014 at 12:14, Paolo Bonzini wrote:
>>>>>> master:
>>>>>> Run operation 40000000 iterations 12.851414 s, 3112K operations/s, 321ns per coroutine
>>>>>>
>>>>>> paolo:
>>>>>> Run operation 40000000 iterations 11.951720 s, 3346K operations/s, 298ns per coroutine
>>>>> Nice. :)
>>>>>
>>>>> Can you please try "coroutine: Use __thread … " together, too?  I still
>>>>> see 11% time spent in pthread_getspecific, and I get ~10% more indeed if
>>>>> I apply it here (my times are 191/160/145).
>>>> indeed:
>>>>
>>>> Run operation 40000000 iterations 10.138684 s, 3945K operations/s, 253ns per coroutine
>>> Your perf_master2 uses the ring buffer unconditionally, right?  I wonder
>>> if we can use a similar algorithm but with arrays instead of lists...
>> Why do you set pool_size = 0 in the create path?
>>
>> When I do the following:
>> diff --git a/qemu-coroutine.c b/qemu-coroutine.c
>> index 6bee354..c79ee78 100644
>> --- a/qemu-coroutine.c
>> +++ b/qemu-coroutine.c
>> @@ -44,7 +44,7 @@ Coroutine *qemu_coroutine_create(CoroutineEntry *entry)
>>                   * and the actual size of alloc_pool.  But it is just a heuristic,
>>                   * it does not need to be perfect.
>>                   */
>> -                pool_size = 0;
>> +                atomic_dec(&pool_size);
>>                  QSLIST_MOVE_ATOMIC(&alloc_pool, &release_pool);
>>                  co = QSLIST_FIRST(&alloc_pool);
>>
>>
>> I get:
>> Run operation 40000000 iterations 9.883958 s, 4046K operations/s, 247ns per coroutine
> Because pool_size is the (approximate) number of coroutines in the pool.
>  It is zero after QSLIST_MOVE_ATOMIC has NULL-ed out release_pool.slh_first.

Got it in the meantime. And it's not as bad as I thought, since you only steal the
release_pool if your alloc_pool is empty. Right?
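
To check my understanding, here is a minimal standalone sketch of that create
path in C11. It is not the QEMU code: Node, pool_release() and pool_get() are
hypothetical names, and atomic_exchange() only stands in for what I assume
QSLIST_MOVE_ATOMIC does (take the whole list and leave NULL behind). Batch
limits and the fallback allocation are left out.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a pooled coroutine; not QEMU's Coroutine. */
typedef struct Node {
    struct Node *next;
} Node;

static _Atomic(Node *) release_pool;    /* shared: any thread may push here */
static _Thread_local Node *alloc_pool;  /* private to the creating thread   */
static atomic_uint pool_size;           /* approximate size of release_pool */

/* Release path: lock-free push onto the shared list. */
static void pool_release(Node *n)
{
    Node *old = atomic_load(&release_pool);
    do {
        n->next = old;
    } while (!atomic_compare_exchange_weak(&release_pool, &old, n));
    atomic_fetch_add(&pool_size, 1);
}

/* Create path: steal the shared list only when the private one is empty. */
static Node *pool_get(void)
{
    if (alloc_pool == NULL) {
        /* The whole release_pool is about to be taken, so its size becomes
         * zero, not "one less". The counter is only a heuristic anyway. */
        atomic_store(&pool_size, 0);
        alloc_pool = atomic_exchange(&release_pool, NULL);
    }

    Node *n = alloc_pool;
    if (n != NULL) {
        alloc_pool = n->next;   /* pop from the private list, no atomics */
    }
    return n;                   /* NULL means the caller must allocate */
}
```

With this shape, a pool_get() that still finds something in alloc_pool never
touches release_pool or pool_size, which is why resetting the counter to zero
only on the steal looks fine to me.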

Peter



