qemu-devel

From: Peter Lieven
Subject: Re: [Qemu-devel] [RFC PATCH 3/3] qemu-coroutine: use a ring per thread for the pool
Date: Fri, 28 Nov 2014 11:37:27 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0

On 28.11.2014 at 11:28, Paolo Bonzini wrote:
>
> On 28/11/2014 09:13, Peter Lieven wrote:
>> On 27.11.2014 at 17:40, Paolo Bonzini wrote:
>>> On 27/11/2014 11:27, Peter Lieven wrote:
>>>> +static __thread struct CoRoutinePool {
>>>> +    Coroutine *ptrs[POOL_MAX_SIZE];
>>>> +    unsigned int size;
>>>> +    unsigned int nextfree;
>>>> +} CoPool;
>>>>  
>>> The per-thread ring unfortunately didn't work well last time it was
>>> tested.  Devices that do not use ioeventfd (not just the slow ones, even
>>> decently performing ones like ahci, nvme or megasas) will create the
>>> coroutine in the VCPU thread, and destroy it in the iothread.  The
>>> result is that coroutines cannot be reused.
>>>
>>> Can you check if this is still the case?
>> I already tested at least for IDE and for ioeventfd=off. The coroutine
>> is created in the vCPU thread and destroyed in the I/O thread.
>>
>> I also have a more complicated version which sets a per-thread coroutine
>> pool only for dataplane, avoiding the lock for dedicated iothreads.
>>
>> For those who want to take a look:
>>
>> https://github.com/plieven/qemu/commit/325bc4ef5c7039337fa785744b145e2bdbb7b62e
> Can you test it against the patch I just sent in Kevin's linux-aio
> coroutine thread?

Was already doing it ;-) At least with test-coroutine.c....

master:
Run operation 40000000 iterations 12.851414 s, 3112K operations/s, 321ns per coroutine

paolo:
Run operation 40000000 iterations 11.951720 s, 3346K operations/s, 298ns per coroutine

plieven/perf_master2:
Run operation 40000000 iterations 9.013785 s, 4437K operations/s, 225ns per coroutine

plieven/perf_master:
Run operation 40000000 iterations 11.072883 s, 3612K operations/s, 276ns per coroutine

However, perf_master and perf_master2 seem to have a regression regarding nesting.
@Kevin: Could that be the reason why they perform badly in some scenarios?


Regarding the bypass that is being discussed: if it is not just a benchmark thing
but really necessary for some people's use cases, why not add a new aio mode like
"bypass" and use it only then? If the performance is really needed, the user might
trade it in for lost features like I/O throttling, filters etc.

Peter


