Re: [Qemu-devel] [PATCH 0/3] New sigaltstack method for coroutine


From: Alex Barcelo
Subject: Re: [Qemu-devel] [PATCH 0/3] New sigaltstack method for coroutine
Date: Tue, 14 Feb 2012 14:12:46 +0100

On Tue, Feb 14, 2012 at 13:17, Stefan Hajnoczi <address@hidden> wrote:
> On Tue, Feb 14, 2012 at 11:38 AM, Alex Barcelo <address@hidden> wrote:
>> On Tue, Feb 14, 2012 at 09:33, Stefan Hajnoczi <address@hidden> wrote:
>>> On Mon, Feb 13, 2012 at 04:11:15PM +0100, Alex Barcelo wrote:
>>>> This new implementation... well, it seems to work (I have done an
>>>> Ubuntu installation with a cdrom and a qcow drive, which seems to use
>>>> quite a lot of coroutines). Of course I have run test-coroutine and
>>>> it was OK. But... I wasn't confident enough to propose it as a
>>>> "mature alternative". And I don't have any performance benchmark,
>>>> which would be interesting. So I thought that the best option would
>>>> be to send this patch to the developers as an alternative to ucontext.
>>>
>>> As a starting point, I suggest looking at
>>> test-coroutine.c:perf_lifecycle().  It's a simple create-and-then-enter
>>> benchmark which measures the latency of doing this.  I expect you will
>>> find performance is identical to the ucontext version because the
>>> coroutine should be pooled and created using sigaltstack only once.
>>>
>>> The interesting thing would be to benchmark ucontext coroutine creation
>>> against sigaltstack.  Even then it may not matter much as long as pooled
>>> coroutines are used most of the time.
>>
>> I hadn't seen the performance mode for test-coroutine. Now a benchmark
>> test is easy (it's half done). The lifecycle test is not a good
>> benchmark, because sigaltstack is only called once. (As you said, the
>> timing changes by less than 1%.)
>>
>> I thought it would be interesting to add a performance test for
>> nesting (which can be coroutine-creation intensive). So I did it. I
>> will send it as a patch; it is simple, but it works for this.
>>
>> The preliminary results are:
>> ucontext (traditional) method:
>> MSG: Nesting 1000000 iterations of 100000 depth each: 0.452988 s
>>
>> sigaltstack (new) method:
>> MSG: Nesting 1000000 iterations of 100000 depth each: 0.689649 s
>
> Please run the tests with more iterations.  The execution time should
> be several seconds to reduce any scheduler impact or other hiccups.  I
> suggest scaling the iterations up to around 10 seconds.

OK, 10.2 s vs 10.5 s (the traditional ucontext still wins, but the
difference doesn't seem relevant any more).
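
(For reference, the idea of the nesting test is simply a loop in which
every level creates and enters a fresh coroutine, so nothing comes from
the pool and creation cost dominates. Something along these lines, as a
simplified sketch rather than the exact patch I will send; the counts
are placeholders, and the calls use the current API, where the opaque
pointer is passed to qemu_coroutine_enter():)

/* Sketch of a nesting benchmark: every level creates and enters a new
 * coroutine, so there is no pooling benefit and creation cost dominates. */

#include <glib.h>
#include "qemu-coroutine.h"

typedef struct {
    unsigned int n_enter;   /* levels entered so far */
    unsigned int max;       /* target nesting depth  */
} NestData;

static void coroutine_fn nest(void *opaque)
{
    NestData *nd = opaque;

    nd->n_enter++;
    if (nd->n_enter < nd->max) {
        Coroutine *child = qemu_coroutine_create(nest);
        qemu_coroutine_enter(child, nd);
    }
}

static void perf_nesting(void)
{
    /* placeholder sizes; scale them until a run takes several seconds */
    unsigned int i, maxcycles = 100000, maxnesting = 100;
    double duration;

    g_test_timer_start();
    for (i = 0; i < maxcycles; i++) {
        NestData nd = { .n_enter = 0, .max = maxnesting };
        Coroutine *root = qemu_coroutine_create(nest);
        qemu_coroutine_enter(root, &nd);
    }
    duration = g_test_timer_elapsed();

    g_test_message("Nesting %u iterations of %u depth each: %f s",
                   maxcycles, maxnesting, duration);
}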

>> The sigaltstack method is worse (well, it doesn't surprise me: it's
>> more complicated, does more jumps, and its code flow is more erratic).
>> But a loss of efficiency in coroutine creation should not be important
>> (how many coroutines are created in a typical qemu-system execution?
>> I'm thinking "one"). Also, as you said ;) pooled coroutines are used
>> most of the time in a real qemu-system execution.
>
> No, a lot of coroutines are created - each parallel disk I/O request
> involves a coroutine.  Coroutines are also being used in other
> subsystems (e.g. virtfs).
>
> Hopefully the number of active coroutines is still <100, but it's definitely >1.

I put a "Hello world, look, I'm in a coroutine" printf inside the
coroutine creation function, and I have only seen it twice in a normal
qemu-system execution. And I was doubting.
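
For context, the "more jumps" come from how the sigaltstack method has
to get onto a new stack in the first place: it bounces through a signal
handler running on an alternate stack. This is the classic trick used by
portable thread libraries such as GNU Pth. A simplified sketch of the
general technique follows; it is not the actual patch code, and
coroutine_body() and start_on_new_stack() are placeholder names:

/* Minimal sketch of the sigaltstack trick: run a signal handler on an
 * alternate stack via sigaltstack(), capture that context with
 * sigsetjmp(), and later siglongjmp() into it to start executing on the
 * new stack.  Assumes SIGUSR2 is not blocked; a real implementation has
 * to be more careful with the signal mask and with cleanup. */

#include <signal.h>
#include <setjmp.h>
#include <string.h>

static sigjmp_buf caller_env;   /* context of the creating thread    */
static sigjmp_buf tramp_env;    /* context captured on the new stack */

static void coroutine_body(void);   /* placeholder for the real entry point */

static void trampoline(int signo)
{
    (void)signo;
    /* We are now running on the alternate (coroutine) stack.  Save this
     * context and return from the handler; the saved context stays valid. */
    if (sigsetjmp(tramp_env, 0) == 0) {
        return;
    }
    /* Re-entered via siglongjmp(): this is where the coroutine starts. */
    coroutine_body();
    siglongjmp(caller_env, 1);      /* give control back when it finishes */
}

static void start_on_new_stack(void *stack, size_t size)
{
    stack_t ss = { .ss_sp = stack, .ss_size = size, .ss_flags = 0 };
    stack_t oss;
    struct sigaction sa, osa;

    /* 1. Make the coroutine's stack the signal stack. */
    sigaltstack(&ss, &oss);

    /* 2. Deliver SIGUSR2 to ourselves so the trampoline runs on it. */
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = trampoline;
    sa.sa_flags = SA_ONSTACK;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR2, &sa, &osa);
    raise(SIGUSR2);

    /* 3. Restore the previous signal setup; tramp_env now points into
     *    the new stack and can be jumped to at any time. */
    sigaltstack(&oss, NULL);
    sigaction(SIGUSR2, &osa, NULL);

    if (sigsetjmp(caller_env, 0) == 0) {
        siglongjmp(tramp_env, 1);   /* enter the coroutine */
    }
}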


