
Re: [Qemu-devel] [PATCH 0/3] New sigaltstack method for coroutine


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [PATCH 0/3] New sigaltstack method for coroutine
Date: Tue, 14 Feb 2012 15:11:05 +0000

On Tue, Feb 14, 2012 at 1:12 PM, Alex Barcelo <address@hidden> wrote:
> On Tue, Feb 14, 2012 at 13:17, Stefan Hajnoczi <address@hidden> wrote:
>> On Tue, Feb 14, 2012 at 11:38 AM, Alex Barcelo <address@hidden> wrote:
>>> On Tue, Feb 14, 2012 at 09:33, Stefan Hajnoczi <address@hidden> wrote:
>>>> On Mon, Feb 13, 2012 at 04:11:15PM +0100, Alex Barcelo wrote:
>>>>> This new implementation... well, it seems to work (I have done an
>>>>> Ubuntu installation with a CD-ROM and a qcow drive, which seems to use
>>>>> quite a lot of coroutines). Of course I have run the coroutine test
>>>>> and it passed. But... I wasn't confident enough to propose it as a
>>>>> "mature alternative". And I don't have any performance benchmark,
>>>>> which would be interesting. So I thought the best option would be
>>>>> to send this patch to the developers as an alternative to ucontext.
>>>>
>>>> As a starting point, I suggest looking at
>>>> test-coroutine.c:perf_lifecycle().  It's a simple create-and-then-enter
>>>> benchmark which measures the latency of doing this.  I expect you will
>>>> find performance is identical to the ucontext version because the
>>>> coroutine should be pooled and created using sigaltstack only once.
>>>>
>>>> The interesting thing would be to benchmark ucontext coroutine creation
>>>> against sigaltstack.  Even then it may not matter much as long as pooled
>>>> coroutines are used most of the time.
>>>
>>> I didn't see the performance mode for test-coroutine. Now a benchmark
>>> test is easy (it's half-done). The lifecycle test is not a good
>>> benchmark, because sigaltstack is only called once. (As you said, the
>>> timing changes by less than 1%.)
>>>
>>> I thought it would be interesting to add a performance test for
>>> nesting (which can be coroutine-creation intensive). So I did it. I
>>> will send it as a patch; it is simple, but it works for this.
>>>
>>> The preliminary results are:
>>> ucontext (traditional) method:
>>> MSG: Nesting 1000000 iterations of 100000 depth each: 0.452988 s
>>>
>>> sigaltstack (new) method:
>>> MSG: Nesting 1000000 iterations of 100000 depth each: 0.689649 s
>>
>> Please run the tests with more iterations.  The execution time should
>> be several seconds to reduce any scheduler impact or other hiccups.  I
>> suggest scaling the iteration count up so each run takes around 10 seconds.
>
> Ok, 10.2 s vs 10.5 s (the traditional ucontext still wins, but the
> difference no longer seems relevant).
>
>>> The sigaltstack method is worse (which doesn't surprise me: it's more
>>> complicated, does more jumps, and has a more erratic code flow). But a
>>> loss of efficiency in coroutine creation should not be important (how
>>> many coroutines are created in a typical qemu-system execution? I'm
>>> thinking "one"). Also, as you said ;) pooled coroutines are used most
>>> of the time in real qemu-system execution.
>>
>> No, a lot of coroutines are created - each parallel disk I/O request
>> involves a coroutine.  Coroutines are also being used in other
>> subsystems (e.g. virtfs).
>>
>> Hopefully the number of active coroutines is still <100, but it's definitely >1.
>
> I put a "Hello world, look, I'm in a coroutine" printf inside the
> coroutine creation function, and I have only seen it twice in a normal
> qemu-system execution. That is why I was doubtful.

Run a couple of dd if=/dev/vda of=/dev/null iflag=direct processes
inside the guest to get some parallel I/O requests going.

Stefan


