qemu-devel



From: Ming Lei
Subject: Re: [Qemu-devel] [RFC PATCH 2/3] raw-posix: Convert Linux AIO submission to coroutines
Date: Fri, 28 Nov 2014 17:15:40 +0800

On 11/28/14, Markus Armbruster <address@hidden> wrote:
> Ming Lei <address@hidden> writes:
>
>> On 11/28/14, Markus Armbruster <address@hidden> wrote:
>>> Ming Lei <address@hidden> writes:
>>>
>>>> Hi Kevin,
>>>>
>>>> On Wed, Nov 26, 2014 at 10:46 PM, Kevin Wolf <address@hidden> wrote:
>>>>> This improves the performance of requests because an ACB doesn't need to
>>>>> be allocated on the heap any more. It also makes the code nicer and
>>>>> smaller.
>>>>
>>>> I am not sure this is a good way to optimize Linux AIO:
>>>>
>>>> - for raw images under some constraints, the coroutine can be avoided,
>>>> since io_submit() won't sleep most of the time
>>>>
>>>> - handling a coroutine once takes much more time than handling malloc,
>>>> memset and free on a small buffer, as the following test data shows:
>>>>
>>>>          --   241ns per coroutine
>>>
>>> What do you mean by "coroutine" here?  Create + destroy?  Yield?
>>
>> Please see perf_cost() in tests/test-coroutine.c
>
>     static __attribute__((noinline)) void perf_cost_func(void *opaque)
>     {
>         qemu_coroutine_yield();
>     }
>
>     static void perf_cost(void)
>     {
>         const unsigned long maxcycles = 40000000;
>         unsigned long i = 0;
>         double duration;
>         unsigned long ops;
>         Coroutine *co;
>
>         g_test_timer_start();
>         while (i++ < maxcycles) {
>             co = qemu_coroutine_create(perf_cost_func);
>             qemu_coroutine_enter(co, &i);
>             qemu_coroutine_enter(co, NULL);
>         }
>         duration = g_test_timer_elapsed();
>         ops = (long)(maxcycles / (duration * 1000));
>
>         g_test_message("Run operation %lu iterations %f s, %luK operations/s, "
>                        "%luns per coroutine",
>                        maxcycles,
>                        duration, ops,
>                        (unsigned long)(1000000000 * duration) / maxcycles);
>     }
>
> This tests create, enter, yield, reenter, terminate, destroy.  The cost
> of create + destroy may well dominate.

Actually, create and destroy shouldn't cost much here, thanks to the
coroutine pool.
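
To illustrate why pooling makes create and destroy cheap, here is a minimal
free-list sketch. It is not QEMU's coroutine pool: the names Obj, POOL_MAX,
obj_create(), obj_destroy() and pool_demo() are hypothetical and exist only
for this example. The point is that repeated create/destroy cycles reuse a
parked object instead of going back to the heap.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a pooled object; not QEMU's Coroutine. */
typedef struct Obj {
    struct Obj *next;   /* free-list link, only meaningful while pooled */
    char state[256];    /* placeholder for per-object state */
} Obj;

#define POOL_MAX 64     /* arbitrary cap chosen for this sketch */

static Obj *pool_head;            /* singly linked free list */
static unsigned pool_size;        /* objects currently parked in the pool */
static unsigned long heap_allocs; /* number of real malloc() calls */

/* Take an object from the pool if possible, else fall back to malloc(). */
static Obj *obj_create(void)
{
    Obj *o = pool_head;

    if (o) {
        pool_head = o->next;
        pool_size--;
    } else {
        o = malloc(sizeof(*o));
        assert(o != NULL);
        heap_allocs++;
    }
    o->next = NULL;
    return o;
}

/* Park the object for reuse; only free() it when the pool is full. */
static void obj_destroy(Obj *o)
{
    if (pool_size < POOL_MAX) {
        o->next = pool_head;
        pool_head = o;
        pool_size++;
    } else {
        free(o);
    }
}

/* Run n create/destroy cycles and report how many of them hit the heap. */
static unsigned long pool_demo(unsigned long n)
{
    heap_allocs = 0;
    for (unsigned long i = 0; i < n; i++) {
        obj_destroy(obj_create());
    }
    return heap_allocs;
}
```

With an empty pool, pool_demo(1000) performs a single malloc(); every later
cycle is a couple of pointer updates.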

>
> If we create and destroy coroutines for each AIO request, we're doing it
> wrong.  I doubt Kevin's doing it *that* wrong ;)
>
> Anyway, let's benchmark the real code instead of putting undue trust in
> tests/test-coroutine.c micro-benchmarks.

I don't think the micro-benchmark is untrustworthy.

It measures the direct cost of the coroutine, and that cost cannot be avoided
at all, not to mention the cost of switching stacks.
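
For anyone who wants a number to set against the 241ns per coroutine figure,
the malloc/memset/free side can be timed with something like the sketch
below. It is only an assumption about how such a measurement could be done,
not the code behind the data quoted above; alloc_cycle_ns() is a hypothetical
helper, and the result depends on the allocator, the buffer size and the
machine.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Average cost in nanoseconds of one malloc + memset + free cycle on a
 * buffer of 'size' bytes. Returns a negative value for invalid arguments.
 */
static double alloc_cycle_ns(size_t size, unsigned long iterations)
{
    struct timespec start, end;
    volatile unsigned char sink = 0;  /* keeps the buffer observably used */

    if (size == 0 || iterations == 0) {
        return -1.0;
    }

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (unsigned long i = 0; i < iterations; i++) {
        unsigned char *buf = malloc(size);

        assert(buf != NULL);
        memset(buf, (int)(i & 0xff), size);
        sink ^= buf[size - 1];        /* read back so the work is not elided */
        free(buf);
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed_ns = (double)(end.tv_sec - start.tv_sec) * 1e9
                      + (double)(end.tv_nsec - start.tv_nsec);
    return elapsed_ns / (double)iterations;
}
```

Calling alloc_cycle_ns(64, 40000000) uses the same iteration count as
perf_cost(); the 64-byte size is an arbitrary choice for "small buffer".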

If you search for the test data I posted previously, it shows that
bypassing the coroutine can increase throughput by ~50% for raw images
with Linux AIO. That is a real test case, not a micro-benchmark.


Thanks,
Ming Lei


