qemu-devel

Re: [Qemu-devel] [RFC PATCH 2/3] raw-posix: Convert Linux AIO submission


From: Markus Armbruster
Subject: Re: [Qemu-devel] [RFC PATCH 2/3] raw-posix: Convert Linux AIO submission to coroutines
Date: Fri, 28 Nov 2014 09:59:00 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)

Ming Lei <address@hidden> writes:

> On 11/28/14, Markus Armbruster <address@hidden> wrote:
>> Ming Lei <address@hidden> writes:
>>
>>> Hi Kevin,
>>>
>>> On Wed, Nov 26, 2014 at 10:46 PM, Kevin Wolf <address@hidden> wrote:
>>>> This improves the performance of requests because an ACB doesn't need to
>>>> be allocated on the heap any more. It also makes the code nicer and
>>>> smaller.
>>>
>>> I am not sure it is a good way to optimize Linux AIO:
>>>
>>> - for a raw image with some constraints, the coroutine can be avoided
>>> since io_submit() won't sleep most of the time
>>>
>>> - handling one coroutine takes much more time than malloc, memset and
>>> free on a small buffer; see the following test data:
>>>
>>>          --   241ns per coroutine
>>
>> What do you mean by "coroutine" here?  Create + destroy?  Yield?
>
> Please see perf_cost() in tests/test-coroutine.c

    static __attribute__((noinline)) void perf_cost_func(void *opaque)
    {
        qemu_coroutine_yield();
    }

    static void perf_cost(void)
    {
        const unsigned long maxcycles = 40000000;
        unsigned long i = 0;
        double duration;
        unsigned long ops;
        Coroutine *co;

        g_test_timer_start();
        while (i++ < maxcycles) {
            co = qemu_coroutine_create(perf_cost_func);
            qemu_coroutine_enter(co, &i);
            qemu_coroutine_enter(co, NULL);
        }
        duration = g_test_timer_elapsed();
        ops = (long)(maxcycles / (duration * 1000));

        g_test_message("Run operation %lu iterations %f s, %luK operations/s, "
                       "%luns per coroutine",
                       maxcycles,
                       duration, ops,
                       (unsigned long)(1000000000 * duration) / maxcycles);
    }

This tests create, enter, yield, reenter, terminate, destroy.  The cost
of create + destroy may well dominate.
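To illustrate the distinction, here is a minimal standalone sketch (not QEMU code) that times the two parts separately using POSIX ucontext. lifecycle_ns() does per iteration what perf_cost() does: allocate a stack, create, enter, yield, re-enter, terminate, free. switch_only_ns() keeps one long-lived coroutine and only enters it and lets it yield back. The function names, the 64 KiB stack size and the use of ucontext are assumptions of this sketch. glibc's swapcontext() saves and restores the signal mask with a system call on every switch, so absolute numbers are not comparable to the 241ns figure above; only the ratio between the two functions means anything.

```c
#define _XOPEN_SOURCE 600   /* ucontext and clock_gettime */
#include <assert.h>
#include <stdlib.h>
#include <time.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)   /* assumed coroutine stack size */

static ucontext_t caller_ctx, co_ctx;
static volatile unsigned long switches;   /* counts coroutine resumptions */

/* Short-lived coroutine: yield once, then terminate (resumes uc_link). */
static void co_body(void)
{
    switches++;
    swapcontext(&co_ctx, &caller_ctx);   /* yield back to the caller */
    switches++;
}

/* Long-lived coroutine: yields forever, never terminates. */
static void loop_body(void)
{
    for (;;) {
        switches++;
        swapcontext(&co_ctx, &caller_ctx);
    }
}

static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e9 + ts.tv_nsec;
}

/* Prepare co_ctx to run fn on the given stack; returns to caller_ctx on exit. */
static void co_init(void *stack, void (*fn)(void))
{
    if (getcontext(&co_ctx) != 0) {
        abort();
    }
    co_ctx.uc_stack.ss_sp = stack;
    co_ctx.uc_stack.ss_size = STACK_SIZE;
    co_ctx.uc_link = &caller_ctx;
    makecontext(&co_ctx, fn, 0);
}

/* Create, enter, yield, re-enter, terminate, destroy: like perf_cost(). */
static double lifecycle_ns(unsigned long n)
{
    double start = now_ns();
    for (unsigned long i = 0; i < n; i++) {
        void *stack = malloc(STACK_SIZE);      /* create */
        if (!stack) {
            abort();
        }
        co_init(stack, co_body);
        swapcontext(&caller_ctx, &co_ctx);     /* enter; back at yield */
        swapcontext(&caller_ctx, &co_ctx);     /* re-enter; back on exit */
        free(stack);                           /* destroy */
    }
    return (now_ns() - start) / n;
}

/* Enter + yield only, on one coroutine created outside the timed loop. */
static double switch_only_ns(unsigned long n)
{
    void *stack = malloc(STACK_SIZE);
    if (!stack) {
        abort();
    }
    co_init(stack, loop_body);
    double start = now_ns();
    for (unsigned long i = 0; i < n; i++) {
        swapcontext(&caller_ctx, &co_ctx);     /* enter; back at yield */
    }
    double per_iter = (now_ns() - start) / n;
    free(stack);   /* safe: the suspended coroutine is never resumed again */
    return per_iter;
}
```

Calling both from a main() with, say, a million iterations and printing the results gives a rough idea of how much of a perf_cost()-style figure is create + destroy rather than switching.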

If we create and destroy coroutines for each AIO request, we're doing it
wrong.  I doubt Kevin's doing it *that* wrong ;)
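Avoiding per-request create + destroy usually means recycling coroutine objects instead of freeing them. A minimal single-threaded sketch of such a free-list pool follows; the Co type models only the expensive part (the stack), and Co, co_get(), co_put() and POOL_MAX are hypothetical names, not QEMU's API. A pool shared between threads would additionally need locking or per-thread lists.

```c
#include <assert.h>
#include <stdlib.h>

#define POOL_MAX   64            /* hypothetical cap on idle objects kept */
#define STACK_SIZE (64 * 1024)   /* assumed coroutine stack size */

/* Hypothetical coroutine object; only the stack allocation is modeled. */
typedef struct Co {
    struct Co *next;   /* free-list link while idle in the pool */
    void *stack;
} Co;

static Co *pool_head;
static unsigned pool_size;
static unsigned long allocations;   /* number of real heap allocations */

static Co *co_get(void)
{
    Co *co = pool_head;
    if (co) {                        /* fast path: reuse an idle object */
        pool_head = co->next;
        pool_size--;
        return co;
    }
    co = malloc(sizeof(*co));        /* slow path: pay for allocation */
    if (!co) {
        abort();
    }
    co->stack = malloc(STACK_SIZE);
    if (!co->stack) {
        abort();
    }
    allocations++;
    return co;
}

static void co_put(Co *co)
{
    assert(co != NULL);
    if (pool_size < POOL_MAX) {      /* keep it for the next request */
        co->next = pool_head;
        pool_head = co;
        pool_size++;
        return;
    }
    free(co->stack);                 /* pool full: really destroy it */
    free(co);
}
```

With this, a request that calls co_get() and later co_put() touches the heap only when the pool is empty; in steady state the cost is a couple of pointer updates.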

Anyway, let's benchmark the real code instead of putting undue trust in
tests/test-coroutine.c micro-benchmarks.


