qemu-devel

Re: [Qemu-devel] [RFC PATCH 2/3] raw-posix: Convert Linux AIO submission to coroutines


From: Kevin Wolf
Subject: Re: [Qemu-devel] [RFC PATCH 2/3] raw-posix: Convert Linux AIO submission to coroutines
Date: Fri, 28 Nov 2014 11:06:25 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

Am 28.11.2014 um 03:59 hat Ming Lei geschrieben:
> Hi Kevin,
> 
> On Wed, Nov 26, 2014 at 10:46 PM, Kevin Wolf <address@hidden> wrote:
> > This improves the performance of requests because an ACB doesn't need to
> > be allocated on the heap any more. It also makes the code nicer and
> > smaller.
> 
> I am not sure this is a good way to optimize Linux AIO:
> 
> - for raw images with some constraints, the coroutine can be avoided
> since io_submit() won't sleep most of the time
> 
> - handling one coroutine takes much more time than a malloc,
> memset and free of a small buffer; see the following test data:
> 
>          --   241ns per coroutine
>          --   61ns per (malloc, memset, free for 128bytes)

Please finally stop making comparisons between completely unrelated
things and trying to make a case against coroutines out of it. It simply
doesn't make any sense.

The truth is that in the 'qemu-img bench' case as well as in the highest
performing VM setup for Peter and me, the practically existing coroutine
based git branches perform better than the practically existing bypass
branches. If you think that theoretically the bypass branches must be
better, show us the patches and benchmarks.

If you can't, let's merge the coroutine improvements (which improve
more than just the case of raw images using no block layer features,
including cases that benefit the average user) and be done.

> I still think we should figure out a fast path that avoids coroutines
> for linux-aio with raw images; otherwise it can't scale well for
> high-IOPS devices.
> 
> Also, we could easily use a simple buffer pool to avoid the dynamic
> allocation, couldn't we?

Yes, the change to g_slice_alloc() was a bad move performance-wise.

> > As a side effect, the codepath taken by aio=threads is changed to use
> > paio_submit_co(). This doesn't change the performance at this point.
> >
> > Results of qemu-img bench -t none -c 10000000 [-n] /dev/loop0:
> >
> >       |      aio=native       |     aio=threads
> >       | before   | with patch | before   | with patch
> > ------+----------+------------+----------+------------
> > run 1 | 29.921s  | 26.932s    | 35.286s  | 35.447s
> > run 2 | 29.793s  | 26.252s    | 35.276s  | 35.111s
> > run 3 | 30.186s  | 27.114s    | 35.042s  | 34.921s
> > run 4 | 30.425s  | 26.600s    | 35.169s  | 34.968s
> > run 5 | 30.041s  | 26.263s    | 35.224s  | 35.000s
> >
> > TODO: Do some more serious benchmarking in VMs with less variance.
> > Results of a quick fio run are vaguely positive.
> 
> I will test Paolo's fast-path approach with I/O inside a
> VM.

Currently, the best thing to compare it against is probably Peter's git
branch at https://github.com/plieven/qemu.git perf_master2. This patch
is only a first step in a whole series of possible optimisations.

Kevin


