From: Anthony Liguori
Subject: Re: [Qemu-devel] [5323] Implement an fd pool to get real AIO with posix-aio
Date: Fri, 26 Sep 2008 13:35:19 -0500
User-agent: Thunderbird 2.0.0.16 (X11/20080723)

Ryan Harper wrote:
* Anthony Liguori <address@hidden> [2008-09-26 11:03]:
Revision: 5323
          http://svn.sv.gnu.org/viewvc/?view=rev&root=qemu&revision=5323
Author:   aliguori
Date:     2008-09-26 15:59:29 +0000 (Fri, 26 Sep 2008)

Log Message:
-----------
Implement an fd pool to get real AIO with posix-aio

This patch implements a simple fd pool to allow many AIO requests with
posix-aio.  The result is significantly improved performance (identical to that
reported for linux-aio) for both cache=on and cache=off.

The fundamental problem with posix-aio is that it limits itself to one thread
per file descriptor.  I don't know why this is, but this patch provides a simple
mechanism to work around it (duplicating the file descriptor).

This isn't a great solution, but it seems like a reasonable intermediate step
between posix-aio and a custom thread-pool to replace it.
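
In rough terms, the mechanism looks something like the sketch below.  This is
illustrative only, assuming a 64-entry pool; POOL_SIZE, fd_pool_init and
fd_pool_get are made-up names, not the identifiers in the committed code.

#include <unistd.h>

#define POOL_SIZE 64

static int fd_pool[POOL_SIZE];

/* Fill the pool with dup()ed copies of the image file's descriptor. */
static int fd_pool_init(int base_fd)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        fd_pool[i] = dup(base_fd);
        if (fd_pool[i] < 0)
            return -1;
    }
    return 0;
}

/* Hand out descriptors round-robin, so glibc's one-thread-per-fd
 * posix-aio implementation can service up to POOL_SIZE requests
 * in parallel instead of serializing them on a single thread. */
static int fd_pool_get(void)
{
    static unsigned next;
    return fd_pool[next++ % POOL_SIZE];
}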

Ryan Harper will be posting some performance analysis he did comparing posix-aio
with fd pooling against linux-aio.  The size of the posix-aio thread pool and
the fd pool were largely determined by him based on this analysis.

I'll have some more data to post in a bit, but for now, with the fd pool
bumped up to 64 and aio initialized to support a thread per fd, we mostly
match linux-aio performance with a simpler implementation.  For random
writes, fd_pool lags a bit, but I've got other data showing that in most
scenarios fd_pool matches linux-aio performance and does so with less CPU
consumption.
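
For reference, the glibc knobs being described here are set through the GNU
extension aio_init().  A minimal sketch, assuming the pool size of 64
mentioned above (the helper name posix_aio_tune is made up):

#define _GNU_SOURCE
#include <aio.h>

/* Size glibc's posix-aio worker pool to match the fd pool:
 * one service thread per pooled descriptor. */
static void posix_aio_tune(void)
{
    struct aioinit ai = {
        .aio_threads = 64,   /* worker threads, one per pooled fd        */
        .aio_num     = 64,   /* expected number of simultaneous requests */
    };

    aio_init(&ai);
}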

Results:

16k randwrite 1 thread, 74 iodepth | MB/s | avg sub lat (us) | avg comp lat (ms)
-----------------------------------+------+------------------+------------------
baremetal (O_DIRECT, aka cache=off)| 61.2 |   13.07          |  19.59
kvm: cache=off posix-aio w/o patch |  4.7 | 3467.44          | 254.08

So with posix-aio, once we have many requests outstanding, each submission
is going to block until an earlier request completes.  I don't fully
understand why the average completion latency is so high, because in theory
there should be no delay between completion and submission.  Maybe it has to
do with the fact that we spend so much time blocking during submission that
the io-thread doesn't get a chance to run.  I bet if we dropped the
qemu_mutex during submission, the completion latency would drop to a very
small number.  Not worth actually testing.
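
Purely to illustrate that speculation (which, as noted, isn't worth actually
testing): the idea would be to release the global lock around the potentially
blocking aio_write() so the io-thread can keep reaping completions.  The
mutex and function names below are stand-ins, not QEMU's actual symbols.

#include <aio.h>
#include <pthread.h>

/* Stand-in for QEMU's global mutex; the name is hypothetical. */
extern pthread_mutex_t qemu_global_mutex;

static int submit_without_lock(struct aiocb *acb)
{
    int ret;

    pthread_mutex_unlock(&qemu_global_mutex);  /* let the io-thread run      */
    ret = aio_write(acb);                      /* may block on a busy worker */
    pthread_mutex_lock(&qemu_global_mutex);

    return ret;
}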

kvm: cache=off linux-aio           | 61.1 |   75.35          |  19.57

The fact that the submission latency is so high confirms what I've been
saying about linux-aio submission being very suboptimal.  That really is
quite high.

kvm: cache=on  posix-aio w/o patch |127.0 |  115.78          |   9.19
kvm: cache=on  posix-aio w/ patch  |126.0 |   67.35          |   9.30

It looks like 127 MB/s is pretty close to the optimal cached write
throughput.  When using caching, writes can complete almost immediately, so
it's not surprising that submission latency is so low (even though it's
blocking during submission).

I am surprised that the latency w/ patch is still so high.  I think that
suggests that requests are queuing up.  I bet increasing the aio_num field
would reduce this number.

------------ new results ----------+------+------------------+------------------
kvm:cache=off posix-aio fd_pool[16]| 33.5 |   14.28          |  49.19
kvm:cache=off posix-aio fd_pool[64]| 51.1 |   14.86          |  23.66

I assume you tried to bump from 64 to something higher and couldn't make up the lost bandwidth?

16k write 1 thread, 74 iodepth     | MB/s | avg sub lat (us) | avg comp lat (ms)
-----------------------------------+------+------------------+------------------
baremetal (O_DIRECT, aka cache=off)|128.1 |   10.90          |   9.45
kvm: cache=off posix-aio w/o patch |  5.1 | 3152.00          | 231.06
kvm: cache=off linux-aio           |130.0 |   83.83          |   8.99
kvm: cache=on  posix-aio w/o patch |184.0 |   80.46          |   6.35
kvm: cache=on  posix-aio w/ patch  |165.0 |   70.90          |   7.09
------------ new results ----------+------+------------------+------------------
kvm:cache=off posix-aio fd_pool[16]| 78.2 |   58.24          |  15.43
kvm:cache=off posix-aio fd_pool[64]|129.0 |   71.62          |   9.11

That's a nice result. We could probably improve the latency by tweaking the queue sizes.

Very nice work!  Thanks for doing the thorough analysis.

Regards,

Anthony Liguori





