
Re: [Qemu-devel] [RFC 0/3] aio: experimental virtio-blk polling mode


From: Fam Zheng
Subject: Re: [Qemu-devel] [RFC 0/3] aio: experimental virtio-blk polling mode
Date: Wed, 16 Nov 2016 16:27:16 +0800
User-agent: Mutt/1.7.1 (2016-10-04)

On Mon, 11/14 16:29, Paolo Bonzini wrote:
> 
> 
> On 14/11/2016 16:26, Stefan Hajnoczi wrote:
> > On Fri, Nov 11, 2016 at 01:59:25PM -0600, Karl Rister wrote:
> >> QEMU_AIO_POLL_MAX_NS      IOPs
> >>                unset    31,383
> >>                    1    46,860
> >>                    2    46,440
> >>                    4    35,246
> >>                    8    34,973
> >>                   16    46,794
> >>                   32    46,729
> >>                   64    35,520
> >>                  128    45,902
> > 
> > The environment variable is in nanoseconds.  The range of values you
> > tried are very small (all <1 usec).  It would be interesting to try
> > larger values in the ballpark of the latencies you have traced.  For
> > example 2000, 4000, 8000, 16000, and 32000 ns.
> > 
> > Very interesting that QEMU_AIO_POLL_MAX_NS=1 performs so well without
> > much CPU overhead.
> 
> That basically means "avoid a syscall if you already know there's
> something to do", so in retrospect it's not that surprising.  Still
> interesting though, and it means that the feature is useful even if you
> don't have CPU to waste.

With the "deleted" bug fixed I did a little more testing to understand this.

Setting QEMU_AIO_POLL_MAX_NS=1 doesn't mean run_poll_handlers() loops for only
1 ns: the patch checks the elapsed time only once every 1024 polls. The first
poll in a run_poll_handlers() call rarely succeeds, so unless an event arrives
we keep polling for at least 1024 iterations.
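
To illustrate the effect, here is a toy model of a loop that compares the
budget against the clock only on every 1024th poll. It is a sketch, not the
code from the patch: run_poll_loop(), CHECK_INTERVAL, ns_per_poll and
event_at_poll are hypothetical names, and a fixed cost per poll is assumed.

```python
# Simplified model of a polling loop that consults the clock only every
# CHECK_INTERVAL iterations. Not QEMU's actual code; all names are made up.

CHECK_INTERVAL = 1024  # the deadline is checked once per this many polls


def run_poll_loop(max_ns, ns_per_poll, event_at_poll=None):
    """Return (polls_done, elapsed_ns, got_event).

    max_ns        -- polling budget, in the spirit of QEMU_AIO_POLL_MAX_NS
    ns_per_poll   -- assumed (constant) cost of one loop iteration
    event_at_poll -- 1-based poll index at which an event shows up, or None
    """
    elapsed_ns = 0
    polls = 0
    while True:
        polls += 1
        elapsed_ns += ns_per_poll
        # An event ends the loop immediately, regardless of the budget.
        if event_at_poll is not None and polls >= event_at_poll:
            return polls, elapsed_ns, True
        # The budget is only compared against the clock on every 1024th poll.
        if polls % CHECK_INTERVAL == 0 and elapsed_ns >= max_ns:
            return polls, elapsed_ns, False
```

In this model, run_poll_loop(1, 75) still spins for 1024 polls before giving
up, even though the budget is 1 ns, while an event at some earlier poll ends
the loop right away.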

According to my test, each run_poll_handlers() call takes ~12000 ns on average,
which is ~160 iterations of the poll loop, before getting a new event (either
from the virtio queue or from linux-aio; I don't have the ratio here).

So in the worst case (no new event), 1024 iterations take roughly (12000 / 160 *
1024) = 76800 ns!
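
Spelled out, the arithmetic is 75 ns per poll times 1024 polls. The sketch
below assumes a constant per-poll cost and a deadline check only on every
1024th poll; worst_case_polls() is a hypothetical helper, not something from
the patch. Under those assumptions, any budget up to 76800 ns has the same
worst case.

```python
import math

AVG_NS_PER_CALL = 12000     # measured average per run_poll_handlers() call
AVG_POLLS_PER_CALL = 160    # measured average iterations per call
CHECK_INTERVAL = 1024       # polls between deadline checks

ns_per_poll = AVG_NS_PER_CALL / AVG_POLLS_PER_CALL   # 75.0 ns
window_ns = ns_per_poll * CHECK_INTERVAL              # 76800.0 ns


def worst_case_polls(max_ns):
    """Polls performed when no event ever arrives (hypothetical model).

    The loop can only stop at a multiple of CHECK_INTERVAL polls, so the
    budget is effectively rounded up to a whole number of check windows.
    """
    windows = max(1, math.ceil(max_ns / window_ns))
    return windows * CHECK_INTERVAL
```

For example, worst_case_polls(1) and worst_case_polls(32000) both give 1024
in this model, so small changes to a sub-microsecond budget would not be
expected to change the worst case.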

The above is with iodepth=1 and jobs=1.  With iodepth=32 and jobs=1, or
iodepth=8 and jobs=4, a new event arrives at around the 30th poll, after
~5600 ns.

Fam


