From: Roman Penyaev
Subject: Re: [Qemu-devel] [PATCH V2 1/1] linux-aio: prevent submitting more than MAX_EVENTS
Date: Fri, 15 Jul 2016 17:03:28 +0200

On Fri, Jul 15, 2016 at 11:18 AM, Roman Penyaev
<address@hidden> wrote:
> On Wed, Jul 13, 2016 at 1:45 PM, Kevin Wolf <address@hidden> wrote:
>> Am 13.07.2016 um 13:33 hat Roman Penyaev geschrieben:
>>> Just to be sure that we are on the same page:
>>>
>>> 1. We have this commit "linux-aio: Cancel BH if not needed" which
>>>
>>>    a) introduces a performance regression on my fio workloads with the
>>>       following config: "iothread=1, VCPU=8, MQ=8".  Performance
>>>       dropped from 1878MB/s to 1606MB/s with Stefan's fix, i.e. by
>>>       ~14%.
>>
>> Do we already understand why the performance regresses with the patch?
>> As long as we don't, everything we do is just guesswork.
>
> By now the issue is clear.  I test on /dev/nullb0, which completes
> all submitted bios almost immediately.  That means that right after
> io_submit() it is worth checking for completed requests rather than
> letting them accumulate in flight.
>

[snip]

>
> The theoretical fix would be to schedule the completion BH right after
> each successful io_submit(), i.e.:
>
> ---------------------------------------------------------------------
> @@ -228,6 +228,8 @@ static void ioq_submit(LinuxAioState *s)
>          QSIMPLEQ_SPLIT_AFTER(&s->io_q.pending, aiocb, next, &completed);
>      } while (ret == len && !QSIMPLEQ_EMPTY(&s->io_q.pending));
>      s->io_q.blocked = (s->io_q.n > 0);
> +
> +    qemu_bh_schedule(s->completion_bh);
>  }
> ---------------------------------------------------------------------
>
> This theoretical fix works pretty well and the numbers return to the
> expected ~1800MB/s.
>
> So, believe it or not, a BH that is left scheduled (not cancelled) gives
> better results on very fast backend devices.
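
(To make the idea concrete outside of QEMU, here is a rough sketch with plain
libaio -- this is not the actual linux-aio.c code, and MAX_EVENTS and the
helper name are illustrative.  On /dev/nullb0 many requests are already done
by the time io_submit() returns, so a non-blocking io_getevents() right after
submission harvests them at once instead of leaving them to a later
event-loop iteration.)

---------------------------------------------------------------------
/* sketch only: plain libaio, error handling elided */
#include <libaio.h>
#include <time.h>

#define MAX_EVENTS 128

static void submit_and_reap(io_context_t ctx, struct iocb **iocbs, int nr)
{
    struct io_event events[MAX_EVENTS];
    struct timespec zero = { 0, 0 };
    int i, reaped;

    if (io_submit(ctx, nr, iocbs) < 0) {
        return;
    }

    /* Non-blocking poll: min_nr = 0 and a zero timeout return immediately
     * with whatever has already completed. */
    reaped = io_getevents(ctx, 0, MAX_EVENTS, events, &zero);
    for (i = 0; i < reaped; i++) {
        /* complete the request attached to events[i].data here */
    }
}
---------------------------------------------------------------------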
>
> The other interesting observation is this: limiting submission (which I did
> in the "linux-aio: prevent submitting more than MAX_EVENTS" patch) also
> fixes the issue, because before submitting more than MAX_EVENTS we have to
> reap something first, which obviously does not let already-completed
> requests stall in the queue for long.
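
(Again only a sketch, not the actual patch: the shape of the submission cap.
The in_flight bookkeeping and the blocking reap below are illustrative -- the
real QEMU code is event-driven and never blocks -- but the invariant is the
same: never keep more than MAX_EVENTS requests in flight, and harvest
completions before submitting more once the limit is hit.)

---------------------------------------------------------------------
#include <libaio.h>
#include <stddef.h>

#define MAX_EVENTS 128

static int in_flight;   /* illustrative bookkeeping */

static void submit_capped(io_context_t ctx, struct iocb **iocbs, int nr)
{
    struct io_event events[MAX_EVENTS];
    int done = 0;

    while (done < nr) {
        int room = MAX_EVENTS - in_flight;

        if (room == 0) {
            /* At the limit: reap at least one completion before going on.
             * This forced harvest is what keeps already-completed requests
             * from stalling in the ring. */
            int n = io_getevents(ctx, 1, MAX_EVENTS, events, NULL);
            if (n < 0) {
                break;              /* error handling elided */
            }
            in_flight -= n;
            /* complete the requests attached to events[0..n-1] here */
            continue;
        }

        int batch = nr - done < room ? nr - done : room;
        int ret = io_submit(ctx, batch, iocbs + done);
        if (ret <= 0) {
            break;                  /* error handling elided */
        }
        done += ret;
        in_flight += ret;
    }
}
---------------------------------------------------------------------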

I got expected, but nevertheless interesting, latencies from fio:

---------------------------------------------------------------------------
   master
   + "linux-aio: keep processing events if MAX_EVENTS reached"

read : io=47995MB, bw=1599.8MB/s, iops=157530, runt= 30002msec
    clat (usec): min=1, max=19754, avg=1223.26, stdev=358.03
    clat percentiles (usec):
     | 30.00th=[ 1080], 40.00th=[ 1160], 50.00th=[ 1224], 60.00th=[ 1288],
    lat (usec) : 750=6.55%, 1000=14.19%, 2000=75.38%


---------------------------------------------------------------------------
   master
   + "linux-aio: prevent submitting more than MAX_EVENTS"

read : io=53746MB, bw=1791.4MB/s, iops=176670, runt= 30003msec
    clat (usec): min=1, max=15902, avg=1067.67, stdev=352.40
    clat percentiles (usec):
     | 30.00th=[  932], 40.00th=[ 1004], 50.00th=[ 1064], 60.00th=[ 1128],
    lat (usec) : 750=10.68%, 1000=25.06%, 2000=59.62%


---------------------------------------------------------------------------
   master
   + "linux-aio: prevent submitting more than MAX_EVENTS"
   + schedule completion BH just after each successful io_submit()

read : io=56875MB, bw=1895.8MB/s, iops=186986, runt= 30001msec
    clat (usec): min=2, max=17288, avg=991.57, stdev=318.86
    clat percentiles (usec):
     | 30.00th=[  868], 40.00th=[  940], 50.00th=[ 1004], 60.00th=[ 1064],
    lat (usec) : 750=13.85%, 1000=30.57%, 2000=49.84%


These three examples clearly show (even without charts) that the more often we
peek at and harvest completed requests, the more performance we gain.
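
(For reproduction, a fio job along these lines would produce output like the
above; the exact options are not quoted in this mail, so bs, iodepth, numjobs
and the filename below are assumptions, not the actual job file.)

---------------------------------------------------------------------------
; hypothetical fio job file -- option values are assumptions
[aio-randread]
ioengine=libaio
direct=1
rw=randread
bs=4k
iodepth=32
numjobs=8
runtime=30
time_based=1
filename=/dev/vda
---------------------------------------------------------------------------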

Still a lot of room for optimization :)


--
Roman


