From: Ming Lei
Subject: Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
Date: Fri, 15 Aug 2014 18:39:17 +0800
On Thu, Aug 14, 2014 at 6:46 PM, Kevin Wolf <address@hidden> wrote:
> Am 11.08.2014 um 21:37 hat Paolo Bonzini geschrieben:
>> Il 10/08/2014 05:46, Ming Lei ha scritto:
>> > Hi Kevin, Paolo, Stefan and all,
>> >
>> >
>> > On Wed, 6 Aug 2014 10:48:55 +0200
>> > Kevin Wolf <address@hidden> wrote:
>> >
>> >> Am 06.08.2014 um 07:33 hat Ming Lei geschrieben:
>> >
>> >>
>> >> Anyhow, the coroutine version of your benchmark is buggy, it leaks all
>> >> coroutines instead of exiting them, so it can't make any use of the
>> >> coroutine pool. On my laptop, I get this (where fixed coroutine is a
>> >> version that simply removes the yield at the end):
>> >>
>> >> | bypass | fixed coro | buggy coro
>> >> ----------------+---------------+---------------+--------------
>> >> time | 1.09s | 1.10s | 1.62s
>> >> L1-dcache-loads | 921,836,360 | 932,781,747 | 1,298,067,438
>> >> insns per cycle | 2.39 | 2.39 | 1.90
>> >>
>> >> This begs the question of whether you see a similar effect on a real
>> >> qemu, or whether the coroutine pool is still not big enough. With
>> >> correct use of coroutines, the difference seems to be barely
>> >> measurable even without any I/O involved.
>> >
>> > Now I have fixed the coroutine leak bug. The previous crypt benchmark
>> > generated quite a heavy load, which kept the operation rate very low
>> > (~40K ops/sec), so I wrote a new, simpler benchmark that sustains
>> > several hundred thousand operations per second, a rate that should
>> > match fast storage devices, and it does show a non-trivial cost from
>> > coroutines.
>> >
>> > In the extreme case where each iteration runs only a getppid()
>> > syscall, the benchmark reaches only 3M operations/sec with
>> > coroutines, but 16M/sec without them: more than a 4x difference!
>>
>> I should be on vacation, but I'm following a couple of threads on the
>> mailing list, and I'm a bit tired of hearing the same argument again
>> and again...
>>
>> The different characteristics of asynchronous I/O vs. any synchronous
>> workload are such that it is hard to be sure that microbenchmarks
>> make sense.
>>
>> The patch below is basically the minimal change to bypass coroutines.
>> Of course the block.c part is not acceptable as is (the change to
>> refresh_total_sectors is broken, the others are just ugly), but it is
>> a start. Please run it with your fio workloads, or write an aio-based
>> version of a qemu-img/qemu-io *I/O* benchmark.
>
> So to finally reply with some numbers... I'm running fio tests based on
> Ming's configuration on a loop-mounted tmpfs image using dataplane. I've
> extended the tests to not only test random reads, but also sequential
> reads. I have not yet tested writes, and ran almost no tests for block
> sizes larger than 4k, so I'm not including those results here.
>
> The "base" case is with Ming's patches applied, but the set_bypass(true)
> call commented out in the virtio-blk code. All other cases are patches
> applied on top of this.
>
> | Random throughput | Sequential throughput
> ----------------+-------------------+-----------------------
> master | 442 MB/s | 730 MB/s
> base | 453 MB/s | 757 MB/s
> bypass (Ming) | 461 MB/s | 734 MB/s
> coroutine | 468 MB/s | 716 MB/s
> bypass (Paolo) | 476 MB/s | 682 MB/s
It looks like the difference between random read and sequential read is
quite big, which shouldn't be the case, since the whole file is cached
in RAM.
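(For reference, the fio runs being compared presumably look something
like the job file below. This is only a sketch: the filename, engine,
iodepth and runtime here are my guesses, not taken from Kevin's actual
setup.)

```ini
; Hypothetical fio job: 4k random vs. sequential reads against the
; virtio-blk device in the guest. All parameters are illustrative.
[global]
filename=/dev/vdb
direct=1
ioengine=libaio
iodepth=64
bs=4k
runtime=30
time_based=1

[randread]
rw=randread
stonewall

[seqread]
rw=read
stonewall
```

The `stonewall` option serializes the two jobs so the sequential and
random phases do not overlap.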
>
> So while your patches look pretty good in Ming's test case of random
> reads, I think the sequential case is worrying. The same is true for my
> latest coroutine optimisations, even though the degradation is smaller
> there.
In my VM test, the random read and sequential read results are basically
the same, and the I/O thread's CPU utilization is more than 93% with
Paolo's patch, for both null_blk and a loop device over a file in tmpfs.
I am using a 3.16 kernel.
>
> This needs some more investigation.
It may be caused by your test setup and environment, or by your VM
kernel; I am not sure.
Thanks,
--
Ming Lei