
Re: [Qemu-devel] [PATCH V2 0/4] *virtio-blk: add multiread support


From: Peter Lieven
Subject: Re: [Qemu-devel] [PATCH V2 0/4] *virtio-blk: add multiread support
Date: Thu, 25 Dec 2014 15:46:25 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0

On 18.12.2014 at 11:34, Kevin Wolf wrote:
> On 16.12.2014 at 17:00, Peter Lieven wrote:
>> On 16.12.2014 16:48, Kevin Wolf wrote:
>>> On 16.12.2014 at 16:21, Peter Lieven wrote:
>>>> this series adds the long missing multiread support to virtio-blk.
>>>>
>>>> some remarks:
>>>>  - I introduced rd_merged and wr_merged block accounting stats to
>>>>    blockstats as a generic interface which can be set from any
>>>>    driver that will introduce multirequest merging in the future.
>>>>  - the knob to disable request merging is not yet there. I would
>>>>    add it to the device properties also as a generic interface
>>>>    to have the same switch for any driver that might introduce
>>>>    request merging in the future. As there has been no knob in
>>>>    the past, I would post this as a separate series, as it needs
>>>>    some mangling in parameter parsing which might lead to further
>>>>    discussions.
>>>>  - the old multiwrite interface is still there and might be removed.
>>>>
>>>> v1->v2:
>>>>  - add overflow checking for nb_sectors [Kevin]
>>>>  - do not change the name of the macro for max mergeable requests. [Fam]
>>> Diff to v1 looks good. Now I just need to check what it does to the
>>> performance. Did you run any benchmarks yourself?
>> I ran several installs of Debian/Ubuntu, Booting of Windows and Linux
>> systems. I looked at rd_total_time_ns and wr_total_time_ns and saw
>> no increase. Often I even saw a decrease.
>>
>> {rd,wr}_total_time_ns measures the time from virtio_blk_handle_request
>> to virtio_blk_rw_complete. So it seems to be a good indicator for the time
>> spent on I/O.
>>
>> What you could do is apply it on top of your fio testing stack and
>> look at the throughput. Sequential reads should be faster, the rest
>> not worse.
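
(As an aside, what those two counters span is essentially the bracket
sketched below; the names are invented for illustration, the real code
lives in hw/block/virtio-blk.c:)

#include <stdint.h>
#include <stdbool.h>
#include <time.h>

typedef struct {
    uint64_t rd_total_time_ns;
    uint64_t wr_total_time_ns;
} Stats;

typedef struct {
    bool     is_write;
    uint64_t start_ns;
} Req;

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

/* corresponds to virtio_blk_handle_request: stamp the request on arrival */
static void handle_request(Req *req)
{
    req->start_ns = now_ns();
    /* ... hand the request to the block layer; completion fires below ... */
}

/* corresponds to virtio_blk_rw_complete: account the full round trip */
static void rw_complete(Stats *s, Req *req)
{
    uint64_t elapsed = now_ns() - req->start_ns;
    if (req->is_write) {
        s->wr_total_time_ns += elapsed;
    } else {
        s->rd_total_time_ns += elapsed;
    }
}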
> So I finally ran some fio benchmark on the series. The result for small
> sequential reads (4k) is quite noisy, but it seems to be improved a bit.
> Larger sequential reads (64k) and random reads seem to be mostly
> unaffected.
>
> For writes, however, I can see a degradation. Perhaps running multiple
> jobs in parallel means that we don't detect and merge sequential
> requests any more when they are interleaved with another sequential job.
> Or do you have an idea what else could have changed for writes?

I tried to dig a little more into this topic and maybe found what's
going on. If I am right, you are using a kernel >= 3.17 in the guest
for your tests?

Here are my test results of 4k sequential writes under a Linux 3.13 guest.

Master:
virtio-fio: rd_bytes=856064 wr_bytes=34359738368 rd_operations=209 wr_operations=8388608 flush_operations=0 wr_total_time_ns=980610962941 rd_total_time_ns=5656675 flush_total_time_ns=0 rd_merged=0 wr_merged=6533338

Multiread_v2:
virtio-fio: rd_bytes=856064 wr_bytes=34359738368 rd_operations=209 wr_operations=8388608 flush_operations=0 wr_total_time_ns=558830918737 rd_total_time_ns=6159151 flush_total_time_ns=0 rd_merged=0 wr_merged=6266824

As you can see the number of merged requests is of the same order, but the
wr_total_time_ns is heavily improved! That is roughly 117 µs per write
request on master versus about 67 µs with multiread_v2, a ~43% reduction.

What happened between Linux 3.13 and 3.17 is that Ming introduced the
multiqueue feature into the virtio-blk kernel code. The blk-mq developers
intentionally set the number of hw_queues to 4 in kernel 3.13 for
virtio-blk. With the introduction of the multiqueue feature in 3.17 the
number of hw_queues is set to the number of virtqueues. If multiqueue is
unsupported that number is 1, and thus the number of hw_queues is also 1.
So all the requests from all fio processes go into the same hw_queue, and
this seems to hurt performance. The requests are heavily interleaved this
way, and as I only merge strictly sequential requests, the old
implementation, which sorts requests, wins. But it uses twice the
computation time for this.
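
(In essence the merge criterion in this series boils down to the following
check; this is a simplified sketch with invented types, not the actual
patch code, which works on QEMUIOVector-backed requests:)

#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t sector;       /* first sector of the request */
    uint64_t nb_sectors;   /* length in sectors */
    bool     is_write;
} SimpleReq;

static bool can_merge(const SimpleReq *prev, const SimpleReq *next)
{
    /* Merge only if next starts exactly where prev ends and both go in
     * the same direction.  Interleaved streams from several fio jobs
     * sharing one hw_queue almost never satisfy this, whereas the old
     * multiwrite code sorted the batch first and could still merge. */
    return prev->is_write == next->is_write &&
           prev->sector + prev->nb_sectors == next->sector;
}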

I think this needs to be fixed in the virtio-blk kernel code, and if we
introduce multiqueue for virtio-blk into qemu, we should set the number
of virtqueues to at least 4. Maybe the number of CPUs would also be a
good choice?!
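
For illustration only (the property name is invented here; there is no
such knob in qemu yet):

    qemu-system-x86_64 ... -device virtio-blk-pci,drive=hd0,num-queues=4

With 4 virtqueues the guest's blk-mq would again get 4 hw_queues, the fio
jobs would spread out, and the strictly sequential streams would stay
unmixed.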

Peter
