
Re: [RFC patch 0/1] block: vhost-blk backend


From: Andrey Zhadchenko
Subject: Re: [RFC patch 0/1] block: vhost-blk backend
Date: Wed, 5 Oct 2022 13:28:14 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.1.0



On 10/4/22 21:26, Stefan Hajnoczi wrote:
On Mon, Jul 25, 2022 at 11:55:26PM +0300, Andrey Zhadchenko wrote:
Although QEMU virtio-blk is quite fast, there is still some room for
improvement. Disk latency can be reduced if we handle virtio-blk requests
in the host kernel, so that we avoid a lot of syscalls and context switches.

The biggest disadvantage of this vhost-blk flavor is that it only supports the raw format.
Luckily Kirill Thai proposed a device mapper driver for the QCOW2 format that
attaches such files as block devices: https://www.spinics.net/lists/kernel/msg4292965.html

Also, by using kernel modules we can bypass the iothread limitation and finally
scale block requests with CPUs for high-performance devices. This is planned to
be implemented in the next version.

Linux kernel module part:
https://lore.kernel.org/kvm/20220725202753.298725-1-andrey.zhadchenko@virtuozzo.com/

test setups and results:
fio --direct=1 --rw=randread  --bs=4k  --ioengine=libaio --iodepth=128

QEMU drive options: cache=none
filesystem: xfs

Please post the full QEMU command-line so it's clear exactly what this
is benchmarking.

The full command for vhost is this:
qemu-system-x86_64 \
-kernel bzImage -nographic -append "console=ttyS0 root=/dev/sdb rw systemd.unified_cgroup_hierarchy=0 nokaslr" \
-m 1024 -s --enable-kvm -smp $2 \
-drive id=main_drive,file=debian_sid.img,media=disk,format=raw \
-drive id=vhost_drive,file=$1,media=disk,format=raw,if=none \
-device vhost-blk-pci,drive=vhost_drive,num-threads=$3

(num-threads option for vhost-blk-pci was not used)

For virtio I used this:
qemu-system-x86_64 \
-kernel bzImage -nographic -append "console=ttyS0 root=/dev/sdb rw systemd.unified_cgroup_hierarchy=0 nokaslr" \
-m 1024 -s --enable-kvm -smp $2 \
-drive file=debian_sid.img,media=disk \
-drive file=$1,media=disk,if=virtio,cache=none,if=none,id=d1,aio=threads\
-device virtio-blk-pci,drive=d1


A preallocated raw image file is a good baseline with:

   --object iothread,id=iothread0 \
   --blockdev file,filename=test.img,cache.direct=on,aio=native,node-name=drive0 \
   --device virtio-blk-pci,drive=drive0,iothread=iothread0
The image I used was a preallocated qcow2 image set up with dm-qcow2, because
this vhost-blk version directly uses the bio interface and cannot work with
regular files.
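
For the raw-file + iothread numbers further down, the command would look roughly
like this (a sketch only, combining the options suggested above with my setup,
not necessarily the exact command line behind those numbers):

qemu-system-x86_64 \
-kernel bzImage -nographic -append "console=ttyS0 root=/dev/sdb rw systemd.unified_cgroup_hierarchy=0 nokaslr" \
-m 1024 -s --enable-kvm -smp $2 \
-drive id=main_drive,file=debian_sid.img,media=disk,format=raw \
--object iothread,id=iothread0 \
--blockdev file,filename=$1,cache.direct=on,aio=native,node-name=drive0 \
--device virtio-blk-pci,drive=drive0,iothread=iothread0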


(BTW QEMU's default vq size is 256 descriptors and the number of vqs is
the number of vCPUs.)
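
For reference, both can be set explicitly on the virtio-blk device; a sketch
with illustrative values rather than what was actually benchmarked:

-device virtio-blk-pci,drive=d1,num-queues=$2,queue-size=256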


SSD:
                | randread, IOPS  | randwrite, IOPS |
Host           |      95.8k      |      85.3k      |
QEMU virtio    |      57.5k      |      79.4k      |

Adding iothread0 and using a raw file instead of the qcow2 + dm-qcow2 setup
brings the numbers to:
               |      60.4k      |      84.3k      |

QEMU vhost-blk |      95.6k      |      84.3k      |

RAMDISK (vq == vcpu):

With fio numjobs=vcpu here?

Yes
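
The full guest-side fio invocation would be roughly the following; the job name,
target filename and runtime are placeholders, not the exact values used:

fio --name=bench --filename=/dev/vdb --direct=1 --rw=randread --bs=4k \
    --ioengine=libaio --iodepth=128 --numjobs=$(nproc) \
    --runtime=60 --time_based --group_reporting

(--rw=randwrite for the randwrite column.)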


                  | randread, IOPS | randwrite, IOPS |
virtio, 1vcpu    |      123k      |      129k       |
virtio, 2vcpu    |      253k (??) |      250k (??)  |

QEMU's aio=threads (default) gets around the single IOThread. It beats
aio=native for this reason in some cases. Were you using aio=native or
aio=threads?

At some point I started to specify aio=threads (before that I did not use this
option), but I am not sure exactly when. I will re-measure all cases for the
next submission; see the two drive-line variants sketched after the table below.


virtio, 4vcpu    |      158k      |      154k       |
vhost-blk, 1vcpu |      110k      |      113k       |
vhost-blk, 2vcpu |      247k      |      252k       |
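
For the re-measurement, the two configurations to compare would be an otherwise
identical drive with explicit aio=native vs aio=threads, roughly as follows
(a sketch, not the exact lines used so far):

-drive file=$1,if=none,id=d1,format=raw,cache=none,aio=native \
-device virtio-blk-pci,drive=d1

versus

-drive file=$1,if=none,id=d1,format=raw,cache=none,aio=threads \
-device virtio-blk-pci,drive=d1

cache=none is kept in both so that only the aio engine differs (aio=native
requires O_DIRECT anyway).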


