


From: Alexandre DERUMIER
Subject: Re: [Qemu-devel] [RFC] virtio-blk: simple multithreaded MQ implementation for bdrv_raw
Date: Mon, 30 May 2016 08:40:39 +0200 (CEST)

Hi,

>>To avoid any locks in the qemu backend and not to introduce thread safety
>>into the qemu block layer, I open the same backend device several times,
>>one device per MQ.  E.g. the following is the stack for a virtio-blk
>>with num-queues=2:

Could it be possible in the future to avoid opening the same backend several times?
I'm thinking about ceph/librbd, which since its last version allows a backend to be
opened only once by default
(exclusive-lock, which is a requirement for advanced features like
rbd-mirroring, fast-diff, ...).

Regards,

Alexandre Derumier
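
(For the exclusive-lock point above, a minimal librbd-level sketch, assuming the
Jewel-era C API; the pool/image names are placeholders and error handling is
omitted.  It only illustrates why several independent opens of one rbd image do
not behave like independent backends.)

    #include <rados/librados.h>
    #include <rbd/librbd.h>

    rados_t cluster;
    rados_ioctx_t ioctx;
    rbd_image_t image;

    rados_create(&cluster, NULL);               /* default client id  */
    rados_conf_read_file(cluster, NULL);        /* default ceph.conf  */
    rados_connect(cluster);
    rados_ioctx_create(cluster, "rbd", &ioctx); /* pool name is a placeholder */

    /* With the exclusive-lock image feature enabled, librbd acquires the
     * lock internally for the writer; a second writer on the same image
     * first has to take the lock over, so per-queue opens of one rbd
     * image would serialize on the lock instead of running in parallel. */
    rbd_open(ioctx, "vm-disk", &image, NULL);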


----- Original Message -----
From: "Stefan Hajnoczi" <address@hidden>
To: "Roman Pen" <address@hidden>
Cc: "qemu-devel" <address@hidden>, "stefanha" <address@hidden>
Sent: Saturday, 28 May 2016 00:27:10
Subject: Re: [Qemu-devel] [RFC] virtio-blk: simple multithreaded MQ
implementation for bdrv_raw

On Fri, May 27, 2016 at 01:55:04PM +0200, Roman Pen wrote: 
> Hello, all. 
> 
> This is an RFC because this patch is mostly a quick attempt to get true
> multithreaded multiqueue support for a block device with native AIO.
> The goal is to squeeze everything possible out of the lockless IO path,
> from the MQ block device in the guest to the MQ block device on the host.
> 
> To avoid any locks in the qemu backend and not to introduce thread safety
> into the qemu block layer, I open the same backend device several times,
> one device per MQ.  E.g. the following is the stack for a virtio-blk
> with num-queues=2:
> 
>            VirtIOBlock
>           /           \
>   VirtQueue#0       VirtQueue#1
>    IOThread#0        IOThread#1
>          BH#0              BH#1
>     Backend#0         Backend#1
>           \           /
>            /dev/null0
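
(A rough sketch of the idea, for illustration only and not the actual patch:
each queue gets its own independently opened BlockBackend, so the data path
never shares a backend, and therefore never a lock, between queues.  It
assumes a blk_new_open(filename, reference, options, flags, errp) prototype,
which has varied between QEMU versions, and omits error handling.)

    /* one independent open of the same host device per virtqueue */
    for (i = 0; i < num_queues; i++) {
        QDict *opts = qdict_new();
        Error *local_err = NULL;

        qdict_put(opts, "driver", qstring_from_str("raw"));
        s->mq[i].blk = blk_new_open("/dev/nullb0", NULL, opts,
                                    BDRV_O_RDWR | BDRV_O_NOCACHE |
                                    BDRV_O_NATIVE_AIO, &local_err);
        /* from here on, queue i only ever touches s->mq[i].blk */
    }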
> 
> To group all objects related to one vq new structure is introduced: 
> 
> typedef struct VirtQueueCtx {
>     BlockBackend *blk;
>     struct VirtIOBlock *s;
>     VirtQueue *vq;
>     void *rq;
>     QEMUBH *bh;
>     QEMUBH *batch_notify_bh;
>     IOThread *iothread;
>     Notifier insert_notifier;
>     Notifier remove_notifier;
>     /* Operation blocker on BDS */
>     Error *blocker;
> } VirtQueueCtx;
> 
> And VirtIOBlock includes an array of these contexts: 
> 
> typedef struct VirtIOBlock {
>     VirtIODevice parent_obj;
> +   VirtQueueCtx mq[VIRTIO_QUEUE_MAX];
>     ...
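
(Again only an illustrative sketch, not the patch itself: once a context owns
a BlockBackend and an IOThread, the backend and the completion bottom half are
bound to that iothread's AioContext, so submission and completion for one
queue stay on one event loop.  blk_set_aio_context(), aio_bh_new() and
iothread_get_aio_context() are the existing QEMU calls assumed here;
virtio_blk_init_vq_ctx and notify_vq_bh are hypothetical names.)

    static void virtio_blk_init_vq_ctx(VirtQueueCtx *ctx)
    {
        AioContext *aio = iothread_get_aio_context(ctx->iothread);

        /* move the per-queue backend to the per-queue event loop */
        blk_set_aio_context(ctx->blk, aio);

        /* the notify bottom half runs in the same loop */
        ctx->bh = aio_bh_new(aio, notify_vq_bh /* hypothetical */, ctx);
    }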
> 
> This patch is based on Stefan's series: "virtio-blk: multiqueue support",
> with a minor difference: I reverted "virtio-blk: multiqueue batch notify",
> which does not make a lot of sense when each VQ is handled by its own
> iothread.
> 
> The qemu configuration stays the same, i.e. put num-queues=N and N 
> iothreads will be started on demand and N drives will be opened: 
> 
> qemu -device virtio-blk-pci,num-queues=8 
> 
> My configuration is the following:
> 
> host:
>     Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz,
>     8 CPUs,
>     /dev/nullb0 as backend with the following parameters:
>       $ cat /sys/module/null_blk/parameters/submit_queues
>       8
>       $ cat /sys/module/null_blk/parameters/irqmode
>       1
> 
> guest:
>     8 VCPUs
> 
> qemu:
>     -object iothread,id=t0 \
>     -drive if=none,id=d0,file=/dev/nullb0,format=raw,snapshot=off,cache=none,aio=native \
>     -device virtio-blk-pci,num-queues=$N,iothread=t0,drive=d0,disable-modern=off,disable-legacy=on
> 
> where $N varies during the tests.
> 
> fio: 
> [global] 
> description=Emulation of Storage Server Access Pattern 
> bssplit=512/20:1k/16:2k/9:4k/12:8k/19:16k/10:32k/8:64k/4 
> fadvise_hint=0 
> rw=randrw:2 
> direct=1 
> 
> ioengine=libaio 
> iodepth=64 
> iodepth_batch_submit=64 
> iodepth_batch_complete=64 
> numjobs=8 
> gtod_reduce=1 
> group_reporting=1 
> 
> time_based=1 
> runtime=30 
> 
> [job] 
> filename=/dev/vda 
> 
> Results:
> 
>    num-queues      RD bw       WR bw
>    ----------      -----       -----
> 
>    * with 1 iothread *
> 
>    1 thr 1 mq     1225MB/s    1221MB/s
>    1 thr 2 mq     1559MB/s    1553MB/s
>    1 thr 4 mq     1729MB/s    1725MB/s
>    1 thr 8 mq     1660MB/s    1655MB/s
> 
>    * with N iothreads *
> 
>    2 thr 2 mq     1845MB/s    1842MB/s
>    4 thr 4 mq     2187MB/s    2183MB/s
>    8 thr 8 mq     1383MB/s    1378MB/s
> 
> Obviously, 8 iothreads + 8 vcpu threads is too much for my machine
> with 8 CPUs, but 4 iothreads show quite a good result.

Cool, thanks for trying this experiment and posting results. 

It's encouraging to see the improvement. Did you use any CPU affinity 
settings to co-locate vcpu and iothreads onto host CPUs? 

Stefan 
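
(For reference, co-locating a vcpu thread or iothread with a host CPU boils
down to the Linux affinity API; a minimal sketch follows.  In practice this
is usually done from outside with taskset or libvirt's vcpupin/iothreadpin
rather than in code.)

    /* minimal sketch: pin the calling thread to one host CPU */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>

    static int pin_self_to_cpu(int cpu)
    {
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }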


