qemu-devel

Re: [Qemu-devel] Following up questions related to QEMU and I/O Thread


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] Following up questions related to QEMU and I/O Thread
Date: Tue, 16 Apr 2019 10:20:42 +0100
User-agent: Mutt/1.11.3 (2019-02-01)

On Tue, Apr 16, 2019 at 07:23:38AM +0800, Dongli Zhang wrote:
> 
> 
> On 4/16/19 1:34 AM, Wei Li wrote:
> > Hi @Paolo Bonzini & @Stefan Hajnoczi,
> > 
> > Would you please help confirm whether @Paolo Bonzini's multiqueue feature 
> > change will benefit virtio-scsi or not? Thanks!
> > 
> > @Stefan Hajnoczi,
> > I also spent some time on exploring the virtio-scsi multi-queue features 
> > via num_queues parameter as below, here are what we found:
> > 
> > 1. Increasing the number of queues from one to the number of vCPUs yields a
> > clear IOPS improvement.
> > 2. Increasing the number of queues beyond the number of vCPUs (e.g. 8 queues
> > with 2 vCPUs) yields an even larger IOPS improvement.
> 
> As mentioned in the link below, when the number of hardware queues is larger
> than nr_cpu_ids, the blk-mq layer limits itself to at most nr_cpu_ids queues
> (see /sys/block/sda/mq/).
> 
> That is, with num_queues=4 but only 2 vCPUs, only 2 queues should be
> visible under /sys/block/sda/mq/.
> 
> https://lore.kernel.org/lkml/address@hidden/
> 
> I am just curious how increasing num_queues from 2 to 4 could double the
> IOPS when only 2 vCPUs are available...

I don't know the answer.  It's especially hard to guess without seeing
the benchmark (fio?) configuration and QEMU command-line.
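For concreteness, a minimal reproduction setup might look like the following.
This is only a sketch: the device names, image path, and fio parameters are
assumptions, not the reporter's actual configuration.

```shell
# Guest with 2 vCPUs and a virtio-scsi controller using 4 queues (hypothetical):
qemu-system-x86_64 -machine accel=kvm -smp 2 -m 4G \
  -object iothread,id=iothread0 \
  -device virtio-scsi-pci,id=scsi0,iothread=iothread0,num_queues=4 \
  -drive if=none,id=drive0,file=/path/to/test.img,format=raw,cache=none,aio=native \
  -device scsi-hd,drive=drive0

# Example fio job run inside the guest:
# [randread]
# filename=/dev/sda
# ioengine=libaio
# direct=1
# rw=randread
# bs=4k
# iodepth=32
# numjobs=2
# runtime=60
```

Varying only num_queues between runs while holding everything else fixed makes
the comparison meaningful.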

Common things to look at are:

1. Compare "iostat -dx 1" inside the guest and host.  Are the I/O
   patterns comparable?  blktrace(8) can give you even more detail on
   the exact I/O patterns.  If the guest and host have different I/O
   patterns (blocksize, IOPS, queue depth) then request merging or
   I/O scheduler effects could be responsible for the difference.

2. kvm_stat or "perf record -a -e kvm:\*" counters for vmexits and
   interrupt injections.  If these counters vary greatly between queue
   sizes, that is usually a clue.  It is possible to buy higher
   performance by spending more CPU cycles, although your system doesn't
   have many CPUs available, so I'm not sure whether that is happening
   here.

3. Power management and polling (kvm.ko halt_poll_ns, tuned profiles,
   and QEMU iothread poll-max-ns).  It is expensive to wake a CPU that
   has entered a low-power idle state.  There are several features that
   keep the CPU awake, or even poll, so that request latency is reduced.
   The number of queues may matter here because kicking multiple queues
   may keep the CPU awake more than batching multiple requests onto a
   small number of queues does.
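The polling knobs in point 3 can be inspected and set roughly as follows
(a sketch only; the values shown are arbitrary examples, not tuning
recommendations):

```shell
# Host: KVM halt-polling window in nanoseconds (module parameter)
cat /sys/module/kvm/parameters/halt_poll_ns
echo 200000 | sudo tee /sys/module/kvm/parameters/halt_poll_ns

# QEMU: cap the iothread's adaptive polling at creation time:
#   -object iothread,id=iothread0,poll-max-ns=32768

# Or adjust a running guest over QMP with qom-set:
#   { "execute": "qom-set",
#     "arguments": { "path": "/objects/iothread0",
#                    "property": "poll-max-ns", "value": 32768 } }
```

Setting poll-max-ns=0 disables iothread polling entirely, which is a quick way
to test whether polling is responsible for a difference between configurations.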

Stefan
