Re: [Qemu-devel] Following up questions related to QEMU and I/O Thread


From: Wei Li
Subject: Re: [Qemu-devel] Following up questions related to QEMU and I/O Thread
Date: Tue, 16 Apr 2019 18:42:10 -0700
User-agent: Microsoft-MacOutlook/10.15.0.190115

Thanks Stefan and Dongli for your feedback and advice!

I will investigate further per your advice and get back to you later.

Thanks, 
-Wei

On 4/16/19, 2:20 AM, "Stefan Hajnoczi" <address@hidden> wrote:

    On Tue, Apr 16, 2019 at 07:23:38AM +0800, Dongli Zhang wrote:
    > 
    > 
    > On 4/16/19 1:34 AM, Wei Li wrote:
    > > Hi @Paolo Bonzini & @Stefan Hajnoczi,
    > > 
    > > Would you please help confirm whether @Paolo Bonzini's multiqueue
    > > feature change will benefit virtio-scsi or not? Thanks!
    > > 
    > > @Stefan Hajnoczi,
    > > I also spent some time exploring the virtio-scsi multi-queue
    > > feature via the num_queues parameter. Here is what we found:
    > > 
    > > 1. Increasing the number of queues from one to the number of vCPUs
    > >    gives a good IOPS increase.
    > > 2. Increasing the number of queues to a number (e.g. 8) larger than
    > >    the number of vCPUs (e.g. 2) gives an even better IOPS increase.
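
For reference, the queue count is set via the num_queues property of the
virtio-scsi controller; a minimal sketch of such a command line (memory
size, image path, and cache settings here are placeholders, not the exact
configuration used in these tests):

    qemu-system-x86_64 -m 4G -smp 2 \
        -object iothread,id=iothread0 \
        -device virtio-scsi-pci,iothread=iothread0,num_queues=8 \
        -drive if=none,id=drive0,file=/path/to/disk.img,format=raw,cache=none,aio=native \
        -device scsi-hd,drive=drive0
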
    > 
    > As mentioned in the link below, when the number of hw queues is larger
    > than nr_cpu_ids, the blk-mq layer limits the device to at most
    > nr_cpu_ids queues (e.g., /sys/block/sda/mq/).
    > 
    > That is, when num_queues=4 while there are 2 vcpus, there should be
    > only 2 queues available under /sys/block/sda/mq/.
    > 
    > https://lore.kernel.org/lkml/address@hidden/
    > 
    > I am just curious how increasing num_queues from 2 to 4 would double
    > the iops, while there are only 2 vcpus available...
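
The guest-visible queue count is easy to verify from sysfs; a quick check,
assuming the disk appears as sda inside the guest:

    # Each hardware queue shows up as a numbered directory under mq/.
    # With 2 vCPUs, expect "0  1" here even if num_queues is larger.
    ls /sys/block/sda/mq/
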
    
    I don't know the answer.  It's especially hard to guess without seeing
    the benchmark (fio?) configuration and QEMU command-line.
    
    Common things to look at are:
    
    1. Compare "iostat -dx 1" inside the guest and host.  Are the I/O
       patterns comparable?  blktrace(8) can give you even more detail on
       the exact I/O patterns.  If the guest and host have different I/O
       patterns (blocksize, IOPS, queue depth) then request merging or
       I/O scheduler effects could be responsible for the difference.
    
    2. kvm_stat or "perf record -a -e kvm:\*" counters for vmexits and
       interrupt injections.  If these counters vary greatly between
       queue sizes, then that is usually a clue.  It's possible to gain
       performance by spending more CPU cycles, although your system
       doesn't have many CPUs available, so I'm not sure whether that is
       happening here.  (See the perf example after this list.)
    
    3. Power management and polling (kvm.ko halt_poll_ns, tuned profiles,
       and QEMU iothread poll-max-ns).  Waking a CPU that has gone into a
       low-power mode due to idle is expensive.  Several features can
       keep the CPU awake, or even poll, so that request latency is
       reduced.  The number of queues may matter because kicking multiple
       queues may keep the CPU awake more than batching multiple requests
       onto a small number of queues.  (See the polling sketch after this
       list.)
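
    For item 1, the comparison might look like this (sda is a placeholder
    for the actual guest/host device names):

        # On both guest and host: per-device stats once per second.
        # Compare blocksize, IOPS and queue depth between the two.
        iostat -dx 1

        # Exact I/O pattern on the host's backing device:
        blktrace -d /dev/sda -o - | blkparse -i -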
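
    For item 2, one way to gather those counters on the host, so the
    totals can be compared across runs with different num_queues values:

        # Live KVM event counters (kvm_stat ships in the kernel tree):
        kvm_stat

        # Or record all KVM tracepoints system-wide for ~30 seconds and
        # inspect per-event totals:
        perf record -a -e 'kvm:*' -- sleep 30
        perf report --stdio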
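
    For item 3, the knobs mentioned above are set roughly as follows; the
    values are illustrative, not tuned recommendations:

        # Host: how long KVM busy-polls before a halted vCPU really
        # sleeps (nanoseconds; 0 disables halt polling):
        echo 200000 > /sys/module/kvm/parameters/halt_poll_ns

        # QEMU: let the iothread poll its virtqueues before sleeping:
        qemu-system-x86_64 ... \
            -object iothread,id=iothread0,poll-max-ns=200000 \
            -device virtio-scsi-pci,iothread=iothread0,num_queues=2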
    
    Stefan