Re: [Qemu-devel] Following up questions related to QEMU and I/O Thread
From: Wei Li
Subject: Re: [Qemu-devel] Following up questions related to QEMU and I/O Thread
Date: Mon, 22 Apr 2019 21:21:53 -0700
User-agent: Microsoft-MacOutlook/10.15.0.190115
Hi Stefan,
I did some investigation per your advice; please see inline for the details and
questions.
1. Compare "iostat -dx 1" inside the guest and host. Are the I/O
patterns comparable? blktrace(8) can give you even more detail on
the exact I/O patterns. If the guest and host have different I/O
patterns (blocksize, IOPS, queue depth) then request merging or
I/O scheduler effects could be responsible for the difference.
[wei]: That's a good point. I compared "iostat -dx 1" between the guest and the
host, but I have not found any obvious difference between them that could be
responsible for the difference in IOPS.
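One concrete thing to compare on both sides is the merge columns (rrqm/s, wrqm/s) against the issued-request columns (r/s, w/s), since request merging is one of the effects Stefan mentions. A toy sketch with made-up counter values (not taken from this run):

```shell
# Hypothetical values standing in for iostat's rrqm/s and r/s columns:
rrqm=120   # read requests merged per second
rps=480    # read requests actually issued per second
# Fraction of incoming reads that were merged before reaching the device:
echo "read merge ratio: $(( 100 * rrqm / (rrqm + rps) ))%"
```

If the guest and host report very different merge ratios for the same workload, the I/O scheduler or merging behaviour differs somewhere in the stack.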
2. kvm_stat or perf record -a -e kvm:\* counters for vmexits and
interrupt injections. If these counters vary greatly between queue
sizes, then that is usually a clue. It's possible to get higher
performance by spending more CPU cycles although your system doesn't
have many CPUs available, so I'm not sure if this is the case.
[wei]: vmexits look like a reason. I am using the fio tool to read/write block
storage via the following sample command. Interestingly, the kvm:kvm_exit
count decreased from 846K to 395K after I increased num_queues from 2 to 4,
while the vCPU count is 2.
1). Does this mean that using more queues than the vCPU count may increase
IOPS by spending more CPU cycles?
2). Could you please help me better understand how more queues are able to
spend more CPU cycles? Thanks!
FIO command: fio --filename=/dev/sdb --direct=1 --rw=randrw --bs=4k
--ioengine=libaio --iodepth=64 --numjobs=4 --time_based --group_reporting
--name=iops --runtime=60 --eta-newline=1
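For a rough sense of scale, the counters above can be turned into rates; a small sketch assuming the counts cover the 60-second runtime from the fio command (exit counts copied from this mail, runtime is an assumption):

```shell
# kvm:kvm_exit counts reported above, over an assumed 60 s fio run:
exits_2q=846000
exits_4q=395000
runtime=60
echo "2 queues: $(( exits_2q / runtime )) exits/s"
echo "4 queues: $(( exits_4q / runtime )) exits/s"
# The fio job keeps iodepth * numjobs requests in flight at once:
echo "outstanding requests: $(( 64 * 4 ))"
```

Fewer exits per second at equal or higher IOPS would mean fewer, better-batched guest/host transitions per request, which is consistent with the vmexit explanation.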
3. Power management and polling (kvm.ko halt_poll_ns, tuned profiles,
and QEMU iothread poll-max-ns). It's expensive to wake a CPU when it
goes into a low power mode due to idle. There are several features
that can keep the CPU awake or even poll so that request latency is
reduced. The reason why the number of queues may matter is that
kicking multiple queues may keep the CPU awake more than batching
multiple requests onto a small number of queues.
[wei]: CPU wakeups could be another reason. I noticed that the
kvm:kvm_vcpu_wakeup count decreased from 151K to 47K after I increased
num_queues from 2 to 4, while the vCPU count is 2.
1). Does this mean more queues may keep the CPU busier and awake, which
reduces the number of vCPU wakeups?
2). If using more num_queues than the vCPU count gets higher IOPS in this
case, is it safe to use 4 queues when the guest only has 2 vCPUs, or is there
any concern or impact from using more queues than vCPUs which I should keep
in mind?
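Regarding question 2): per the lore link Dongli quotes below, blk-mq in the guest caps the usable hardware queues at nr_cpu_ids, so queues beyond the vCPU count should simply go unused rather than cause harm (though the observed counters here suggest the extra queues are having some effect). A minimal sketch of that cap; the sysfs path is illustrative:

```shell
# blk-mq uses at most nr_cpu_ids of the advertised hardware queues:
num_queues=4
nr_cpus=2
effective=$(( num_queues < nr_cpus ? num_queues : nr_cpus ))
echo "effective queues: $effective"
# On a live guest the active set is visible under e.g. /sys/block/sdb/mq/
```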
In addition, does virtio-scsi support a batch I/O submission feature which may
be able to increase IOPS by reducing the number of system calls?
Thanks,
Wei
On 4/16/19, 6:42 PM, "Wei Li" <address@hidden> wrote:
Thanks Stefan and Dongli for your feedback and advices!
I will do the further investigation per your advices and get back to you
later on.
Thanks,
-Wei
On 4/16/19, 2:20 AM, "Stefan Hajnoczi" <address@hidden> wrote:
On Tue, Apr 16, 2019 at 07:23:38AM +0800, Dongli Zhang wrote:
>
>
> On 4/16/19 1:34 AM, Wei Li wrote:
> > Hi @Paolo Bonzini & @Stefan Hajnoczi,
> >
> > Would you please help confirm whether @Paolo Bonzini's multiqueue
feature change will benefit virtio-scsi or not? Thanks!
> >
> > @Stefan Hajnoczi,
> > I also spent some time on exploring the virtio-scsi multi-queue
features via num_queues parameter as below, here are what we found:
> >
> > 1. Increase number of Queues from one to the same number as CPU
will get better IOPS increase.
> > 2. Increase number of Queues to the number (e.g. 8) larger than the
number of vCPU (e.g. 2) can get even better IOPS increase.
>
> As mentioned in below link, when the number of hw queues is larger
than
> nr_cpu_ids, the blk-mq layer would limit and only use at most
nr_cpu_ids queues
> (e.g., /sys/block/sda/mq/).
>
> That is, when num_queues=4 while vcpus is 2, there should be only
2 queues
> available /sys/block/sda/mq/
>
> https://lore.kernel.org/lkml/address@hidden/
>
> I am just curious how increasing the num_queues from 2 to 4 would
double the
> iops, while there are only 2 vcpus available...
I don't know the answer. It's especially hard to guess without seeing
the benchmark (fio?) configuration and QEMU command-line.
Common things to look at are:
1. Compare "iostat -dx 1" inside the guest and host. Are the I/O
patterns comparable? blktrace(8) can give you even more detail on
the exact I/O patterns. If the guest and host have different I/O
patterns (blocksize, IOPS, queue depth) then request merging or
I/O scheduler effects could be responsible for the difference.
2. kvm_stat or perf record -a -e kvm:\* counters for vmexits and
interrupt injections. If these counters vary greatly between queue
sizes, then that is usually a clue. It's possible to get higher
performance by spending more CPU cycles although your system doesn't
have many CPUs available, so I'm not sure if this is the case.
3. Power management and polling (kvm.ko halt_poll_ns, tuned profiles,
and QEMU iothread poll-max-ns). It's expensive to wake a CPU when it
goes into a low power mode due to idle. There are several features
that can keep the CPU awake or even poll so that request latency is
reduced. The reason why the number of queues may matter is that
kicking multiple queues may keep the CPU awake more than batching
multiple requests onto a small number of queues.
Stefan