qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 01/15] qemu coroutine: support bypass mode


From: Ming Lei
Subject: Re: [Qemu-devel] [PATCH 01/15] qemu coroutine: support bypass mode
Date: Thu, 31 Jul 2014 18:06:00 +0800

On Thu, Jul 31, 2014 at 5:15 PM, Paolo Bonzini <address@hidden> wrote:
> Il 31/07/2014 10:59, Ming Lei ha scritto:
>>> > No guesses please.  Actually that's also my guess, but since you are
>>> > submitting the patch you must do better and show profiles where stack
>>> > switching disappears after the patches.
>> Follows the below hardware events reported by 'perf stat' when running
>> fio randread benchmark for 2min in VM(single vq, 2 jobs):
>>
>> sudo ~/bin/perf stat -e
>> L1-dcache-loads,L1-dcache-load-misses,cpu-cycles,instructions,branch-instructions,branch-misses,branch-loads,branch-load-misses,dTLB-loads,dTLB-load-misses
>> ./nqemu-start-mq 4 1
>>
>> 1), without bypassing coroutine via forcing to set 's->raw_format ' as
>> false, see patch 5/15
>>
>> - throughout: 95K
>>    232,564,905,115      instructions
>>      161.991075781 seconds time elapsed
>>
>>
>> 2), with bypassing coroutinue
>> - throughput: 115K
>>    255,526,629,881      instructions
>>      162.333465490 seconds time elapsed
>
> Ok, so you are saving 10% instructions per iop: before 232G / 95K =
> 2.45M instructions/iop, 255G / 115K = 2.22M instructions/iop.

I am wondering if it is useful since IOP is from view of VM, and instructions
is got from host(qemu). If qemu dataplane handles IO quickly enough,
it can save instructions by batch operations.

But I will collect the 'perf report' on 'instruction' event for you.

L1-dcache-load-misses ratio is increased by 1%, and instructions per
cycle is decreased by 10%(0.88->0.97), which is big too, and means
qemu becomes much slower when running block I/O in VM by
coroutine.

Thanks,



reply via email to

[Prev in Thread] Current Thread [Next in Thread]