From: Paolo Bonzini
Subject: Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
Date: Wed, 06 Aug 2014 10:50:53 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0

On 06/08/2014 10:38, Ming Lei wrote:
> On Wed, Aug 6, 2014 at 3:45 PM, Paolo Bonzini <address@hidden> wrote:
>> On 06/08/2014 07:33, Ming Lei wrote:
>>>>> I played a bit with the following; I hope it's not too naive. I couldn't
>>>>> see a difference with your patches, but at least one reason for this is
>>>>> probably that my laptop SSD isn't fast enough to make the CPU the
>>>>> bottleneck. I haven't tried a ramdisk yet; that would probably be the
>>>>> next thing. (I actually wrote the patch just for some profiling of my
>>>>> own, not for comparing throughput, but it should be usable for that as
>>>>> well.)
>>> This might not be a good test, since it is basically a sequential
>>> read test, which the kernel can optimize a lot. I always use a
>>> randread benchmark.
>>
>> A microbenchmark already exists in tests/test-coroutine.c, and it doesn't
>> really tell us much; it's obvious that coroutines execute more code. The
>> question is why that affects IOPS performance.
> 
> Could you take a look at the coroutine benchmark I wrote?  The results
> show that coroutines decrease performance a lot compared with bypassing
> them, as this patchset does.

Your benchmark is synchronous, while disk I/O is asynchronous.
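
To make that distinction concrete, here is a minimal sketch of asynchronous
disk I/O using Linux's libaio. This is an illustration only, not the benchmark
under discussion; the 4k request size, the single request, and the file name
handling are arbitrary choices.

#define _GNU_SOURCE             /* for O_DIRECT */
#include <fcntl.h>
#include <libaio.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    void *buf;
    io_context_t ctx = 0;
    struct iocb cb, *cbs[1] = { &cb };
    struct io_event ev;
    int fd;

    if (argc < 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }
    fd = open(argv[1], O_RDONLY | O_DIRECT);
    if (fd < 0 || posix_memalign(&buf, 4096, 4096) ||
        io_setup(128, &ctx) < 0) {
        fprintf(stderr, "setup failed\n");
        return 1;
    }

    /* Submission returns immediately; the read completes later. */
    io_prep_pread(&cb, fd, buf, 4096, 0);
    if (io_submit(ctx, 1, cbs) != 1) {
        fprintf(stderr, "io_submit failed\n");
        return 1;
    }

    /* A synchronous benchmark has no equivalent of this gap: other work,
     * e.g. submitting more requests, can run here while the disk is busy. */

    io_getevents(ctx, 1, 1, &ev, NULL);   /* reap the completion */
    printf("read %ld bytes\n", (long)ev.res);

    io_destroy(ctx);
    close(fd);
    return 0;
}

(Build with: gcc -O2 aio-sketch.c -o aio-sketch -laio.)  The gap between
submission and completion is exactly where dataplane overlaps work; a
synchronous loop never exposes it.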

Your benchmark doesn't add much compared to "time tests/test-coroutine
-m perf -p /perf/yield".  It takes 8 seconds on my machine, and 10^8
function calls obviously take less than 8 seconds.  I've sent a patch to
add a "baseline" function call benchmark to test-coroutine.

>> The sequential read should be the right workload.  For fio, you want to
>> get as many iops as possible to QEMU, so you need randread.  But
>> qemu-img is not run in a guest, and if the kernel optimizes sequential
>> reads, then the bypass should have even more benefit, because it makes
>> userspace proportionally more expensive.

Do you agree with this?

Paolo



