Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
From: Ming Lei
Subject: Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
Date: Wed, 6 Aug 2014 13:33:36 +0800
Hi Kevin,
On Tue, Aug 5, 2014 at 10:47 PM, Kevin Wolf <address@hidden> wrote:
> On 05.08.2014 at 15:48, Stefan Hajnoczi wrote:
>> On Tue, Aug 05, 2014 at 06:00:22PM +0800, Ming Lei wrote:
>> > On Tue, Aug 5, 2014 at 5:48 PM, Kevin Wolf <address@hidden> wrote:
>> > > On 05.08.2014 at 05:33, Ming Lei wrote:
>> > >> Hi,
>> > >>
>> > >> These patches bring up below 4 changes:
>> > >> - introduce object allocation pool and apply it to
>> > >> virtio-blk dataplane for improving its performance
>> > >>
>> > >> - introduce selective coroutine bypass mechanism
>> > >> for improving performance of virtio-blk dataplane with
>> > >> raw format image
>> > >
>> > > Before applying any bypassing patches, I think we should understand in
>> > > detail where we are losing performance with coroutines enabled.
>> >
>> > From the profiling data below, the CPU executes instructions more
>> > slowly with coroutines enabled, and CPU dcache misses increase, so it
>> > is very likely caused by switching stacks frequently.
>> >
>> > http://marc.info/?l=qemu-devel&m=140679721126306&w=2
>> >
>> > http://pastebin.com/ae0vnQ6V
>>
>> I have been wondering how to prove that the root cause is the ucontext
>> coroutine mechanism (stack switching). Here is an idea:
>>
>> Hack your "bypass" code path to run the request inside a coroutine.
>> That way you can compare "bypass without coroutine" against "bypass with
>> coroutine".
>>
>> Right now I think there are doubts because the bypass code path is
>> indeed a different (and not 100% correct) code path. So this approach
>> might prove that the coroutines are adding the overhead and not
>> something that you bypassed.
>
> My doubts aren't only that the overhead might not come from the
> coroutines, but also whether any coroutine-related overhead is really
> unavoidable. If we can optimise coroutines, I'd strongly prefer to do
> just that instead of introducing additional code paths.
OK, thank you for taking a look at the problem; I hope we can
figure out the root cause. :-)
>
> Another thought I had was this: If the performance difference is indeed
> only coroutines, then that is completely inside the block layer and we
> don't actually need a VM to test it. We could instead have something
> like a simple qemu-img based benchmark and should be observing the same.
Indeed, it is simpler to run a coroutine-only benchmark, so I just
wrote a rough one, and it looks like coroutines do decrease performance
a lot; please see the attached patch, and thanks for your template,
which helped me add the 'co_bench' command to qemu-img.
From the profiling data in the link below:
http://pastebin.com/YwH2uwbq
With coroutines, the running time for the same workload increases by
~50% (1.325s vs. 0.903s), dcache load events increase by
~35% (693M vs. 512M), and instructions per cycle decrease by ~17%
(1.35 vs. 1.63), compared with bypassing coroutines (the -b parameter).
The bypass code in the benchmark is very similar to the approach
used in the bypass patch, since linux-aio with O_DIRECT seldom
blocks in the kernel I/O path.
Maybe the benchmark is a bit extreme, but given that modern storage
devices may reach millions of IOPS, it is very easy for coroutines
to slow down the I/O path.
> I played a bit with the following, I hope it's not too naive. I couldn't
> see a difference with your patches, but at least one reason for this is
> probably that my laptop SSD isn't fast enough to make the CPU the
> bottleneck. Haven't tried ramdisk yet, that would probably be the next
> thing. (I actually wrote the patch up just for some profiling on my own,
> not for comparing throughput, but it should be usable for that as well.)
This might not be a good test, since it is basically a sequential
read test, which the kernel can optimize a lot (e.g. via readahead).
That is why I always use a random-read benchmark.
Thanks,
Attachment: co_bench.patch