qemu-devel
From: Ming Lei
Subject: Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
Date: Wed, 13 Aug 2014 18:19:38 +0800

On Wed, Aug 13, 2014 at 3:08 AM, Paolo Bonzini <address@hidden> wrote:
> On 12/08/2014 10:12, Ming Lei wrote:
>>> The below patch is basically the minimal change to bypass coroutines.
>>> Of course the block.c part is not acceptable as is (the change to
>>> refresh_total_sectors is broken, the others are just ugly), but it is
>>> a start.  Please run it with your fio workloads, or write an aio-based
>>> version of a qemu-img/qemu-io *I/O* benchmark.
>> Could you explain why the new change is introduced?
>
> It provides a fast path for bdrv_aio_readv/writev whenever there is
> nothing to do after the driver routine returns.  In this case there is
> no need to wrap the AIOCB returned by the driver routine.
>
> It doesn't go all the way, and in particular it doesn't reverse
> completely the roles of bdrv_co_readv/writev vs. bdrv_aio_readv/writev.
>  But it is enough to provide something that is not dataplane-specific,
> does not break various functionality that we need to add to dataplane
> virtio-blk, does not mess up the semantics of the block layer, and lets
> you run benchmarks.
>
>> I will hold it until we can align to the coroutine cost computation,
>> because it is very important for the discussion.
>
> First of all, note that the coroutine cost is totally pointless in the
> discussion unless you have 100% CPU time and the dataplane thread
> becomes CPU bound.  You haven't said if this is the case.

No, it does make sense, especially for a high-speed block device.

In my test the CPU is close to 100%; otherwise the block throughput
would not have been affected.

Bypassing coroutines can also decrease CPU utilization when the CPU is
not at 100%.

The cost also depends on CPU speed: on a slow machine, running
coroutines may introduce a noticeable load, especially with a high-IOPS
block device.

>
> Second, if the coroutine cost is relevant, the profile is really too

I have written a patch to measure the coroutine cost, which shows it
clearly, and you should be in the Cc list.

> flat to do much about it.  The only solution (and here I *think* I
> disagree slightly with Kevin) is to get rid of it, which is not even too
> hard to do.

I agree.

But it depends on how the coroutine is used: if the function run in a
coroutine is not called very frequently, the coroutine overhead can be
ignored.

For block devices that can reach hundreds of kilo-IOPS, as far as I can
see, the only solution is to avoid coroutines in that case.

That is why I wrote the bypass-coroutine patch.

>
> The problem is that your patches to do touch too much code and subtly
> break too much stuff.  The one I wrote does have a little breakage

Could you give a hint about which things are broken? Last time you
mentioned that virtio-scsi needs to keep the AIOCB alive after
returning, and I fixed that in v1.

> because I don't understand bs->growable 100% and I didn't really put
> much effort into it (my deadline being basically "be done as soon as the
> shower is free"), and it is ugly as hell, _but_ it should be compatible
> with the way the block layer works.

I will take a careful look at your patch later.

If the coroutine is still there, I think it can still slow down
performance.

Thanks,

