From: Ming Lei
Subject: Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
Date: Thu, 14 Aug 2014 18:12:03 +0800

On Thu, Aug 14, 2014 at 5:39 PM, Stefan Hajnoczi <address@hidden> wrote:
> On Wed, Aug 13, 2014 at 09:49:23PM +0800, Ming Lei wrote:
>> On Wed, Aug 13, 2014 at 9:16 PM, Paolo Bonzini <address@hidden> wrote:
>> > Il 13/08/2014 11:54, Kevin Wolf ha scritto:
>> >> Am 12.08.2014 um 21:08 hat Paolo Bonzini geschrieben:
>> >>> Il 12/08/2014 10:12, Ming Lei ha scritto:
>> >>>>>> The below patch is basically the minimal change to bypass
>> >>>>>> coroutines.  Of course the block.c part is not acceptable as is
>> >>>>>> (the change to refresh_total_sectors is broken, the others are
>> >>>>>> just ugly), but it is a start.  Please run it with your fio
>> >>>>>> workloads, or write an aio-based version of a qemu-img/qemu-io
>> >>>>>> *I/O* benchmark.
>> >>>> Could you explain why the new change is introduced?
>> >>>
>> >>> It provides a fast path for bdrv_aio_readv/writev whenever there is
>> >>> nothing to do after the driver routine returns.  In this case there is
>> >>> no need to wrap the AIOCB returned by the driver routine.
>> >>>
>> >>> It doesn't go all the way, and in particular it doesn't reverse
>> >>> completely the roles of bdrv_co_readv/writev vs. bdrv_aio_readv/writev.
>> >>
>> >> That's actually why I think it's an option. Remember that, like you say
>> >> below, we're optimising for an extreme case here, and I certainly don't
>> >> want to hurt the common case for it. I can't imagine a way of reversing
>> >> the roles without multiplying the cost for the coroutine path.
>> >
>> > I'm not that worried about it.  Perhaps it's enough to add a
>> > !qemu_in_coroutine() check to the AIO fast path, and let the driver
>> > provide optimized coroutine paths like in your patches that allocate
>> > AIOCBs on the stack.
>>
>> IMO, it will not be an extreme case as SSDs and high-performance storage
>> become more popular; coroutines start to affect performance once IOPS
>> exceeds 100K, as computed previously.
>
> The case you seem to care about is raw images on high IOPS devices.  You
> mentioned 1M IOPS devices in another email.

In reality, if someone cares about high IOPS, it looks like the raw format
has to be considered.
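
For concreteness, the kind of setup this discussion targets might look like
the following. This is an illustrative command line, not one quoted from the
series; the device path is a placeholder, and the iothread wiring is the
usual way to attach a virtio-blk device to a dataplane thread in this QEMU
era:

    qemu-system-x86_64 \
        -object iothread,id=iothread0 \
        -drive file=/dev/nvme0n1,if=none,id=drive0,format=raw,cache=none,aio=native \
        -device virtio-blk-pci,drive=drive0,iothread=iothread0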

>
> You don't seem to want QEMU's block layer features, that is why you are
> trying to bypass them instead of optimizing the block layer.

I don't think bypassing coroutines is at odds with optimizing the
block layer.

As we know, coroutines always introduce some cost which can't be
ignored for high-IOPS devices. If coroutines can be improved to
fit this case, I'd like to help do that, but I wonder whether it is doable.
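
For a rough sense of scale (these are my own illustrative numbers, not
measurements from this thread): if one coroutine create/enter/terminate
round trip costs on the order of 1 microsecond, then 100K IOPS spends about
0.1 CPU-seconds per second, roughly 10% of one core, purely on coroutine
bookkeeping, and 1M IOPS would saturate a whole core with it. Below is a
minimal, self-contained sketch of the kind of bypass being discussed; the
types are simplified stand-ins rather than QEMU's real BlockDriver/AIOCB
definitions, and in_coroutine stands in for qemu_in_coroutine():

/* Sketch: dispatch to the driver's native AIO routine directly when we
 * are not already inside a coroutine, instead of wrapping the request
 * in coroutine machinery.  Stand-in types, not QEMU's real ones. */
#include <stdbool.h>
#include <stdio.h>

typedef void CompletionFunc(void *opaque, int ret);

typedef struct BlockDriver {
    /* Native AIO routine; NULL if the driver only has a coroutine path. */
    int (*aio_readv)(void *bs, long sector, int nb_sectors,
                     CompletionFunc *cb, void *opaque);
} BlockDriver;

static bool in_coroutine;   /* stand-in for qemu_in_coroutine() */

/* Slow path: in real QEMU this would create and enter a coroutine and
 * wrap the request before completing it. */
static void coroutine_readv(void *bs, long sector, int nb_sectors,
                            CompletionFunc *cb, void *opaque)
{
    (void)bs; (void)nb_sectors;
    printf("coroutine path: sector %ld\n", sector);
    cb(opaque, 0);
}

static void bdrv_aio_readv(BlockDriver *drv, void *bs, long sector,
                           int nb_sectors, CompletionFunc *cb, void *opaque)
{
    /* Fast path: nothing to do after the driver routine returns and we
     * are not already in a coroutine, so skip the wrapping entirely. */
    if (!in_coroutine && drv->aio_readv) {
        drv->aio_readv(bs, sector, nb_sectors, cb, opaque);
        return;
    }
    coroutine_readv(bs, sector, nb_sectors, cb, opaque);
}

/* Toy raw driver plus a demo of both paths. */
static int raw_aio_readv(void *bs, long sector, int nb_sectors,
                         CompletionFunc *cb, void *opaque)
{
    (void)bs; (void)nb_sectors;
    printf("native AIO path: sector %ld\n", sector);
    cb(opaque, 0);
    return 0;
}

static void done(void *opaque, int ret) { (void)opaque; (void)ret; }

int main(void)
{
    BlockDriver raw = { .aio_readv = raw_aio_readv };
    bdrv_aio_readv(&raw, NULL, 0, 8, done, NULL);  /* fast path */
    in_coroutine = true;                           /* pretend we're in one */
    bdrv_aio_readv(&raw, NULL, 8, 8, done, NULL);  /* coroutine fallback */
    return 0;
}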

I like rich features, and I like good performance too; the two
shouldn't be contradictory, and the block layer should be flexible
enough to support both.

> That begs the question whether you should look at PCI passthrough
> instead?

I am wondering why you raise this question. virtio-blk is said to be one of
the fastest block devices in the VM world, so it is worth optimizing.
Also, it supports live migration, unlike passthrough.

Thanks,


