Re: [Qemu-devel] Faster, generic IO/DMA model with vectored AIO?


From: Paul Brook
Subject: Re: [Qemu-devel] Faster, generic IO/DMA model with vectored AIO?
Date: Sun, 28 Oct 2007 02:29:09 +0100
User-agent: KMail/1.9.7

> I changed Slirp output to use vectored IO to avoid the slowdown from
> memcpy (see the patch for the work in progress; it gives a small
> performance improvement). But then I got the idea that using AIO would
> be nice at the outgoing end of the network IO processing. In fact, a
> vectored AIO model could even be used for the generic DMA! The benefit
> is that no buffering or copying should be needed.
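
For concreteness, "vectored IO" here amounts to gathering the scattered
packet fragments into a struct iovec array and handing them to a single
writev() call, so no intermediate memcpy into a linear buffer is needed.
A minimal sketch (the fragment descriptor is invented for illustration,
not a type from the posted patch):

#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Minimal sketch: send n scattered fragments with one writev() call
 * instead of memcpy'ing them into a linear buffer first.  The
 * "fragment" descriptor is invented; it is not a Slirp or QEMU type. */
struct fragment {
    void   *data;
    size_t  len;
};

static ssize_t send_fragments(int fd, const struct fragment *frag, int n)
{
    struct iovec iov[16];               /* assume n <= 16 for the sketch */
    int i;

    for (i = 0; i < n && i < 16; i++) {
        iov[i].iov_base = frag[i].data;
        iov[i].iov_len  = frag[i].len;
    }
    return writev(fd, iov, i);          /* one syscall, no extra copy */
}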

An interesting idea, but I don't want to underestimate the difficulty of 
implementing this correctly.  I suspect that to get real benefits you need to 
support zero-copy async operation all the way through.  Things get really 
hairy if you allow some operations to complete synchronously and some to be 
deferred.
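
Concretely, the sort of thing that goes wrong when completion may be
either synchronous or deferred (sketched with an invented API; none of
these names are real QEMU functions):

#include <stddef.h>

/* Invented API, purely to illustrate the sync-vs-deferred hazard. */
typedef void dma_complete_fn(void *opaque, int ret);

struct dma_request { int done; };
static struct dma_request the_request;

/* A core that is allowed to complete synchronously: the callback runs
 * before the caller has even received the request handle. */
static struct dma_request *dma_submit(void *buf, size_t len,
                                      dma_complete_fn *cb, void *opaque)
{
    (void)buf; (void)len;
    cb(opaque, 0);                      /* completes "immediately" */
    return &the_request;
}

struct my_device {
    struct dma_request *pending;        /* tracked so reset can cancel it */
};

static void my_dma_done(void *opaque, int ret)
{
    struct my_device *s = opaque;
    (void)ret;
    s->pending = NULL;                  /* request finished */
}

static void my_device_start(struct my_device *s, void *buf, size_t len)
{
    /* Because dma_submit completed synchronously, my_dma_done has
     * already run, and this store re-installs a pointer to a request
     * that is already finished. */
    s->pending = dma_submit(buf, len, my_dma_done, s);
}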

I've done async operation for SCSI and USB. The latter is really not pretty, 
and the former has some notable warts. A generic IODMA framework needs to 
make sure it covers these requirements without making things worse. Hopefully 
it'll also help fix the things that are wrong with them.

> For the specific Sparc32 case, unfortunately the Lance bus byte swapping
> makes buffering necessary at that stage, unless we can make N one-byte
> vectors faster than a memcpy + bswap of a memory block of size N.
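
The buffered path being compared against is roughly the following (an
illustrative sketch; it assumes the swap is done on 16-bit words, which
may not match the actual Lance/ledma code):

#include <stddef.h>
#include <stdint.h>

/* Illustrative only: copy a block and swap each 16-bit word, i.e. the
 * "memcpy + bswap of a memory block of size N" case.  The alternative
 * is N one-byte iovec entries, which avoids the copy but pays the
 * per-element cost discussed below. */
static void copy_bswap16(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t i;

    for (i = 0; i + 1 < n; i += 2) {
        dst[i]     = src[i + 1];
        dst[i + 1] = src[i];
    }
    if (i < n) {
        dst[i] = src[i];                /* odd trailing byte, unswapped */
    }
}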

We really want to be dealing with largeish blocks. The {ptr,size} vector is 64 
or 128 bits per element, so the overhead on blocks < 64 bytes is going to be 
really brutal. Also, the time taken to do address translation will be 
O(number of vector elements).
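
As a quick check of the arithmetic, a standalone snippet:

#include <stdio.h>
#include <sys/uio.h>

/* A {ptr,size} pair is 8 bytes on a 32-bit host and 16 bytes on a
 * 64-bit host (64 or 128 bits per element), before counting the
 * per-element address translation work. */
int main(void)
{
    printf("sizeof(struct iovec) = %zu bytes\n", sizeof(struct iovec));
    return 0;
}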

> Inside Qemu the vectors would use target physical addresses (struct
> qemu_iovec), but at some point the addresses would change to host
> pointers suitable for real AIO.
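
The step being glossed over looks something like the following sketch
(qemu_iovec_elem and phys_to_host_ptr are invented names, not the
structures from the posted patch; the hard part, what happens when an
element isn't plain RAM, is exactly what needs to be specified):

#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Invented types and helper, for illustration only. */
typedef uint64_t target_phys_addr_t;    /* stand-in; width depends on target */

struct qemu_iovec_elem {
    target_phys_addr_t addr;            /* guest physical address */
    size_t             len;
};

/* Assumed to return a host pointer for plain RAM, or NULL for MMIO or
 * unmapped addresses. */
void *phys_to_host_ptr(target_phys_addr_t addr);

static int qiov_to_host_iov(const struct qemu_iovec_elem *qiov, int n,
                            struct iovec *iov)
{
    int i;

    for (i = 0; i < n; i++) {
        void *p = phys_to_host_ptr(qiov[i].addr);
        if (p == NULL) {
            return -1;   /* not RAM: caller must fall back to a bounce buffer */
        }
        iov[i].iov_base = p;
        iov[i].iov_len  = qiov[i].len;
    }
    return 0;
}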

Phrases like "at some point" worry me :-)

I think it would be good to get a top-down description of what each different 
entity (initiating device, host endpoint, bus translation, memory) is 
responsible for, and how they all fit together.
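
One possible shape for that description, sketched as interfaces purely
to show the kind of contract each entity would have to expose (nothing
here is a concrete proposal):

#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

typedef uint64_t bus_addr_t;            /* stand-in address type */
typedef void io_complete_fn(void *opaque, int ret);

/* Bus translation: map a bus address range to host memory, or fail
 * (MMIO, unmapped) so the caller can fall back to a bounce buffer. */
struct bus_translate_ops {
    int  (*map)(void *bus, bus_addr_t addr, size_t len, struct iovec *out);
    void (*unmap)(void *bus, const struct iovec *iov, int n);
};

/* Host endpoint (file, socket, ...): consumes host-pointer vectors and
 * always completes via the callback, never synchronously, so callers
 * see exactly one code path. */
struct host_endpoint_ops {
    int  (*submit)(void *ep, const struct iovec *iov, int n,
                   io_complete_fn *cb, void *opaque);
    void (*cancel)(void *ep, void *opaque);
};

/* The initiating device builds the bus-address vector, asks the bus
 * layer to translate it, and hands the result to the endpoint; memory
 * is only ever touched through the mappings returned by map(). */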


I have some ideas, but without more detailed investigation I can't tell if 
they will actually work in practice, or if they fit into the code fragments 
you've posted. My suspicion is that they don't, as I can't make head or tail 
of how your gdma_aiov.diff patch would be used in practice.

Paul



