qemu-devel

Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?


From: Avi Kivity
Subject: Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?
Date: Wed, 08 Oct 2014 13:59:13 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.1.1


On 10/08/2014 01:55 PM, Michael S. Tsirkin wrote:
>>>> Even more useful is getting rid of the desc array and instead passing descs
>>>> inline in avail and used.
>>> You expect this to improve performance?
>>> Quite possibly but this will have to be demonstrated.
>>
>> The top vhost function in small packet workloads is vhost_get_vq_desc, and
>> the top instruction within that (50%) is the one that reads the first 8
>> bytes of desc.  It's a guaranteed cache line miss (and again on the guest
>> side when it's time to reuse).
> OK so basically what you are pointing out is that we get 5 accesses:
> read of available head, read of available ring, read of descriptor,
> write of used ring, write of used ring head.

Right.  And only read of descriptor is not amortized.
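
For readers following along, the five accesses can be sketched against the split-ring layout roughly as below. This is a simplified, single-threaded illustration, not the vhost code: QSZ, struct vq and host_consume_one() are names made up for the sketch, and memory barriers, descriptor chaining and guest address translation are left out.

```c
#include <stdint.h>

#define QSZ 8   /* illustrative ring size (an assumption, not the real limit) */

/* Split-ring layout: a descriptor table plus two rings that refer to it. */
struct vring_desc      { uint64_t addr; uint32_t len; uint16_t flags; uint16_t next; };
struct vring_avail     { uint16_t flags; uint16_t idx; uint16_t ring[QSZ]; };
struct vring_used_elem { uint32_t id; uint32_t len; };
struct vring_used      { uint16_t flags; uint16_t idx; struct vring_used_elem ring[QSZ]; };

/* Hypothetical container for one queue, used only by this sketch. */
struct vq {
    struct vring_desc  desc[QSZ];
    struct vring_avail avail;
    struct vring_used  used;
    uint16_t           last_avail;   /* host-private: next avail slot to look at */
};

/* Host side: consume one buffer. Returns the descriptor index, or -1 if idle. */
static int host_consume_one(struct vq *q)
{
    if (q->last_avail == q->avail.idx)                    /* (1) read avail head */
        return -1;
    uint16_t head = q->avail.ring[q->last_avail % QSZ];   /* (2) read avail ring */
    struct vring_desc d = q->desc[head];                  /* (3) read descriptor */
    q->last_avail++;

    uint16_t u = q->used.idx;
    q->used.ring[u % QSZ].id  = head;                     /* (4) write used ring */
    q->used.ring[u % QSZ].len = d.len;
    q->used.idx = (uint16_t)(u + 1);                      /* (5) write used head */
    return head;
}
```

Accesses (1), (2), (4) and (5) walk small, densely packed arrays, so consecutive requests tend to share a cache line. Access (3) lands wherever the guest's index points.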

> If processing is in-order, we could build a much simpler design, with a
> valid bit in the descriptor, cleared by host as descriptors are
> consumed.
>
> Basically get rid of both used and available ring.

That only works if you don't allow reordering, which is never the case for block, and not the case for zero-copy net. It also has writers on both sides of the ring.

The right design is to keep avail and used, but instead of making them rings of pointers to descs, make them rings of descs.

The host reads descs from avail, processes them, then writes them back on used (possibly out-of-order). The guest writes descs to avail and reads them back from used.

You'll probably have to add a 64-bit cookie to desc so you can complete without an additional lookup.
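
A rough sketch of what that could look like, purely to make the idea concrete. Nothing here is an existing virtio layout: struct inline_desc, struct desc_ring, RING_SZ and the four helpers are hypothetical names, the field widths other than the 64-bit cookie are guesses, and barriers, notifications and ring-full checks are omitted.

```c
#include <stdint.h>

#define RING_SZ 8   /* illustrative size (an assumption) */

/* Hypothetical inline descriptor: the ring slot carries the buffer itself. */
struct inline_desc {
    uint64_t addr;
    uint32_t len;
    uint32_t flags;
    uint64_t cookie;   /* opaque guest value, echoed back on completion */
};

/* The same shape serves as avail (guest writes) and used (host writes). */
struct desc_ring {
    uint16_t idx;                       /* producer index, free-running */
    struct inline_desc ring[RING_SZ];
};

/* Guest: post a buffer on avail. */
static void guest_post(struct desc_ring *avail, uint64_t addr,
                       uint32_t len, uint64_t cookie)
{
    struct inline_desc *d = &avail->ring[avail->idx % RING_SZ];
    d->addr = addr; d->len = len; d->flags = 0; d->cookie = cookie;
    avail->idx++;
}

/* Host: copy the next posted descriptor out of avail. Returns 0 if none. */
static int host_fetch(const struct desc_ring *avail, uint16_t *last_seen,
                      struct inline_desc *out)
{
    if (*last_seen == avail->idx)
        return 0;
    *out = avail->ring[*last_seen % RING_SZ];
    (*last_seen)++;
    return 1;
}

/* Host: write a finished descriptor to used, in whatever order it finished. */
static void host_complete(struct desc_ring *used, const struct inline_desc *done)
{
    used->ring[used->idx % RING_SZ] = *done;
    used->idx++;
}

/* Guest: reap one completion; the cookie identifies the request directly. */
static int guest_reap(const struct desc_ring *used, uint16_t *last_seen,
                      uint64_t *cookie)
{
    if (*last_seen == used->idx)
        return 0;
    *cookie = used->ring[*last_seen % RING_SZ].cookie;
    (*last_seen)++;
    return 1;
}
```

Each ring has exactly one writer, and the host can call host_complete() in any order because the guest matches completions by cookie rather than by ring position.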


> Sounds good in theory.
>
>> Inline descriptors will amortize the cache miss over 4 descriptors, and will
>> allow the hardware to prefetch, since the descriptors are linear in memory.
> If descriptors are used in order (as they are with current qemu)
> then aren't they amortized already?
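
As a back-of-the-envelope check on the "4 descriptors" figure: with a 16-byte descriptor and a 64-byte cache line (the line size is an assumption, typical for x86), four descriptors share one line. The sketch below counts how often a walk over the descriptor table moves to a different line; CACHE_LINE, descs_per_line() and line_changes() are illustrative names, and the table is assumed to be cache-line aligned.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumed cache line size in bytes */

struct vring_desc { uint64_t addr; uint32_t len; uint16_t flags; uint16_t next; };

/* How many descriptors fit in one cache line. */
static size_t descs_per_line(void)
{
    return CACHE_LINE / sizeof(struct vring_desc);
}

/* Count how many times consecutive descriptor reads switch cache lines. */
static size_t line_changes(const uint16_t *idx, size_t n)
{
    size_t changes = 0;
    size_t prev = (size_t)-1;            /* no line touched yet */
    for (size_t i = 0; i < n; i++) {
        size_t line = (size_t)idx[i] * sizeof(struct vring_desc) / CACHE_LINE;
        if (line != prev) {
            changes++;
            prev = line;
        }
    }
    return changes;
}
```

Walking indices 0..7 in order switches lines twice, while an interleaved order such as 0, 4, 1, 5, 2, 6, 3, 7 switches on every read. So in-order use of the table does get the same sharing; the difference shows up once descriptors are allocated or completed out of order.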




