Re: [Qemu-devel] [PATCH v4 04/11] dataplane: add virtqueue vring code


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [PATCH v4 04/11] dataplane: add virtqueue vring code
Date: Wed, 5 Dec 2012 13:57:35 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

On Thu, Nov 29, 2012 at 02:50:01PM +0200, Michael S. Tsirkin wrote:
> On Thu, Nov 22, 2012 at 04:16:45PM +0100, Stefan Hajnoczi wrote:
> > The virtio-blk-data-plane cannot access memory using the usual QEMU
> > functions since it executes outside the global mutex and the memory APIs
> > are currently not thread-safe.
> > 
> > This patch introduces a virtqueue module based on the kernel's vhost
> > vring code.  The trick is that we map guest memory ahead of time and
> > access it cheaply outside the global mutex.
> > 
> > Once the hardware emulation code can execute outside the global mutex it
> > will be possible to drop this code.
> > 
> > Signed-off-by: Stefan Hajnoczi <address@hidden>
> 
> Is there no way to factor out common code and share it with virtio.c?

I think we have touched on this in other sub-threads but for reference:
this code implements vring access outside the global mutex, which means
no QEMU memory API functions.  Therefore it's hard to share the virtio.c
code which uses QEMU memory API functions.

The current work that Ping Fan Liu is doing will lead to thread-safe
memory accesses from device emulation code.  At that point we can ditch
this and unify with virtio.c.

> > +/* This is stolen from linux-2.6/drivers/vhost/vhost.c. */
> 
> Probably should document the version you based this on.
> Surely not really 2.6?

linux-2.6.git is still mirrored from linux.git :).  I'll try to dig up
the specific Linux version that this code is based on.

> > +static int get_indirect(Vring *vring,
> > +                        struct iovec iov[], struct iovec *iov_end,
> > +                        unsigned int *out_num, unsigned int *in_num,
> > +                        struct vring_desc *indirect)
> > +{
> > +    struct vring_desc desc;
> > +    unsigned int i = 0, count, found = 0;
> > +
> > +    /* Sanity check */
> > +    if (unlikely(indirect->len % sizeof desc)) {
> > +        error_report("Invalid length in indirect descriptor: "
> > +                     "len %#x not multiple of %#zx",
> > +                     indirect->len, sizeof desc);
> > +        vring->broken = true;
> > +        return -EFAULT;
> > +    }
> > +
> > +    count = indirect->len / sizeof desc;
> > +    /* Buffers are chained via a 16 bit next field, so
> > +     * we can have at most 2^16 of these. */
> > +    if (unlikely(count > USHRT_MAX + 1)) {
> > +        error_report("Indirect buffer length too big: %d", indirect->len);
> > +        vring->broken = true;
> > +        return -EFAULT;
> > +    }
> > +
> > +    /* Point to translate indirect desc chain */
> > +    indirect = hostmem_lookup(&vring->hostmem, indirect->addr, indirect->len,
> > +                              false);
> 
> This assumes an indirect buffer is contiguous in qemu memory
> which seems wrong since unlike vring itself
> there are no alignment requirements.

Let's break this up into one hostmem_lookup() per descriptor.  In other
words, don't try to lookup the entire indirect buffer but copy-in one
descriptor at a time.
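
Roughly something like this (just a sketch, not the actual change; Vring,
struct vring_desc, and hostmem_lookup() are from this series, while
read_one_desc() is only an illustrative name):

/* Sketch only: copy in a single descriptor so we never assume the whole
 * indirect table is contiguous in host memory. */
static int read_one_desc(Vring *vring, uint64_t table_addr,
                         unsigned int index, struct vring_desc *desc)
{
    uint64_t addr = table_addr + index * sizeof(*desc);
    void *p = hostmem_lookup(&vring->hostmem, addr, sizeof(*desc), false);

    if (!p) {
        return -EFAULT;   /* descriptor falls outside mapped guest memory */
    }
    memcpy(desc, p, sizeof(*desc));
    return 0;
}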

> Overriding indirect here also seems unnecessarily tricky.

You are right, let's use a separate local variable to make the code
clearer.
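
With that helper the loop in get_indirect() could keep the indirect argument
read-only and work on a local copy, something like (fragment only, loop-limit
and bounds checks elided):

struct vring_desc desc;   /* local copy, one descriptor at a time */
unsigned int i = 0;

do {
    if (read_one_desc(vring, indirect->addr, i, &desc) < 0) {
        vring->broken = true;
        return -EFAULT;
    }
    /* ... translate desc.addr/desc.len into iov[] as the patch does ... */
    i = desc.next;        /* descriptors chain via the 16-bit next field */
} while (desc.flags & VRING_DESC_F_NEXT);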

> > +int vring_pop(VirtIODevice *vdev, Vring *vring,
> > +              struct iovec iov[], struct iovec *iov_end,
> > +              unsigned int *out_num, unsigned int *in_num)
> > +{
> > +    struct vring_desc desc;
> > +    unsigned int i, head, found = 0, num = vring->vr.num;
> > +    uint16_t avail_idx, last_avail_idx;
> > +
> > +    /* If there was a fatal error then refuse operation */
> > +    if (vring->broken) {
> > +        return -EFAULT;
> > +    }
> > +
> > +    /* Check it isn't doing very strange things with descriptor numbers. */
> > +    last_avail_idx = vring->last_avail_idx;
> > +    avail_idx = vring->vr.avail->idx;
> 
> I think something needs to be done here to force
> a read otherwise two accesses to avail_idx
> below can cause two reads from the ring and
> could return inconsistent results.

There is no function call or anything else in between that forces the
compiler to load avail_idx just once and reuse that value, so it is free
to read the ring again.

So I think you're right.  I'm not 100% sure a read barrier forces the
compiler to load here since the following code just manipulates
last_avail_idx and avail_idx.

Any ideas?
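
One idea (not in the patch, just borrowing the kernel's ACCESS_ONCE() trick):
read the index through a volatile cast so there is provably a single load,
and only use the local copy afterwards:

/* Force exactly one load of a shared 16-bit ring index; the volatile cast
 * keeps the compiler from re-reading guest memory behind our back. */
static inline uint16_t load_once_u16(const uint16_t *p)
{
    return *(volatile const uint16_t *)p;
}

Then vring_pop() would do avail_idx = load_once_u16(&vring->vr.avail->idx)
and every later comparison uses that snapshot.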

> > +    if (vdev->guest_features & (1 << VIRTIO_RING_F_EVENT_IDX)) {
> > +        vring_avail_event(&vring->vr) = vring->vr.avail->idx;
> 
> No barrier here?
> I also don't see similar code in vhost - why is it a good idea?

This is from hw/virtio.c:virtqueue_pop().  We know there is at least one
request available, so we're hinting to the guest not to bother notifying
us about any requests up to the latest one.

However, setting avail_event to the current vring avail_idx only helps
if we get lucky and process the vring *before* the guest decides to
notify a whole bunch of requests it has just enqueued.

So this doesn't seem incorrect but the performance benefit is
questionable.
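
For reference, the hint in isolation looks like this (a sketch based on the
Linux uapi header, where vring_avail_event() is the uint16_t stored just past
used->ring[num]; hint_no_notify_up_to() is only an illustrative name):

#include <stdint.h>
#include <linux/virtio_ring.h>

/* Tell the guest it may skip notifying us for buffers up to avail_idx.
 * Whether a barrier is needed before the guest can observe this is exactly
 * the question raised above. */
static void hint_no_notify_up_to(struct vring *vr, uint16_t avail_idx)
{
    vring_avail_event(vr) = avail_idx;
}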

Do you remember why you wrote this code?  The commit is:

commit bcbabae8ff7f7ec114da9fe2aa7f25f420f35306
Author: Michael S. Tsirkin <address@hidden>
Date:   Sun Jun 12 16:21:57 2011 +0300

    virtio: event index support

    Add support for event_idx feature, and utilize it to
    reduce the number of interrupts and exits for the guest.

Stefan


