From: Anthony Liguori
Subject: [Qemu-devel] Re: [PATCH-RFC 13/13] virtio-net: connect to vhost net backend
Date: Mon, 25 Jan 2010 15:00:16 -0600
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.5) Gecko/20091209 Fedora/3.0-4.fc12 Lightning/1.0pre Thunderbird/3.0

On 01/25/2010 02:27 PM, Michael S. Tsirkin wrote:
On Mon, Jan 25, 2010 at 01:58:08PM -0600, Anthony Liguori wrote:
On 01/11/2010 11:23 AM, Michael S. Tsirkin wrote:
start/stop backend on driver start/stop

Signed-off-by: Michael S. Tsirkin <address@hidden>
---
   hw/virtio-net.c |   40 ++++++++++++++++++++++++++++++++++++++++
   1 files changed, 40 insertions(+), 0 deletions(-)

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index c2a389f..99169e1 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -17,6 +17,7 @@
   #include "net/tap.h"
   #include "qemu-timer.h"
   #include "virtio-net.h"
+#include "vhost_net.h"

   #define VIRTIO_NET_VM_VERSION    11

@@ -47,6 +48,7 @@ typedef struct VirtIONet
       uint8_t nomulti;
       uint8_t nouni;
       uint8_t nobcast;
+    uint8_t vhost_started;
       struct {
           int in_use;
           int first_multi;
@@ -114,6 +116,10 @@ static void virtio_net_reset(VirtIODevice *vdev)
       n->nomulti = 0;
       n->nouni = 0;
       n->nobcast = 0;
+    if (n->vhost_started) {
+        vhost_net_stop(tap_get_vhost_net(n->nic->nc.peer), vdev);
+        n->vhost_started = 0;
+    }

       /* Flush any MAC and VLAN filter table state */
       n->mac_table.in_use = 0;
@@ -820,6 +826,36 @@ static NetClientInfo net_virtio_info = {
       .link_status_changed = virtio_net_set_link_status,
   };

+static void virtio_net_set_status(struct VirtIODevice *vdev)
+{
+    VirtIONet *n = to_virtio_net(vdev);
+    if (!n->nic->nc.peer) {
+        return;
+    }
+    if (n->nic->nc.peer->info->type != NET_CLIENT_TYPE_TAP) {
+        return;
+    }
+
+    if (!tap_get_vhost_net(n->nic->nc.peer)) {
+        return;
+    }
+    if (!!n->vhost_started == !!(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
+        return;
+    }
+    if (vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) {
+        int r = vhost_net_start(tap_get_vhost_net(n->nic->nc.peer), vdev);
+        if (r < 0) {
+            fprintf(stderr, "unable to start vhost net: "
+                    "falling back on userspace virtio\n");
+        } else {
+            n->vhost_started = 1;
+        }
+    } else {
+        vhost_net_stop(tap_get_vhost_net(n->nic->nc.peer), vdev);
+        n->vhost_started = 0;
+    }
+}
+

This function does a number of bad things.  It makes virtio-net have
specific knowledge of backends (like tap) and then has virtio-net pass
device-specific state (vdev) to a network backend.

Ultimately, the following things need to happen:

1) when a driver is ready to begin operating:
   a) virtio-net needs to tell vhost the location of the ring in physical
memory
   b) virtio-net needs to tell vhost about any notification it receives
(allowing kvm to shortcut userspace)
   c) virtio-net needs to tell vhost about which irq is being used
(allowing kvm to shortcut userspace)

What this suggests is that we need an API for the network backends to do
the following:

  1) probe whether ring passthrough is supported
  2) set the *virtual* address of the ring elements
  3) hand the backend some sort of notification object for sending and
receiving notifications
  4) stop ring passthrough
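
As a sketch, those four operations might look like this on the backend
side (every name here is made up, and IOEvent is the notification
object I describe below):

/* Hypothetical ring-passthrough hooks on a network backend. */
int  ring_probe(VLANClientState *nc);              /* 1) passthrough supported? */
int  ring_set(VLANClientState *nc, void *desc,
              void *avail, void *used);            /* 2) virtual ring addresses */
int  ring_set_notify(VLANClientState *nc,
                     IOEvent *kick,                /* 3) guest -> backend notify */
                     IOEvent *call);               /*    backend -> guest irq */
void ring_stop(VLANClientState *nc);               /* 4) stop passthrough */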
Look at how vnet_hdr is set up: the frontend probes the backend, and has 'if
(backend has vnet header) blabla'.  vhost is really very similar, so I
would like to do it in the same way.

vnet_hdr is IMHO a really bad example to copy from.

vnet_hdr leaks into the migration state via n->has_vnet_hdr. What this means is that if you want to migrate from -net tap -net nic,model=virtio to -net user -net nic,model=virtio, it will fail.

This is a hard problem to solve in qemu, though, because it would require that we implement software GSO, which no one has stepped up to do so far.

We don't have to repeat this with vhost-net though because unlike vnet_hdr, we don't have to expose anything to the guest.

Generally I do not believe in abstractions that only have one
implementation behind them: you only know how to abstract an interface after
you have more than one implementation.  So whoever writes another frontend
that can use vhost will be in a better position to add infrastructure to
abstract both that new thing and virtio.

I agree with you, but at the same time, I also believe that layering violations should be avoided. I'm not suggesting that you need to make anything other than the vhost-net + virtio-net case work. Just make the interfaces abstract enough that you don't need to hand a vdev to vhost-net, and such that you don't have to pass a kvm-specific data structure (ioeventfd) in what are supposed to be generic interfaces.

vhost should not need any direct knowledge of the device.  This
interface should be enough to communicate the required data.  I think
the only bit that is a little fuzzy and perhaps non-obvious for the
current patches is the notification object.  I would expect it to look
something like:

typedef struct IOEvent IOEvent;   /* forward typedef so the members can refer to it */
struct IOEvent {
   int type;
   void (*notify)(IOEvent *);
   void (*on_notify)(IOEvent *, void (*cb)(IOEvent *, void *));
};
And then we would have

typedef struct KVMIOEvent {
   IOEvent event;   /* set up with event.type = KVM_IO_EVENT at creation */
   int fd;
} KVMIOEvent;

If kvm_enabled() in virtio-net, we would allocate a KVMIOEvent for the
PIO notification and hand that to vhost.  vhost would check event.type
and, if it's KVM_IO_EVENT, downcast and use that to get at the file
descriptor.
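
The downcast would be something like this (a sketch; the helper name
and the container_of-style arithmetic are illustrative, not from any
patch):

#include <stddef.h>   /* offsetof */

static int kvm_io_event_fd(IOEvent *e)
{
    KVMIOEvent *kev;
    if (e->type != KVM_IO_EVENT) {
        return -1;    /* not a kvm event; the caller falls back to e->notify() */
    }
    /* recover the containing KVMIOEvent from its embedded IOEvent */
    kev = (KVMIOEvent *)((char *)e - offsetof(KVMIOEvent, event));
    return kev->fd;
}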
Since we don't have any other IOEvents, I just put the fd
in the generic structure directly.  If other types surface,
we'll see how to generalize it.

I'd feel a whole lot better if we didn't pass the fd around and instead passed around a structure. With just a tiny bit of layering, we can even avoid making that structure KVM specific :-)

The KVMIOEvent should be created, not in the set status callback, but in
the PCI map callback.  And in the PCI map callback, cpu_single_env
should be passed to a kvm-specific function that creates this KVMIOEvent
object containing the eventfd() that's handed to kvm via ioctl.
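
Something along these lines, as a sketch (it assumes the eventfd(2)
syscall and the KVM_IOEVENTFD vm ioctl; the function name and
parameters are illustrative):

#include <unistd.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Create an eventfd and ask kvm to signal it whenever the guest writes
 * this queue's index to the virtio PIO notify register. */
static int kvm_io_event_open(int vm_fd, uint64_t notify_addr, uint16_t queue)
{
    struct kvm_ioeventfd args = {
        .datamatch = queue,
        .addr      = notify_addr,
        .len       = 2,        /* virtio queue notify is a 16-bit write */
        .fd        = eventfd(0, 0),
        .flags     = KVM_IOEVENTFD_FLAG_PIO | KVM_IOEVENTFD_FLAG_DATAMATCH,
    };

    if (args.fd < 0) {
        return -1;
    }
    if (ioctl(vm_fd, KVM_IOEVENTFD, &args) < 0) {
        close(args.fd);
        return -1;
    }
    return args.fd;
}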
So this specific thing does not work very well, because with the irqchip
we want to bypass the notification and send the irq directly from the
kernel.  So I created a structure, but it does not have callbacks,
only the fd.

Your structure is an int, right?  I understand the limits due to the lack of in-kernel irqchip, but I still think passing around an fd is a mistake.

  There
should also be strong separation between the vhost-net code and the
virtio-net device.

Regards,

Anthony Liguori

I don't see the point of this last idea.  vhost is a virtio accelerator,
not a generic network backend.  Whoever wants to use vhost for
non-virtio frontends will have to add infrastructure for this
separation, but I do not believe it's practical or desirable.

vhost is a userspace interface to inject packets into a network device just like a raw socket or a tun/tap device is. It happens to have some interesting features like it supports remapping of physical addresses and it implements interfaces for notifications that can conveniently be used by KVM to bypass userspace in the fast paths.

We should think of it this way for the same reason that vhost-net doesn't live in kvm.ko. If it's easy to isolate something and make it more generic, it's almost always the right thing to do. In this case, isolating vhost-net from virtio-net in qemu just involves passing some information in a function call versus passing a non-public data structure that is then accessed directly. Regardless of whether you agree with my view of vhost-net, avoiding making a non-public structure public should be enough of an argument on its own.
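
Concretely, the difference is just the shape of the entry point;
something like this (a hypothetical signature, reusing the hypothetical
IOEvent from earlier in the thread):

/* Instead of vhost_net_start(net, vdev) as in the patch, hand vhost
 * only what it actually consumes. */
int vhost_net_start(VHostNetState *net,
                    void *desc, void *avail, void *used,  /* ring addresses */
                    IOEvent *kick, IOEvent *call);        /* notification objects */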

Regards,

Anthony Liguori



