From: Anthony Liguori
Subject: [Qemu-devel] Re: [PATCH-RFC 13/13] virtio-net: connect to vhost net backend
Date: Mon, 25 Jan 2010 15:00:16 -0600
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.5) Gecko/20091209 Fedora/3.0-4.fc12 Lightning/1.0pre Thunderbird/3.0

On 01/25/2010 02:27 PM, Michael S. Tsirkin wrote:
On Mon, Jan 25, 2010 at 01:58:08PM -0600, Anthony Liguori wrote:
On 01/11/2010 11:23 AM, Michael S. Tsirkin wrote:
start/stop backend on driver start/stop

Signed-off-by: Michael S. Tsirkin <address@hidden>
---
   hw/virtio-net.c |   40 ++++++++++++++++++++++++++++++++++++++++
   1 files changed, 40 insertions(+), 0 deletions(-)

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index c2a389f..99169e1 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -17,6 +17,7 @@
   #include "net/tap.h"
   #include "qemu-timer.h"
   #include "virtio-net.h"
+#include "vhost_net.h"

   #define VIRTIO_NET_VM_VERSION    11

@@ -47,6 +48,7 @@ typedef struct VirtIONet
       uint8_t nomulti;
       uint8_t nouni;
       uint8_t nobcast;
+    uint8_t vhost_started;
       struct {
           int in_use;
           int first_multi;
@@ -114,6 +116,10 @@ static void virtio_net_reset(VirtIODevice *vdev)
       n->nomulti = 0;
       n->nouni = 0;
       n->nobcast = 0;
+    if (n->vhost_started) {
+        vhost_net_stop(tap_get_vhost_net(n->nic->nc.peer), vdev);
+        n->vhost_started = 0;
+    }

       /* Flush any MAC and VLAN filter table state */
       n->mac_table.in_use = 0;
@@ -820,6 +826,36 @@ static NetClientInfo net_virtio_info = {
       .link_status_changed = virtio_net_set_link_status,
   };

+static void virtio_net_set_status(struct VirtIODevice *vdev)
+{
+    VirtIONet *n = to_virtio_net(vdev);
+    if (!n->nic->nc.peer) {
+        return;
+    }
+    if (n->nic->nc.peer->info->type != NET_CLIENT_TYPE_TAP) {
+        return;
+    }
+
+    if (!tap_get_vhost_net(n->nic->nc.peer)) {
+        return;
+    }
+    if (!!n->vhost_started == !!(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
+        return;
+    }
+    if (vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) {
+        int r = vhost_net_start(tap_get_vhost_net(n->nic->nc.peer), vdev);
+        if (r < 0) {
+            fprintf(stderr, "unable to start vhost net: "
+                    "falling back on userspace virtio\n");
+        } else {
+            n->vhost_started = 1;
+        }
+    } else {
+        vhost_net_stop(tap_get_vhost_net(n->nic->nc.peer), vdev);
+        n->vhost_started = 0;
+    }
+}
+

This function does a number of bad things.  It makes virtio-net have
specific knowledge of backends (like tap) and then has virtio-net pass
device-specific state (vdev) to a network backend.

Ultimately, the following things need to happen:

1) when a driver is ready to begin operating:
   a) virtio-net needs to tell vhost the location of the ring in physical
memory
   b) virtio-net needs to tell vhost about any notification it receives
(allowing kvm to shortcut userspace)
   c) virtio-net needs to tell vhost about which irq is being used
(allowing kvm to shortcut userspace)

What this suggests is that we need an API for the network backends to do
the following:

  1) probe whether ring passthrough is supported
  2) set the *virtual* address of the ring elements
  3) hand the backend some sort of notification object for sending and
receiving notifications
  4) stop ring passthrough
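
As a sketch, those four operations might look like this on the backend
side (every name here is made up, and IOEvent is the notification
object I describe below):

/* Hypothetical ring-passthrough hooks on a network backend. */
int  ring_probe(VLANClientState *nc);              /* 1) passthrough supported? */
int  ring_set(VLANClientState *nc, void *desc,
              void *avail, void *used);            /* 2) virtual ring addresses */
int  ring_set_notify(VLANClientState *nc,
                     IOEvent *kick,                /* 3) guest -> backend notify */
                     IOEvent *call);               /*    backend -> guest irq */
void ring_stop(VLANClientState *nc);               /* 4) stop passthrough */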
Look at how vnet_hdr is set up: the frontend probes the backend, and has 'if
(backend has vnet header) blabla'.  vhost is really very similar, so I
would like to do it in the same way.

vnet_hdr is IMHO a really bad example to copy from.

vnet_hdr leaks into the migration state via n->has_vnet_hdr. What this means is that if you want to migrate from -net tap -net nic,model=virtio to -net user -net nic,model=virtio, it will fail.

This is a hard problem to solve in qemu, though, because it would require that we implement software GSO, which no one has stepped up to do so far.

We don't have to repeat this with vhost-net though because unlike vnet_hdr, we don't have to expose anything to the guest.

Generally I do not believe in abstractions that only have one
implementation behind them: you only know how to abstract an interface after
you have more than one implementation.  So whoever writes another frontend
that can use vhost will be in a better position to add infrastructure to
abstract both that new thing and virtio.

I agree with you, but at the same time, I also believe that layering violations should be avoided. I'm not suggesting that you need to make anything other than the vhost-net + virtio-net case work. Just make the interfaces abstract enough that you don't need to hand a vdev to vhost-net, and such that you don't have to pass a kvm-specific data structure (ioeventfd) in what are supposed to be generic interfaces.

vhost should not need any direct knowledge of the device.  This
interface should be enough to communicate the required data.  I think
the only bit that is a little fuzzy and perhaps non-obvious for the
current patches is the notification object.  I would expect it to look
something like:

typedef struct IOEvent IOEvent;   /* forward typedef so the members can refer to it */
struct IOEvent {
   int type;
   void (*notify)(IOEvent *);
   void (*on_notify)(IOEvent *, void (*cb)(IOEvent *, void *));
};
And then we would have

typedef struct KVMIOEvent {
   IOEvent event;   /* set up with event.type = KVM_IO_EVENT at creation */
   int fd;
} KVMIOEvent;

If kvm_enabled() in virtio-net, we would allocate a KVMIOEvent for the
PIO notification and hand that to vhost.  vhost would check event.type
and, if it's KVM_IO_EVENT, downcast and use that to get at the file
descriptor.
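
The downcast would be something like this (a sketch; the helper name
and the container_of-style arithmetic are illustrative, not from any
patch):

#include <stddef.h>   /* offsetof */

static int kvm_io_event_fd(IOEvent *e)
{
    KVMIOEvent *kev;
    if (e->type != KVM_IO_EVENT) {
        return -1;    /* not a kvm event; the caller falls back to e->notify() */
    }
    /* recover the containing KVMIOEvent from its embedded IOEvent */
    kev = (KVMIOEvent *)((char *)e - offsetof(KVMIOEvent, event));
    return kev->fd;
}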
Since we don't have any other IOEvents, I just put the fd
in the generic structure directly.  If other types surface,
we'll see how to generalize it.

I'd feel a whole lot better if we didn't pass the fd around and instead passed around a structure. With just a tiny bit of layering, we can even avoid making that structure KVM specific :-)

The KVMIOEvent should be created, not in the set status callback, but in
the PCI map callback.  And in the PCI map callback, cpu_single_env
should be passed to a kvm-specific function that creates this KVMIOEvent
object containing the eventfd() that's handed to kvm via ioctl.
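
Something along these lines, as a sketch (it assumes the eventfd(2)
syscall and the KVM_IOEVENTFD vm ioctl; the function name and
parameters are illustrative):

#include <unistd.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Create an eventfd and ask kvm to signal it whenever the guest writes
 * this queue's index to the virtio PIO notify register. */
static int kvm_io_event_open(int vm_fd, uint64_t notify_addr, uint16_t queue)
{
    struct kvm_ioeventfd args = {
        .datamatch = queue,
        .addr      = notify_addr,
        .len       = 2,        /* virtio queue notify is a 16-bit write */
        .fd        = eventfd(0, 0),
        .flags     = KVM_IOEVENTFD_FLAG_PIO | KVM_IOEVENTFD_FLAG_DATAMATCH,
    };

    if (args.fd < 0) {
        return -1;
    }
    if (ioctl(vm_fd, KVM_IOEVENTFD, &args) < 0) {
        close(args.fd);
        return -1;
    }
    return args.fd;
}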
So this specific thing does not work very well, because with the irqchip
we want to bypass the notification and send the irq directly from the
kernel.  So I created a structure, but it does not have callbacks,
only the fd.

Your structure is an int, right?  I understand the limits due to the lack of in-kernel irqchip, but I still think passing around an fd is a mistake.

  There
should also be strong separation between the vhost-net code and the
virtio-net device.

Regards,

Anthony Liguori

I don't see the point of this last idea.  vhost is a virtio accelerator,
not a generic network backend.  Whoever wants to use vhost for
non-virtio frontends will have to add infrastructure for this
separation, but I do not believe it's practical or desirable.

vhost is a userspace interface to inject packets into a network device just like a raw socket or a tun/tap device is. It happens to have some interesting features like it supports remapping of physical addresses and it implements interfaces for notifications that can conveniently be used by KVM to bypass userspace in the fast paths.

We should think of it this way for the same reason that vhost-net doesn't live in kvm.ko. If it's easy to isolate something and make it more generic, it's almost always the right thing to do. In this case, isolating vhost-net from virtio-net in qemu just involves passing some information in a function call versus passing a non-public data structure that is then accessed directly. Regardless of whether you agree with my view of vhost-net, avoiding making a non-public structure public should be enough of an argument on its own.
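
Concretely, the difference is just the shape of the entry point;
something like this (a hypothetical signature, reusing the hypothetical
IOEvent from earlier in the thread):

/* Instead of vhost_net_start(net, vdev) as in the patch, hand vhost
 * only what it actually consumes. */
int vhost_net_start(VHostNetState *net,
                    void *desc, void *avail, void *used,  /* ring addresses */
                    IOEvent *kick, IOEvent *call);        /* notification objects */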

Regards,

Anthony Liguori



