
Re: [Qemu-devel] [RFC 0/2] vhost-vfio: introduce mdev based HW vhost backend


From: Jason Wang
Subject: Re: [Qemu-devel] [RFC 0/2] vhost-vfio: introduce mdev based HW vhost backend
Date: Tue, 6 Nov 2018 12:17:48 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.2.1


On 2018/10/16 9:23 PM, Xiao Wang wrote:
What's this
===========
Following the patch (vhost: introduce mdev based hardware vhost backend)
https://lwn.net/Articles/750770/, which defines a generic mdev device for
vhost data path acceleration (aliased as vDPA mdev below), this patch set
introduces a new net client type: vhost-vfio.


Thanks a lot for such an interesting series. Some generic questions:


If we consider using a software backend (e.g. vhost-kernel, a relay of virtio-vhost-user, or other cases) in the future as well, maybe vhost-mdev is a better name, which means it is not tied to VFIO anyway.



Currently we have 2 types of vhost backends in QEMU: vhost kernel (tap)
and vhost-user (e.g. DPDK vhost), in order to have a kernel space HW vhost
acceleration framework, the vDPA mdev device works as a generic configuring
channel.


Does a "generic" configuring channel mean DPDK will also go this way? E.g. will it have a vhost-mdev PMD?


  It exposes to user space a non-vendor-specific configuration
interface for setting up a vhost HW accelerator,


Or even a software translation layer on top of existing hardware.


and based on this, the patch
set introduces a third vhost backend called vhost-vfio.

How does it work
================
The vDPA mdev defines 2 BAR regions, BAR0 and BAR1. BAR0 is the main
device interface, vhost messages can be written to or read from this
region following below format. All the regular vhost messages about vring
addr, negotiated features, etc., are written to this region directly.


If I understand this correctly, the mdev is not used for passthrough to the guest directly. So what's the reason for inventing a PCI-like device here? I'm asking since:

- the vhost protocol is transport independent, so we should consider supporting transports other than PCI. I know we can even do it with the existing design, but it looks rather odd if we do e.g. a ccw device with a PCI-like mediated device.

- can we try to reuse the vhost-kernel ioctls? Fewer APIs mean fewer bugs and more code reuse. E.g. virtio-user could benefit from the vhost kernel ioctl API almost with no changes, I believe.



struct vhost_vfio_op {
        __u64 request;
        __u32 flags;
        /* Flag values: */
#define VHOST_VFIO_NEED_REPLY 0x1 /* Whether a reply is needed */
        __u32 size;
        union {
                __u64 u64;
                struct vhost_vring_state state;
                struct vhost_vring_addr addr;
                struct vhost_memory memory;
        } payload;
};

BAR1 is defined as a region of doorbells; QEMU can use this region as the
host notifier for virtio. To optimize virtio notification, vhost-vfio tries
to mmap the corresponding page on BAR1 for each queue and leverages EPT to
let the guest virtio driver kick the vDPA device doorbell directly. For the
virtio 0.95 case, in which we cannot set a host notifier memory region,
QEMU will help to relay the notify to the vDPA device.

Note: EPT mapping requires that each queue's notify address be located at
the beginning of a separate page; the parameter "page-per-vq=on" can help.


I think QEMU should prepare a fallback for this if page-per-vq is off.



For interrupt setup, the vDPA mdev device leverages the existing VFIO API
to enable interrupt configuration from user space. In this way, KVM's irqfd
for virtio can be set on the mdev device by QEMU using ioctl().

The vhost-vfio net client sets up a vDPA mdev device specified by a
"sysfsdev" parameter. During net client init, the device is opened and
parsed using the VFIO API, and the VFIO device fd and device BAR region
offsets are kept in a VhostVFIO structure. This initialization provides
a channel for configuring vhost information to the vDPA device driver.

To do later
===========
1. The net client initialization uses the raw VFIO API to open the vDPA
mdev device; it would be better to provide a set of helpers in
hw/vfio/common.c to help vhost-vfio initialize the device easily.

2. For device DMA mapping, QEMU passes memory region info to the mdev
device and lets the kernel parent device driver program the IOMMU. This
is a temporary implementation; in the future, when the IOMMU driver
supports the mdev bus, we can use the VFIO API to program the IOMMU
directly for the parent device.
Refer to the patch (vfio/mdev: IOMMU aware mediated device):
https://lkml.org/lkml/2018/10/12/225


As Steve mentioned at the KVM Forum, it's better to have at least one sample driver, e.g. virtio-net itself.

Then it would be more convenient for the reviewer to evaluate the whole stack.

Thanks



Vhost-vfio usage
================
# Query the number of available mdev instances
$ cat /sys/class/mdev_bus/0000:84:00.3/mdev_supported_types/ifcvf_vdpa-vdpa_virtio/available_instances

# Create a mdev instance
$ echo $UUID > /sys/class/mdev_bus/0000:84:00.3/mdev_supported_types/ifcvf_vdpa-vdpa_virtio/create

# Launch QEMU with a virtio-net device
     qemu-system-x86_64 -cpu host -enable-kvm \
     <snip>
     -mem-prealloc \
     -netdev type=vhost-vfio,sysfsdev=/sys/bus/mdev/devices/$UUID,id=mynet \
     -device virtio-net-pci,netdev=mynet,page-per-vq=on

-------- END --------

Xiao Wang (2):
   vhost-vfio: introduce vhost-vfio net client
   vhost-vfio: implement vhost-vfio backend

  hw/net/vhost_net.c                |  56 ++++-
  hw/vfio/common.c                  |   3 +-
  hw/virtio/Makefile.objs           |   2 +-
  hw/virtio/vhost-backend.c         |   3 +
  hw/virtio/vhost-vfio.c            | 501 ++++++++++++++++++++++++++++++++++++++
  hw/virtio/vhost.c                 |  15 ++
  include/hw/virtio/vhost-backend.h |   7 +-
  include/hw/virtio/vhost-vfio.h    |  35 +++
  include/hw/virtio/vhost.h         |   2 +
  include/net/vhost-vfio.h          |  17 ++
  linux-headers/linux/vhost.h       |   9 +
  net/Makefile.objs                 |   1 +
  net/clients.h                     |   3 +
  net/net.c                         |   1 +
  net/vhost-vfio.c                  | 327 +++++++++++++++++++++++++
  qapi/net.json                     |  22 +-
  16 files changed, 996 insertions(+), 8 deletions(-)
  create mode 100644 hw/virtio/vhost-vfio.c
  create mode 100644 include/hw/virtio/vhost-vfio.h
  create mode 100644 include/net/vhost-vfio.h
  create mode 100644 net/vhost-vfio.c



