[Qemu-devel] [RFC virtio-dev] vhost-user-slave: add vhost-user slave device type


From: Stefan Hajnoczi
Subject: [Qemu-devel] [RFC virtio-dev] vhost-user-slave: add vhost-user slave device type
Date: Fri, 15 Dec 2017 17:05:19 +0000

The vhost-user slave device facilitates vhost-user device emulation
through vhost-user protocol exchanges and access to shared memory.
Software-defined networking, storage, and other I/O appliances can
provide services through this device.

This device is based on Wei Wang's vhost-pci work.  The vhost-user slave
device differs from vhost-pci because it is a single virtio device type
that exposes the vhost-user protocol instead of a family of new virtio
device types, one for each vhost-user device type.

This device supports vhost-user slave and vhost-user master
reconnection.  It also contains a UUID so that vhost-user slave programs
can identify a specific device among many without using bus addresses.

It is somewhat unconventional for a virtio device because it makes use
of additional resources called doorbells, notifications, and shared
memory.  A mapping of these resources to the virtio PCI transport is
provided.  Other transports, such as CCW, may not be able to support
this device.

Cc: Wei Wang <address@hidden>
Cc: Michael S. Tsirkin <address@hidden>
Cc: Maxime Coquelin <address@hidden>
Signed-off-by: Stefan Hajnoczi <address@hidden>
---
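Illustration only, not part of the spec text: a minimal C sketch of the
status handshake described for the \field{status} configuration field in the
patch below.  It assumes the two VIRTIO_VHOSTSLAVE_STATUS_* values are bit
numbers rather than masks (the patch talks about setting and clearing "bits"
but does not say so explicitly).  The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from struct virtio_vhostslave_config in the patch. */
#define VIRTIO_VHOSTSLAVE_STATUS_SLAVE_UP  0
#define VIRTIO_VHOSTSLAVE_STATUS_MASTER_UP 1

/* ASSUMPTION: the defines are bit numbers, so the mask is 1 << n. */
#define STATUS_BIT(n) (UINT32_C(1) << (n))

/* Driver side: announce readiness for the vhost-user master to connect. */
static inline uint32_t status_set_slave_up(uint32_t status)
{
    return status | STATUS_BIT(VIRTIO_VHOSTSLAVE_STATUS_SLAVE_UP);
}

/* Driver side: clearing SLAVE_UP asks for a connected master to be dropped. */
static inline uint32_t status_clear_slave_up(uint32_t status)
{
    return status & ~STATUS_BIT(VIRTIO_VHOSTSLAVE_STATUS_SLAVE_UP);
}

/* Device side: when the master disconnects, both bits are cleared. */
static inline uint32_t status_on_master_disconnect(uint32_t status)
{
    return status & ~(STATUS_BIT(VIRTIO_VHOSTSLAVE_STATUS_SLAVE_UP) |
                      STATUS_BIT(VIRTIO_VHOSTSLAVE_STATUS_MASTER_UP));
}

/* True if the device reports that the vhost-user master is connected. */
static inline bool status_master_up(uint32_t status)
{
    return (status & STATUS_BIT(VIRTIO_VHOSTSLAVE_STATUS_MASTER_UP)) != 0;
}
```

After a disconnect the driver would call status_set_slave_up() again to
restart communication, as the field description says.
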
 content.tex      | 292 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 introduction.tex |   1 +
 2 files changed, 293 insertions(+)
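
Illustration only, not part of the spec text: a sketch of the doorbell
numbering and the doorbell address calculation from the patch below.  It
assumes N in the doorbell list equals \field{max_vhost_queues} and that the
total doorbell count is therefore 2N + 1 (call and err per queue, plus log).
The patch only says the count is device dependent.  Function names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Doorbell index of the vring call doorbell for queue q (0 .. n-1). */
static inline uint32_t doorbell_call_idx(uint32_t n, uint32_t q)
{
    assert(q < n);
    return q;
}

/* Doorbell index of the vring err doorbell for queue q (n .. 2n-1). */
static inline uint32_t doorbell_err_idx(uint32_t n, uint32_t q)
{
    assert(q < n);
    return n + q;
}

/* Doorbell index of the log doorbell (2n). */
static inline uint32_t doorbell_log_idx(uint32_t n)
{
    return 2 * n;
}

/* Offset within the BAR: cap.offset + doorbell_idx * doorbell_off_multiplier. */
static inline uint64_t doorbell_bar_offset(uint32_t cap_offset,
                                           uint32_t doorbell_idx,
                                           uint32_t multiplier)
{
    return (uint64_t)cap_offset + (uint64_t)doorbell_idx * multiplier;
}

/* Checks cap.length >= num_doorbells * doorbell_off_multiplier + 2. */
static inline bool doorbell_cap_length_ok(uint32_t cap_length,
                                          uint32_t num_doorbells,
                                          uint32_t multiplier)
{
    return (uint64_t)cap_length >= (uint64_t)num_doorbells * multiplier + 2;
}
```

The arithmetic is done in 64 bits so that a large multiplier cannot wrap
before the comparison.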

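Illustration only, not part of the spec text: a sketch of the shared memory
layout from the patch below, where vhost memory region i starts at
SIZE0 + ... + SIZE(i-1) and the log follows the last region.  The function
names and the sizes array are hypothetical. The sizes would come from the
VHOST_USER_SET_MEM_TABLE message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Offset of vhost memory region idx inside the shared memory resource:
 * the sum of the sizes of all regions before it.
 */
static uint64_t vhost_region_offset(const uint64_t *sizes, size_t nregions,
                                    size_t idx)
{
    uint64_t off = 0;

    assert(idx <= nregions);
    for (size_t i = 0; i < idx; i++)
        off += sizes[i];
    return off;
}

/* The log is placed after the last vhost memory region. */
static uint64_t vhost_log_offset(const uint64_t *sizes, size_t nregions)
{
    return vhost_region_offset(sizes, nregions, nregions);
}
```

With three regions of 0x1000, 0x4000 and 0x2000 bytes, region 2 starts at
0x5000 and the log at 0x7000.
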
diff --git a/content.tex b/content.tex
index c840588..96778bc 100644
--- a/content.tex
+++ b/content.tex
@@ -3022,6 +3022,8 @@ Device ID  &  Virtio Device    \\
 \hline
 22         &   pstore device \\
 \hline
+24         &   vhost-user slave device \\
+\hline
 \end{tabular}
 
 Some of the devices above are unspecified by this document,
@@ -5819,6 +5821,296 @@ descriptor for the \field{sense_len}, \field{residual},
 \field{status_qualifier}, \field{status}, \field{response} and
 \field{sense} fields.
 
+\section{Vhost-user Slave Device}\label{sec:Device Types / Vhost-user Slave Device}
+
+The vhost-user slave device facilitates vhost-user device emulation through
+vhost-user protocol exchanges and access to shared memory.  Software-defined
+networking, storage, and other I/O appliances can provide services through this
+device.
+
+This section relies on definitions from the \hyperref[intro:Vhost-user
+Protocol]{Vhost-user Protocol}.  Knowledge of the vhost-user protocol is a
+prerequisite for understanding this device.
+
+The \hyperref[intro:Vhost-user Protocol]{Vhost-user Protocol} was originally
+designed for processes on a single system communicating over UNIX domain
+sockets.  The vhost-user slave device allows the vhost-user slave to
+communicate with the vhost-user master over the device instead of a UNIX domain
+socket.  This allows the slave and master to run on two separate systems such
+as a virtual machine and a hypervisor.
+
+The vhost-user slave program exchanges vhost-user protocol messages with the
+vhost-user master through this device.  How the device implementation
+communicates with the vhost-user master is beyond the scope of this
+specification.  One possible device implementation uses a UNIX domain socket to
+relay messages to a vhost-user master process.
+
+Existing vhost-user slave programs that communicate over UNIX domain sockets
+can support the vhost-user slave device interface without invasive changes
+because the same vhost-user wire protocol is used.
+
+\subsection{Device ID}\label{sec:Device Types / Vhost-user Slave Device / Device ID}
+  24
+
+\subsection{Virtqueues}\label{sec:Device Types / Vhost-user Slave Device / Virtqueues}
+
+\begin{description}
+\item[0] m2srxq (requests from vhost-user master)
+\item[1] m2stxq (responses to vhost-user master)
+\item[2] s2mrxq (responses from vhost-user master)
+\item[3] s2mtxq (requests to vhost-user master)
+\end{description}
+
+\subsection{Feature bits}\label{sec:Device Types / Vhost-user Slave Device / Feature bits}
+
+No feature bits are defined at this time.
+
+\subsection{Device configuration layout}\label{sec:Device Types / Vhost-user Slave Device / Device configuration layout}
+
+  All fields of this configuration are always available.
+
+\begin{lstlisting}
+struct virtio_vhostslave_config {
+        le32 status;
+#define VIRTIO_VHOSTSLAVE_STATUS_SLAVE_UP 0
+#define VIRTIO_VHOSTSLAVE_STATUS_MASTER_UP 1
+        le32 max_vhost_queues;
+        u8 uuid[16];
+};
+\end{lstlisting}
+
+\begin{description}
+\item[\field{status}] contains the vhost-user operational status.  The default
+    value of this field is 0.
+
+    The driver sets VIRTIO_VHOSTSLAVE_STATUS_SLAVE_UP to indicate readiness for
+    the vhost-user master to connect.  The vhost-user master cannot connect
+    unless the driver has set this bit first.
+
+    When the driver clears VIRTIO_VHOSTSLAVE_STATUS_SLAVE_UP while the
+    vhost-user master is connected, the vhost-user master is disconnected.
+
+    When the vhost-user master disconnects, both
+    VIRTIO_VHOSTSLAVE_STATUS_SLAVE_UP and VIRTIO_VHOSTSLAVE_STATUS_MASTER_UP
+    are cleared by the device.  Communication can be restarted by the driver
+    setting VIRTIO_VHOSTSLAVE_STATUS_SLAVE_UP again.
+
+    A configuration change notification is sent when the device changes
+    this field independently of a driver write.
+
+\item[\field{max_vhost_queues}] is the maximum number of vhost-user queues
+    supported by this device.  This field is always greater than 0.
+
+\item[\field{uuid}] is the Universally Unique Identifier (UUID) for this
+    device.  If the device has no UUID then this field contains the nil
+    UUID (all zeroes).  The UUID allows vhost-user slave programs to identify a
+    specific vhost-user slave device among many without relying on bus
+    addresses.
+\end{description}
+
+\drivernormative{\subsubsection}{Device configuration layout}{Device Types / Vhost-user Slave Device / Device configuration layout}
+
+The driver MUST NOT write to device configuration fields other than
+\field{status}.
+
+The driver MUST NOT set undefined bits in the \field{status} configuration field.
+
+\drivernormative{\subsection}{Device Initialization}{Device Types / Vhost-user Slave Device / Device Initialization}
+
+The driver SHOULD check the \field{max_vhost_queues} configuration field to
+determine how many queues the vhost-user slave will be able to support.
+
+The driver SHOULD fetch the \field{uuid} configuration field to allow
+vhost-user slave programs to identify a specific device among many.
+
+The driver SHOULD initialize the s2mrxq and s2mtxq virtqueues.  These
+virtqueues are used if the VHOST_USER_PROTOCOL_F_SLAVE_REQ vhost-user protocol
+feature is negotiated.
+
+The driver SHOULD place at least one buffer in m2srxq before setting the
+VIRTIO_VHOSTSLAVE_STATUS_SLAVE_UP bit in the \field{status} configuration field.
+
+The driver MUST handle m2srxq virtqueue notifications that occur before the
+configuration change notification.  It is possible that a vhost-user protocol
+message from the vhost-user master arrives before the driver has seen the
+configuration change notification for the VIRTIO_VHOSTSLAVE_STATUS_MASTER_UP
+\field{status} change.
+
+\subsection{Device Operation}\label{sec:Device Types / Vhost-user Slave Device / Device Operation}
+
+Device operation consists of operating request queues and response queues.
+
+\subsubsection{Device Operation: Request Queues}\label{sec:Device Types / Vhost-user Slave Device / Device Operation / Device Operation: Request Queues}
+
+The driver receives vhost-user protocol messages from the vhost-user master on
+m2srxq.  The driver sends responses to the vhost-user master on m2stxq.
+
+The driver sends slave-initiated requests on s2mtxq.  The driver receives
+responses from the vhost-user master on s2mrxq.
+
+All virtqueues offer in-order guaranteed delivery semantics for vhost-user
+protocol messages.
+
+Each buffer is a vhost-user protocol message as defined by the
+\hyperref[intro:Vhost-user Protocol]{Vhost-user Protocol}.  File descriptor
+passing is handled differently by the vhost-user slave device.  When a message
+is received that carries one or more file descriptors according to the
+vhost-user protocol, additional device resources become available to the
+driver.
+
+\subsection{Additional Device Resources over PCI}\label{sec:Device Types / Vhost-user Slave Device / Additional Device Resources over PCI}
+
+The vhost-user slave device contains additional device resources beyond
+configuration space and virtqueues.  The nature of these resources is
+transport-specific and therefore only virtio transports that provide these
+resources support the vhost-user slave device.
+
+The following additional resources exist:
+\begin{description}
+  \item[Doorbells] The driver signals the vhost-user master through doorbells.  The signal does not carry any data; it is purely an event.
+  \item[Notifications] The vhost-user master signals the driver for events besides virtqueue activity and configuration changes by sending notifications.
+  \item[Shared memory] The vhost-user master gives access to memory that can be mapped by the driver.
+\end{description}
+
+\subsubsection{Doorbell Numbering}\label{sec:Device Types / Vhost-user Slave Device / Additional Device Resources over PCI / Doorbell Numbering}
+
+Doorbells are laid out as follows:
+
+\begin{description}
+\item[0] Vring call for vhost-user queue 0
+\item[\ldots]
+\item[N] Vring err for vhost-user queue 0
+\item[\ldots]
+\item[2N] Log
+\end{description}
+
+\subsubsection{Notifications}\label{sec:Device Types / Vhost-user Slave Device / Additional Device Resources over PCI / Notifications}
+
+Notifications are laid out as follows:
+
+\begin{description}
+\item[0] Vring kick for vhost-user queue 0
+\item[\ldots]
+\item[N-1] Vring kick for vhost-user queue N-1
+\end{description}
+
+\subsubsection{Shared Memory Layout}\label{sec:Device Types / Vhost-user Slave Device / Additional Device Resources over PCI / Shared Memory Layout}
+
+Shared memory is laid out as follows:
+
+\begin{description}
+\item[0] Vhost memory region 0
+\item[SIZE0] Vhost memory region 1
+\item[\ldots]
+\item[SIZE0 + SIZE1 + \ldots] Log
+\end{description}
+
+The size of vhost memory region 0 is \field{SIZE0}, the size of vhost memory
+region 1 is \field{SIZE1}, and so on.
+
+\subsubsection{Availability of Additional Resources}\label{sec:Device Types / Vhost-user Slave Device / Additional Device Resources over PCI / Availability of Additional Resources}
+
+The following vhost-user protocol messages convey access to additional device
+resources:
+
+\begin{description}
+\item[VHOST_USER_SET_MEM_TABLE] Contents of vhost memory regions are available to the driver in shared memory.  Region contents are laid out in the same order as the vhost memory region list.
+\item[VHOST_USER_SET_LOG_BASE] Contents of the log are available to the driver in shared memory.
+\item[VHOST_USER_SET_LOG_FD] The log doorbell is available to the driver.  Writes to the log doorbell before this message is received produce no effect.
+\item[VHOST_USER_SET_VRING_KICK] The vring kick notification for this queue is available to the driver.  The first notification may occur before the driver has processed this message.
+\item[VHOST_USER_SET_VRING_CALL] The vring call doorbell for this queue is available to the driver.  Writes to the vring call doorbell before this message is received produce no effect.
+\item[VHOST_USER_SET_VRING_ERR] The vring err doorbell for this queue is available to the driver.  Writes to the vring err doorbell before this message is received produce no effect.
+\item[VHOST_USER_SET_SLAVE_REQ_FD] The driver may send vhost-user protocol slave messages on s2mtxq.  Buffers put onto s2mtxq before this message is received are discarded by the device.
+\end{description}
+
+Additional resources are configured on the virtio PCI transport by the following \field{struct virtio_pci_cap.cfg_type} values:
+
+\begin{lstlisting}
+#define VIRTIO_PCI_CAP_DOORBELL_CFG 6
+#define VIRTIO_PCI_CAP_NOTIFICATION_CFG 7
+#define VIRTIO_PCI_CAP_SHARED_MEMORY_CFG 8
+\end{lstlisting}
+
+\subsubsection{Doorbell structure layout}\label{sec:Device Types / Vhost-user Slave Device / Additional Device Resources over PCI / Doorbell capability}
+
+The doorbell location is found using the VIRTIO_PCI_CAP_DOORBELL_CFG
+capability.  This capability is immediately followed by an additional
+field, like so:
+
+\begin{lstlisting}
+struct virtio_pci_doorbell_cap {
+        struct virtio_pci_cap cap;
+        le32 doorbell_off_multiplier;
+};
+\end{lstlisting}
+
+The doorbell address within a BAR is calculated as follows:
+
+\begin{lstlisting}
+        cap.offset + doorbell_idx * doorbell_off_multiplier
+\end{lstlisting}
+
+The \field{cap.offset} and \field{doorbell_off_multiplier} are taken from the
+doorbell capability structure above, and the \field{doorbell_idx} is the
+doorbell number.
+
+\devicenormative{\paragraph}{Doorbell capability}{Device Types / Vhost-user Slave Device / Additional Device Resources over PCI / Doorbell capability}
+The device MUST present at least one doorbell capability.
+
+The \field{cap.offset} MUST be 2-byte aligned.  
+
+The device MUST either present \field{doorbell_off_multiplier} as an even power of 2,
+or present \field{doorbell_off_multiplier} as 0.
+
+The value \field{cap.length} presented by the device MUST be at least 2
+and MUST be large enough to support doorbell offsets for all supported
+doorbells in all possible configurations.
+
+The value \field{cap.length} presented by the device MUST satisfy:
+\begin{lstlisting}
+cap.length >= num_doorbells * doorbell_off_multiplier + 2
+\end{lstlisting}
+
+The number of doorbells is \field{num_doorbells} and is dependent on the
+device.
+
+\subsubsection{Notification structure layout}\label{sec:Device Types / Vhost-user Slave Device / Additional Device Resources over PCI / Notification capability}
+
+The notification structure allows MSI-X vectors to be configured for
+notification interrupts.  If MSI-X is not available, bit 2 of the ISR status
+indicates that a notification occurred.
+
+The notification structure is found using the VIRTIO_PCI_CAP_NOTIFICATION_CFG
+capability.
+
+\begin{lstlisting}
+struct virtio_pci_notification_cfg {
+        le16 notification_select;              /* read-write */
+        le16 notification_msix_vector;         /* read-write */
+};
+\end{lstlisting}
+
+The driver indicates which notification is of interest by writing the
+\field{notification_select} field.  The driver then writes the MSI-X vector or
+\field{VIRTIO_MSI_NO_VECTOR} to \field{notification_msix_vector} to change the
+MSI-X vector for that notification.
+
+\subsubsection{Shared memory capability}\label{sec:Device Types / Vhost-user Slave Device / Additional Device Resources over PCI / Shared Memory capability}
+
+The shared memory location is found using the VIRTIO_PCI_CAP_SHARED_MEMORY_CFG
+capability.
+
+\devicenormative{\paragraph}{Shared Memory capability}{Device Types / Vhost-user Slave Device / Additional Device Resources over PCI / Shared Memory capability}
+The device MUST present exactly one shared memory capability.
+
+The device MUST locate shared memory in a Memory Space BAR.
+
+The device SHOULD locate shared memory in a Prefetchable BAR.
+
+The \field{cap.offset} MUST be 4096-byte aligned.
+
+The value \field{cap.length} presented by the device MUST be non-zero and 4096-byte aligned.
+
 \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
 
 Currently these device-independent feature bits defined:
diff --git a/introduction.tex b/introduction.tex
index 979881e..0bf400d 100644
--- a/introduction.tex
+++ b/introduction.tex
@@ -60,6 +60,7 @@ Levels'', BCP 14, RFC 2119, March 1997. \newline\url{http://www.ietf.org/rfc/rfc
        \phantomsection\label{intro:SCSI MMC}\textbf{[SCSI MMC]} &
         SCSI Multimedia Commands,
         \newline\url{http://www.t10.org/cgi-bin/ac.pl?t=f&f=mmc6r00.pdf}\\
+       \phantomsection\label{intro:Vhost-user Protocol}\textbf{[Vhost-user Protocol]} & Vhost-user Protocol, \newline\url{https://git.qemu.org/?p=qemu.git;a=blob_plain;f=docs/interop/vhost-user.txt;hb=HEAD}, and any future revisions\\
 
 \end{longtable}
 
-- 
2.14.3



