From: Wei Wang
Subject: [Qemu-devel] [PATCH] vhost-pci-net: add a new virtio device, vhost-pci-net, for network packet transmission between VMs
Date: Fri, 14 Oct 2016 20:28:06 +0800
In addition to the data path established using vhost-pci-net, this
patch also adds support for establishing a notification path between
two virtio devices. New registers are added to the virtio device to
record everything its driver needs to inject interrupts, using
hypercalls, into the peer device on the other end (here, we treat a
virtio<---->virtio connection as peer<---->peer).
Signed-off-by: Wei Wang <address@hidden>
---
content.tex | 227 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 222 insertions(+), 5 deletions(-)
diff --git a/content.tex b/content.tex
index 4b45678..5f9bdae 100644
--- a/content.tex
+++ b/content.tex
@@ -1295,6 +1295,14 @@ struct virtio_pci_common_cfg {
le64 queue_desc; /* read-write */
le64 queue_avail; /* read-write */
le64 queue_used; /* read-write */
+
+ /* About a peer device */
+ le16 peer_connection; /* read-write */
+ le16 peer_num_rx_queues; /* read-only for driver */
+ le16 peer_rx_queue_select; /* read-write */
+ le32 peer_rx_queue_gsi; /* read-only for driver */
+ le64 peer_uuid_hi; /* read-only for driver */
+ le64 peer_uuid_lo; /* read-only for driver */
};
\end{lstlisting}
@@ -1361,6 +1369,25 @@ struct virtio_pci_common_cfg {
\item[\field{queue_used}]
The driver writes the physical address of Used Ring here. See section
\ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
+
+\item[\field{peer_connection}]
+ Connection Control/Status. 1 - Connected; 0 - Disconnected.
+
+\item[\field{peer_num_rx_queues}]
+ The device uses this to report the number of RX virtqueues that the
+ connected peer device uses.
+
+\item[\field{peer_rx_queue_select}]
+ The driver selects which RX virtqueue of the peer device
+ \field{peer_rx_queue_gsi} refers to.
+
+\item[\field{peer_rx_queue_gsi}]
+ The device writes the GSI of the peer RX virtqueue selected by
+ \field{peer_rx_queue_select} here.
+
+\item[\field{peer_uuid_hi}]
+ The device writes the high-order 64 bits of the peer UUID here.
+
+\item[\field{peer_uuid_lo}]
+ The device writes the low-order 64 bits of the peer UUID here.
+
\end{description}
\devicenormative{\paragraph}{Common configuration structure layout}{Virtio
Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common
configuration structure layout}
@@ -1405,9 +1432,15 @@ The device MUST present a 0 in \field{queue_enable} on reset.
The device MUST present a 0 in \field{queue_size} if the virtqueue
corresponding to the current \field{queue_select} is unavailable.
+The peer-device registers are used when the device is connected to
+another device (e.g. a vhost-pci device instance). The device SHOULD
+negotiate with the peer device and set \field{peer_num_rx_queues},
+\field{peer_rx_queue_gsi}, \field{peer_uuid_hi}, and \field{peer_uuid_lo}
+accordingly.
+
+When the device completes the negotiation with the peer device that
+establishes the connection, it MUST write a 1 to \field{peer_connection}
+and notify the driver.
+
+When the driver writes a 0 to \field{peer_connection} to request
+disconnection, the device SHOULD first negotiate with the peer device to
+close the connection, and then write a 0 to \field{peer_connection} and
+notify the driver.
+
\drivernormative{\paragraph}{Common configuration structure layout}{Virtio
Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common
configuration structure layout}
-The driver MUST NOT write to \field{device_feature}, \field{num_queues}, \field{config_generation} or \field{queue_notify_off}.
+The driver MUST NOT write to \field{device_feature}, \field{num_queues},
+\field{config_generation}, \field{queue_notify_off},
+\field{peer_num_rx_queues}, \field{peer_rx_queue_gsi}, \field{peer_uuid_hi}, or
+\field{peer_uuid_lo}.
The driver MUST NOT write a value which is not a power of 2 to
\field{queue_size}.
@@ -1419,6 +1452,12 @@ After writing 0 to \field{device_status}, the driver MUST wait for a read of
The driver MUST NOT write a 0 to \field{queue_enable}.
+The driver MUST NOT write a 1 to \field{peer_connection}.
+
+The driver SHOULD NOT read the peer-device registers until it has been
+notified that a 1 has been written to \field{peer_connection}.
+
+The driver MUST NOT unload until it reads a 0 from \field{peer_connection}.
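To illustrate the driver-side rules above, here is a minimal C sketch, assuming the peer registers can be modeled as plain struct fields (a real driver would use MMIO accessors). The names `struct peer_regs`, `gsi_table`, `MAX_PEER_RX_QUEUES`, `driver_read_peer_info()`, and `driver_may_unload()` are hypothetical and not part of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limit chosen for this sketch only. */
#define MAX_PEER_RX_QUEUES 8

/* Hypothetical model of the peer registers in virtio_pci_common_cfg.
 * Plain fields stand in for MMIO so the sketch runs standalone. */
struct peer_regs {
    uint16_t peer_connection;      /* 1 = connected, 0 = disconnected */
    uint16_t peer_num_rx_queues;
    uint16_t peer_rx_queue_select;
    uint64_t peer_uuid_hi;
    uint64_t peer_uuid_lo;
    /* Backing store for peer_rx_queue_gsi, indexed by the selector. */
    uint32_t gsi_table[MAX_PEER_RX_QUEUES];
};

/* Emulates a read of peer_rx_queue_gsi: the device returns the GSI of
 * the peer RX queue chosen by peer_rx_queue_select. */
static uint32_t read_peer_rx_queue_gsi(const struct peer_regs *r)
{
    return r->gsi_table[r->peer_rx_queue_select];
}

struct peer_info {
    uint16_t num_rx_queues;
    uint32_t gsi[MAX_PEER_RX_QUEUES];
    uint64_t uuid_hi, uuid_lo;
};

/* Called after the device has notified the driver. Returns false while
 * the peer is not connected, since the peer registers are not valid yet. */
static bool driver_read_peer_info(struct peer_regs *r, struct peer_info *out)
{
    if (r->peer_connection != 1)
        return false;
    out->num_rx_queues = r->peer_num_rx_queues;
    if (out->num_rx_queues > MAX_PEER_RX_QUEUES)
        return false;
    for (uint16_t q = 0; q < out->num_rx_queues; q++) {
        r->peer_rx_queue_select = q;      /* driver writes the selector */
        out->gsi[q] = read_peer_rx_queue_gsi(r);
    }
    out->uuid_hi = r->peer_uuid_hi;
    out->uuid_lo = r->peer_uuid_lo;
    return true;
}

/* The driver may unload only once peer_connection reads back as 0. */
static bool driver_may_unload(const struct peer_regs *r)
{
    return r->peer_connection == 0;
}
```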
+
\subsubsection{Notification structure layout}\label{sec:Virtio Transport
Options / Virtio Over PCI Bus / PCI Device Layout / Notification capability}
The notification location is found using the VIRTIO_PCI_CAP_NOTIFY_CFG
@@ -1476,11 +1515,11 @@ The \field{offset} for the \field{ISR status} has no alignment requirements.
The ISR bits allow the device to distinguish between device-specific
configuration
change interrupts and normal virtqueue interrupts:
-\begin{tabular}{ |l||l|l|l| }
+\begin{tabular}{ |l||p{3.5cm}|p{3.5cm}|p{3.5cm}|l| }
\hline
-Bits & 0 & 1 & 2 to 31 \\
+Bits & 0 & 1 & 2 & 3 to 31 \\
\hline
-Purpose & Queue Interrupt & Device Configuration Interrupt & Reserved \\
+Purpose & Queue Interrupt & Device Configuration Interrupt & Peer Device Status Interrupt & Reserved \\
\hline
\end{tabular}
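A driver's interrupt handler could decode the extended ISR layout roughly as follows. This is a sketch: only the bit positions come from the table above, and the macro, struct, and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions from the ISR table; the macro names are hypothetical. */
#define ISR_QUEUE_INTR       (1u << 0)
#define ISR_CONFIG_INTR      (1u << 1)
#define ISR_PEER_STATUS_INTR (1u << 2)

struct isr_events {
    bool queue;        /* bit 0: a virtqueue needs servicing */
    bool config;       /* bit 1: device configuration changed */
    bool peer_status;  /* bit 2: peer device status (peer_connection) changed */
};

/* Decodes one ISR status value; bits 3 to 31 are reserved and ignored.
 * In the virtio PCI transport, reading ISR status resets it to 0, so the
 * caller should read it once per interrupt and pass that value here. */
static struct isr_events decode_isr(uint32_t isr)
{
    struct isr_events ev = {
        .queue       = (isr & ISR_QUEUE_INTR) != 0,
        .config      = (isr & ISR_CONFIG_INTR) != 0,
        .peer_status = (isr & ISR_PEER_STATUS_INTR) != 0,
    };
    return ev;
}
```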
@@ -5750,9 +5789,181 @@ descriptor for the \field{sense_len}, \field{residual},
\field{status_qualifier}, \field{status}, \field{response} and
\field{sense} fields.
+\section{Vhost-pci Net Device}\label{sec:Device Types / Vhost-pci Net Device}
+
+The vhost-pci net device enables point-to-point transmission of network
+packets between two isolated address spaces (e.g. virtual machines). An
+instance of the vhost-pci net device transmits packets to, and receives
+packets from, its peer device, which is usually a virtio net device in
+another address space.
+
+\subsection{Device ID}\label{sec:Device Types / Vhost-pci Net Device / Device ID}
+ TBD
+
+\subsection{Virtqueues}\label{sec:Device Types / Vhost-pci Net Device / Virtqueues}
+
+\begin{description}
+\item[0] control receiveq
+\item[1] control transmitq
+\item[2] receiveq
+\end{description}
+
+\subsection{Feature bits}\label{sec:Device Types / Vhost-pci Net Device / Feature bits}
+
+\subsubsection{Device feature bits}\label{sec:Device Types / Vhost-pci Net Device / Feature bits / Device feature bits}
+
+The device feature bits are the traditional feature bits, which are negotiated
+between the device and its driver.
+
+\begin{description}
+\item[VHOST_PCI_NET_F_MAC (0)] Device has given MAC address.
+
+\item[VHOST_PCI_NET_F_CTRL_MAC_ADDR (1)] Set MAC address through control channel.
+
+\item[VHOST_PCI_NET_F_MRG_RXBUF (2)] Driver can merge receive buffers.
+\end{description}
+
+\subsubsection{Peer feature bits}\label{sec:Device Types / Vhost-pci Net Device / Feature bits / Peer feature bits}
+The peer feature bits are negotiated with the peer device. The device then
+sends the bits it has negotiated with the peer device to the driver, which
+negotiates them in turn. If the driver accepts only a subset of these bits,
+the device needs to re-negotiate that subset with the peer device, which may
+trigger a reset of the peer device.
+
+\begin{description}
+\item[VIRTIO_NET_F_GUEST_TSO4 (7)] Virtio-net can receive TSOv4.
+
+\item[VIRTIO_NET_F_GUEST_TSO6 (8)] Virtio-net can receive TSOv6.
+
+\item[VIRTIO_NET_F_GUEST_ECN (9)] Virtio-net can receive TSO with ECN.
+
+\item[VIRTIO_NET_F_GUEST_UFO (10)] Virtio-net can receive UFO.
+
+\item[VIRTIO_NET_F_HOST_TSO4 (11)] Vhost-pci-net supports TSOv4.
+
+\item[VIRTIO_NET_F_HOST_TSO6 (12)] Vhost-pci-net supports TSOv6.
+
+\item[VIRTIO_NET_F_HOST_ECN (13)] Vhost-pci-net supports TSO with ECN.
+
+\item[VIRTIO_NET_F_HOST_UFO (14)] Vhost-pci-net supports UFO.
+
+\item[VIRTIO_NET_F_MRG_RXBUF (15)] Virtio-net can merge receive buffers.
+
+\item[VHOST_F_LOG_ALL (27)] Vhost-pci-net supports dirty page logging.
+
+\end{description}
+
+\devicenormative{\paragraph}{Peer feature bits}{Device Types / Vhost-pci Net Device / Feature bits / Peer feature bits}
+The device SHOULD send the feature bits that have been accepted by the peer
+device to the driver through the control receiveq.
+
+\drivernormative{\paragraph}{Peer feature bits}{Device Types / Vhost-pci Net Device / Feature bits / Peer feature bits}
+Upon receiving the peer feature bits from the device, the driver SHOULD send
+its supported peer feature bits to the device via the control transmitq.
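The peer feature negotiation described above amounts to an intersection followed by a subset check. A minimal C sketch, with hypothetical helper names (`FBIT`, `struct peer_negotiation`, `negotiate_peer_features()`); the bit numbers are taken from the peer feature bit list.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions from the peer feature bit list. */
#define VIRTIO_NET_F_GUEST_TSO4  7
#define VIRTIO_NET_F_HOST_TSO4  11
#define VIRTIO_NET_F_MRG_RXBUF  15

#define FBIT(n) (1ULL << (n))

struct peer_negotiation {
    uint64_t accepted;     /* bits both the peer and the driver accept */
    bool     renegotiate;  /* true if the device must re-negotiate with the peer */
};

/* peer_bits: bits already negotiated with the peer and sent to the driver
 * on the control receiveq. driver_bits: bits the driver returned on the
 * control transmitq. Bits the driver returns that were never offered are
 * masked off, so the result can only narrow the peer set. */
static struct peer_negotiation
negotiate_peer_features(uint64_t peer_bits, uint64_t driver_bits)
{
    struct peer_negotiation n;
    n.accepted = peer_bits & driver_bits;
    /* A strict subset means the peer must be re-negotiated (possibly
     * resetting the peer device). */
    n.renegotiate = (n.accepted != peer_bits);
    return n;
}
```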
+
+\subsection{Device configuration layout}\label{sec:Device Types / Vhost-pci Net Device / Device configuration layout}
+ None currently defined.
+
+\subsection{Device Initialization}\label{sec:Device Types / Vhost-pci Net Device / Device Initialization}
+
+The driver would perform a typical initialization routine like so:
+
+\begin{enumerate}
+\item Identify and initialize the control receiveq, control transmitq, and
+receiveq.
+
+\item Fill the receiveq and control receiveq with buffers.
+
+\item If the VHOST_PCI_NET_F_MAC feature bit is set, the configuration
+ space \field{mac} entry indicates the ``physical'' address of the
+ network card, otherwise the driver would typically generate a random
+ local MAC address.
+\end{enumerate}
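For the last step, a driver commonly picks a random unicast, locally administered address. A sketch in C; `random_local_mac()` is a hypothetical helper, and `rand()` is used only to keep the example portable (a real driver would use its environment's random source).

```c
#include <stdint.h>
#include <stdlib.h>

/* Generates a random unicast, locally administered MAC address, as a
 * driver might do when VHOST_PCI_NET_F_MAC is not set. */
static void random_local_mac(uint8_t mac[6])
{
    for (int i = 0; i < 6; i++)
        mac[i] = (uint8_t)(rand() & 0xff);
    mac[0] &= 0xfe;  /* clear the multicast (I/G) bit */
    mac[0] |= 0x02;  /* set the locally administered (U/L) bit */
}
```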
+
+\subsection{Device Operation}\label{sec:Device Types / Vhost-pci Net Device / Device Operation}
+
+\subsubsection{Control Virtqueue}\label{sec:Device Types / Vhost-pci Net Device / Device Operation / Control Virtqueue}
+
+The pair of control virtqueues is used to exchange configuration messages
+between the device and the driver. All configuration messages are constructed
+using the following structure:
+
+\begin{lstlisting}
+struct vhost_pci_ctrl {
+ u32 request;
+ u64 vhost_pci_id;
+ u8 request_specific_payload[];
+};
+\end{lstlisting}
+
+\field{vhost_pci_id} holds the ID of the vhost-pci device, which is usually
+assigned by the vhost-pci device management software.
+The request types are defined as VHOST_PCI_CTRL_* constants and are
+described below.
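The patch does not define the wire encoding of `struct vhost_pci_ctrl`. The sketch below serializes a header plus payload under the assumption that fields are packed without padding and stored little-endian; `vhost_pci_ctrl_encode()` and `put_le()` are hypothetical helpers. Note that a plain C declaration of the struct would typically insert 4 bytes of padding after `request` on 64-bit ABIs, so the two layouts would differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VHOST_PCI_CTRL_PEER_FEATURE_BITS 0

/* Writes the low n bytes of v into buf in little-endian order. */
static void put_le(uint8_t *buf, uint64_t v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = (uint8_t)(v >> (8 * i));
}

/* Serializes a vhost_pci_ctrl header followed by its payload.
 * ASSUMPTION: packed, little-endian layout (not specified by the patch).
 * Returns the number of bytes written, or 0 if buf is too small. */
static size_t vhost_pci_ctrl_encode(uint8_t *buf, size_t cap,
                                    uint32_t request, uint64_t vhost_pci_id,
                                    const uint8_t *payload, size_t payload_len)
{
    const size_t hdr = 4 + 8;  /* u32 request + u64 vhost_pci_id */
    if (cap < hdr + payload_len)
        return 0;
    put_le(buf, request, 4);
    put_le(buf + 4, vhost_pci_id, 8);
    if (payload_len)
        memcpy(buf + hdr, payload, payload_len);
    return hdr + payload_len;
}
```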
+
+\begin{lstlisting}
+#define VHOST_PCI_CTRL_PEER_FEATURE_BITS 0
+\end{lstlisting}
+
+The device sends the peer feature bits that have been negotiated with the peer
+device to the driver via the control receiveq. The driver sends back its
+accepted peer feature bits to the device via the control transmitq.
+The request payload is described using the following structure:
+
+\begin{lstlisting}
+struct vhost_pci_ctrl_driver_feature_bits {
+ u64 feature_bits;
+};
+\end{lstlisting}
+
+\begin{lstlisting}
+#define VHOST_PCI_CTRL_PEER_MEM_INFO 1
+\end{lstlisting}
+
+The device sends the memory information obtained from the peer device to the
+driver. The payload is described using the structure below:
+
+\begin{lstlisting}
+struct vhost_pci_ctrl_peer_mem_info {
+#define VHOST_PCI_MEM_INFO_NEED_MAP_N 0
+#define VHOST_PCI_MEM_INFO_NEED_MAP_Y 1
+ u8 need_map;
+ u64 peer_mem;
+ u8 other_mem_info[];
+};
+\end{lstlisting}
+
+If \field{need_map} is set to VHOST_PCI_MEM_INFO_NEED_MAP_N, \field{peer_mem}
+stores a virtual address that already maps to the start of the peer memory.
+The driver can use it directly to access the peer memory.
+
+If \field{need_map} is set to VHOST_PCI_MEM_INFO_NEED_MAP_Y, the driver needs
+to map the peer memory via a device BAR, and \field{peer_mem} stores the BAR
+id. The driver then sends a message back to the device with \field{peer_mem}
+set to the virtual address that maps to the peer memory.
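The two \field{need_map} cases can be handled roughly as below. This is a sketch: `map_bar_fn` stands in for whatever BAR-mapping routine the driver environment provides (in Linux, something built on `pci_iomap()`), and `resolve_peer_mem()` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

#define VHOST_PCI_MEM_INFO_NEED_MAP_N 0
#define VHOST_PCI_MEM_INFO_NEED_MAP_Y 1

/* Hypothetical hook: maps the given BAR and returns the virtual address
 * of its start, or 0 on failure. */
typedef uint64_t (*map_bar_fn)(uint64_t bar_id);

/* Resolves the virtual address of the peer memory from a
 * VHOST_PCI_CTRL_PEER_MEM_INFO payload. *reply_needed is set when the
 * driver must send the mapped address back to the device. */
static bool resolve_peer_mem(uint8_t need_map, uint64_t peer_mem,
                             map_bar_fn map_bar, uint64_t *va,
                             bool *reply_needed)
{
    switch (need_map) {
    case VHOST_PCI_MEM_INFO_NEED_MAP_N:
        *va = peer_mem;           /* already a usable virtual address */
        *reply_needed = false;
        return true;
    case VHOST_PCI_MEM_INFO_NEED_MAP_Y:
        *va = map_bar(peer_mem);  /* peer_mem carries the BAR id */
        if (*va == 0)
            return false;
        *reply_needed = true;     /* reply with peer_mem set to *va */
        return true;
    default:
        return false;             /* unknown need_map value */
    }
}
```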
+
+\field{other_mem_info} holds additional information about the peer memory
+for the driver to reference; its format is implementation-defined.
+
+\begin{lstlisting}
+#define VHOST_PCI_CTRL_PEER_VIRTQ_INFO 2
+\end{lstlisting}
+
+The device sends the virtqueue information obtained from the peer device to
+the driver. The payload is described using the structure below:
+
+\begin{lstlisting}
+struct vhost_pci_ctrl_peer_virtq_info {
+#define VHOST_PCI_PEER_VIRTQ_TX 0
+#define VHOST_PCI_PEER_VIRTQ_RX 1
+ u8 tx_or_rx;
+ u32 virtq_num;
+ struct virtq vq[];
+};
+\end{lstlisting}
+
+If \field{tx_or_rx} is set to VHOST_PCI_PEER_VIRTQ_TX, the driver
+initializes \field{virtq_num} virtqueues by sharing the TX virtqueues of
+the peer device, and uses them as its mirrored RX virtqueues. To receive
+packets from the peer device, the driver copies packets from the mirrored RX
+virtqueues to its own RX virtqueue (i.e. the receiveq defined above).
+
+If \field{tx_or_rx} is set to VHOST_PCI_PEER_VIRTQ_RX, the driver
+initializes \field{virtq_num} virtqueues by sharing the RX virtqueues of
+the peer device, and uses them as its mirrored TX virtqueues. To transmit
+packets to the peer device, the driver copies packets to the mirrored TX
+virtqueues.
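The queue roles are inverted relative to the peer, which the following sketch makes explicit. The enum, `mirror_role_for()`, and `copy_to_receiveq()` are hypothetical; the copy helper shows only the per-buffer step, not virtqueue descriptor handling.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VHOST_PCI_PEER_VIRTQ_TX 0
#define VHOST_PCI_PEER_VIRTQ_RX 1

enum mirror_role { MIRROR_INVALID, MIRROR_RX, MIRROR_TX };

/* What the peer transmits is what this side receives, and vice versa. */
static enum mirror_role mirror_role_for(uint8_t tx_or_rx)
{
    switch (tx_or_rx) {
    case VHOST_PCI_PEER_VIRTQ_TX: return MIRROR_RX;
    case VHOST_PCI_PEER_VIRTQ_RX: return MIRROR_TX;
    default:                      return MIRROR_INVALID;
    }
}

/* Copies one packet from a buffer in a mirrored RX queue (peer memory)
 * into a buffer posted on the driver's own receiveq. Returns the bytes
 * copied, or 0 if the packet does not fit (this sketch just fails). */
static size_t copy_to_receiveq(uint8_t *dst, size_t dst_len,
                               const uint8_t *src, size_t src_len)
{
    if (src_len > dst_len)
        return 0;
    memcpy(dst, src, src_len);
    return src_len;
}
```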
+
+\begin{lstlisting}
+#define VHOST_PCI_CTRL_DIRTY_PAGE_LOGGING 3
+\end{lstlisting}
+
+The device sends this message to turn the driver's dirty page logging mode on
+or off. The payload is described using the structure below:
+\begin{lstlisting}
+struct vhost_pci_ctrl_dirty_page_logging {
+#define VHOST_PCI_DIRTY_PAGE_LOGGING_OFF 0
+#define VHOST_PCI_DIRTY_PAGE_LOGGING_ON 1
+ u8 off_or_on;
+};
+\end{lstlisting}
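The patch does not define the format of the dirty page log. As one possible sketch, assuming a bitmap with one bit per 4 KiB page of peer memory, indexed by offset from the start of the peer memory, the driver could record its writes like this (`struct dirty_log`, `LOG_PAGE_SHIFT`, and `log_write()` are hypothetical):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: 4 KiB logging granularity; not specified by the patch. */
#define LOG_PAGE_SHIFT 12

struct dirty_log {
    bool     on;      /* toggled by VHOST_PCI_DIRTY_PAGE_LOGGING_ON/OFF */
    uint8_t *bitmap;  /* one bit per page of peer memory */
    size_t   npages;
};

/* Records that [offset, offset + len) of peer memory was written.
 * Does nothing while logging is off; pages past npages are ignored. */
static void log_write(struct dirty_log *log, uint64_t offset, uint64_t len)
{
    if (!log->on || len == 0)
        return;
    uint64_t first = offset >> LOG_PAGE_SHIFT;
    uint64_t last  = (offset + len - 1) >> LOG_PAGE_SHIFT;
    for (uint64_t p = first; p <= last && p < log->npages; p++)
        log->bitmap[p / 8] |= (uint8_t)(1u << (p % 8));
}
```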
+
+Other types of vhost-pci devices (e.g. scsi, console) may use the same
+control messages defined above. The following messages are specific to
+vhost-pci net devices.
+
+\begin{lstlisting}
+#define VHOST_PCI_CTRL_MAC 0x10000
+\end{lstlisting}
+
+If VHOST_PCI_NET_F_CTRL_MAC_ADDR is negotiated, the driver sends a
+message via the control transmitq to set the MAC address of the device.
+
\chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
-Currently there are three device-independent feature bits defined:
+Currently there are four device-independent feature bits defined:
\begin{description}
\item[VIRTIO_F_RING_INDIRECT_DESC (28)] Negotiating this feature indicates
@@ -5764,6 +5975,10 @@ Currently there are three device-independent feature bits defined:
\item[VIRTIO_F_VERSION_1(32)] This indicates compliance with this
specification, giving a simple way to detect legacy devices or drivers.
+
+ \item[VIRTIO_F_PV_INTERRUPT(33)] Negotiating this feature indicates that the
+ driver can inject an interrupt into its peer device in a paravirtualized
+ way (e.g. via a hypercall).
\end{description}
\drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
@@ -5776,6 +5991,8 @@ MAY fail to operate further if VIRTIO_F_VERSION_1 is not offered.
A device MUST offer VIRTIO_F_VERSION_1. A device MAY fail to operate further
if VIRTIO_F_VERSION_1 is not accepted.
+A device MUST check whether the management environment (e.g. a virtual machine
+monitor) supports paravirtualized interrupts, and set the VIRTIO_F_PV_INTERRUPT
+feature bit accordingly.
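On the device side, the rule above could look like the following sketch. `device_features()` and its `hv_supports_pv_interrupt` argument are hypothetical stand-ins for the device's query of the management environment; the bit numbers are the ones listed above.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_F_VERSION_1     32
#define VIRTIO_F_PV_INTERRUPT  33

/* Builds the offered feature set. hv_supports_pv_interrupt is the result
 * of querying the management environment (e.g. the VMM); how that query
 * is made is not defined by the patch. */
static uint64_t device_features(uint64_t base, bool hv_supports_pv_interrupt)
{
    uint64_t f = base | (1ULL << VIRTIO_F_VERSION_1);  /* MUST be offered */
    if (hv_supports_pv_interrupt)
        f |= 1ULL << VIRTIO_F_PV_INTERRUPT;
    else
        f &= ~(1ULL << VIRTIO_F_PV_INTERRUPT);
    return f;
}
```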
+
\section{Legacy Interface: Reserved Feature Bits}\label{sec:Reserved Feature
Bits / Legacy Interface: Reserved Feature Bits}
Transitional devices MAY offer the following:
--
1.9.1