
Re: [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature


From: Maxime Coquelin
Subject: Re: [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
Date: Thu, 29 Sep 2016 23:23:35 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0



On 09/29/2016 10:21 PM, Michael S. Tsirkin wrote:
On Thu, Sep 29, 2016 at 10:05:22PM +0200, Maxime Coquelin wrote:


On 09/29/2016 07:57 PM, Michael S. Tsirkin wrote:
On Thu, Sep 29, 2016 at 05:30:53PM +0200, Maxime Coquelin wrote:
...

Before enabling anything by default, we should first optimize the 1 slot
case. Indeed, a micro-benchmark using testpmd in txonly[0] shows a ~17%
perf regression for the 64-byte case:
 - 2 descs per packet: 11.6Mpps
 - 1 desc per packet: 9.6Mpps

This is due to the virtio header clearing in virtqueue_enqueue_xmit().
Removing it, we get better results than with 2 descs (1.20Mpps).
Since the Virtio PMD doesn't support offloads, I wonder whether we can
just drop the memset?
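
For context, the clearing in virtqueue_enqueue_xmit() boils down to zeroing
the per-packet virtio-net header before the buffer is handed to the backend.
A minimal sketch of that pattern (illustrative names, not the exact PMD code):

#include <stdint.h>
#include <string.h>

/* Per-packet header as in the virtio spec (legacy layout, without the
 * num_buffers field used with mergeable RX buffers). */
struct virtio_net_hdr {
        uint8_t  flags;
        uint8_t  gso_type;
        uint16_t hdr_len;
        uint16_t gso_size;
        uint16_t csum_start;
        uint16_t csum_offset;
};

/* The cost being measured: with no offloads negotiated, every field
 * cleared here would be zero anyway. */
static void clear_tx_header(struct virtio_net_hdr *hdr)
{
        memset(hdr, 0, sizeof(*hdr));
}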

What will happen? Will the header be uninitialized?
Yes..
I didn't look closely at the spec, but just looked at DPDK's and Linux's
vhost implementations. IIUC, the header is just skipped in both
implementations.

In the Linux guest, skbs are initialized AFAIK. See virtio_net_hdr_from_skb:
the first thing it does is
        memset(hdr, 0, sizeof(*hdr));

I meant that in the Linux vhost-net implementation, the header is just skipped:

static void handle_tx(struct vhost_net *net)
{
...
                /* Skip header. TODO: support TSO. */
                len = iov_length(vq->iov, out);
                iov_iter_init(&msg.msg_iter, WRITE, vq->iov, out, len);
                iov_iter_advance(&msg.msg_iter, hdr_size);

And the same is done in DPDK:

static inline int __attribute__((always_inline))
copy_desc_to_mbuf(struct virtio_net *dev, struct vring_desc *descs,
                  uint16_t max_desc, struct rte_mbuf *m, uint16_t desc_idx,
                  struct rte_mempool *mbuf_pool)
{
...
        /*
         * A virtio driver normally uses at least 2 desc buffers
         * for Tx: the first for storing the header, and others
         * for storing the data.
         */
        if (likely((desc->len == dev->vhost_hlen) &&
                   (desc->flags & VRING_DESC_F_NEXT) != 0)) {
                desc = &descs[desc->next];
                if (unlikely(desc->flags & VRING_DESC_F_INDIRECT))
                        return -1;

                desc_addr = gpa_to_vva(dev, desc->addr);
                if (unlikely(!desc_addr))
                        return -1;

                rte_prefetch0((void *)(uintptr_t)desc_addr);

                desc_offset = 0;
                desc_avail  = desc->len;
                nr_desc    += 1;

                PRINT_PACKET(dev, (uintptr_t)desc_addr, desc->len, 0);
        } else {
                desc_avail  = desc->len - dev->vhost_hlen;
                desc_offset = dev->vhost_hlen;
        }




The spec says:
        The driver can send a completely checksummed packet. In this case,
        flags will be zero, and gso_type will be VIRTIO_NET_HDR_GSO_NONE.

and
        The driver MUST set num_buffers to zero.
        If VIRTIO_NET_F_CSUM is not negotiated, the driver MUST set flags to
        zero and SHOULD supply a fully checksummed packet to the device.

and
        If none of the VIRTIO_NET_F_HOST_TSO4, TSO6 or UFO options have been
        negotiated, the driver MUST set gso_type to VIRTIO_NET_HDR_GSO_NONE.

so doing this unconditionally would be a spec violation, but if you see
value in this, we can add a feature bit.
Right, it would be a spec violation, so it should be done conditionally.
If a feature bit is to be added, what about VIRTIO_NET_F_NO_TX_HEADER?
It would imply that VIRTIO_NET_F_CSUM is not set and that no GSO features
are set. If negotiated, we wouldn't need to prepend a header.
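
A rough sketch of what the guest TX path could look like if such a bit were
negotiated (the feature name is only a proposal here, and the bit number is
made up for illustration; neither is in the virtio spec):

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical feature bit, for illustration only. */
#define VIRTIO_NET_F_NO_TX_HEADER       55

/* Returns how many header bytes to prepend for a TX packet. With the
 * proposed bit negotiated there is no header (and no memset) at all;
 * otherwise an all-zero header, since CSUM/GSO are not negotiated. */
static size_t tx_header_bytes(uint64_t features, void *hdr, size_t hdr_len)
{
        if (features & (1ULL << VIRTIO_NET_F_NO_TX_HEADER))
                return 0;

        memset(hdr, 0, hdr_len);
        return hdr_len;
}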

Yes, but two points.

1. Why is this memset expensive? Is the test otherwise completely skipping
   looking at the packet?
Yes.

2. As long as we are doing this, see
        Alignment vs. Networking
        ========================
in Documentation/unaligned-memory-access.txt
Thanks, I'll have a look tomorrow.
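
For reference, the "Alignment vs. Networking" section is about keeping the IP
header naturally aligned: Ethernet headers are 14 bytes, so receive buffers
are usually offset by NET_IP_ALIGN (2 bytes) so that the IP header lands on a
4-byte boundary; any change to whether (or how large) a virtio-net header
precedes the packet data moves that boundary. The usual kernel-side idiom
looks roughly like this (a fragment, not standalone code):

        skb = netdev_alloc_skb(dev, len + NET_IP_ALIGN);
        if (!skb)
                return NULL;
        /* Shift the data pointer by NET_IP_ALIGN (2 bytes) so that, after
         * the 14-byte Ethernet header, the IP header starts on a 4-byte
         * boundary. */
        skb_reserve(skb, NET_IP_ALIGN);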

Maxime



