
From: Jason Wang
Subject: Re: [Qemu-devel] [virtio-dev] Re: [virtio-dev] Re: [PATCH v2 00/16] Vhost-pci for inter-VM communication
Date: Mon, 22 May 2017 10:22:31 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0



On 05/20/2017 00:49, Michael S. Tsirkin wrote:
On Fri, May 19, 2017 at 11:10:33AM +0800, Jason Wang wrote:

On 05/18/2017 11:03, Wei Wang wrote:
On 05/17/2017 02:22 PM, Jason Wang wrote:

On 05/17/2017 14:16, Jason Wang wrote:

On 05/16/2017 15:12, Wei Wang wrote:
Hi:

Care to post the driver codes too?

OK. It may take some time to clean up the driver code before
posting it out. For now, you can
have a look at the draft in the repo here:
https://github.com/wei-w-wang/vhost-pci-driver

Best,
Wei
Interesting, it looks like there's one copy on the tx side. We used to
have zerocopy support in tun for VM2VM traffic. Could you
please try to compare it with your vhost-pci-net by:

We can analyze the whole data path - from VM1's network stack
sending packets to VM2's
network stack receiving them. The number of copies is actually the
same for both.
That's why I'm asking you to compare the performance. The only reason for
vhost-pci is performance. You should prove it.

vhost-pci: the one copy happens in VM1's driver xmit(), which copies packets
from its network stack into VM2's
RX ring buffer. (We call it "zerocopy" because there is no intermediate
copy between the VMs.)
zerocopy-enabled vhost-net: the one copy happens in tun's recvmsg, which copies
packets from VM1's TX ring
buffer into VM2's RX ring buffer.
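
To make the copy location concrete, here is a minimal sketch of what such a
1-copy xmit path could look like in a vhost-pci net driver. This is an
illustration under assumptions, not the posted driver code: all vpnet_* names
and types are hypothetical, and the real driver may batch or manage the peer's
descriptors differently.

#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Hypothetical types and helpers, not taken from the posted driver. */
struct vpnet_rx_slot { void *buf; u32 len; };
struct vpnet_priv;
static struct vpnet_rx_slot *vpnet_get_peer_rx_slot(struct vpnet_priv *priv);
static void vpnet_publish_rx_slot(struct vpnet_priv *priv, struct vpnet_rx_slot *slot);
static void vpnet_kick_peer(struct vpnet_priv *priv);

/*
 * Sketch of a 1-copy TX path: the transmitting guest's driver copies the
 * skb straight into the peer VM's RX ring, which vhost-pci has mapped
 * into this guest's address space, so no intermediate host-side copy is
 * needed.
 */
static netdev_tx_t vpnet_xmit(struct sk_buff *skb, struct net_device *dev)
{
        struct vpnet_priv *priv = netdev_priv(dev);
        struct vpnet_rx_slot *slot;

        slot = vpnet_get_peer_rx_slot(priv);    /* free slot in VM2's RX ring */
        if (!slot) {
                dev->stats.tx_dropped++;
                dev_kfree_skb_any(skb);
                return NETDEV_TX_OK;
        }

        /* The single copy: from VM1's network stack into VM2's buffer. */
        skb_copy_bits(skb, 0, slot->buf, skb->len);
        slot->len = skb->len;

        vpnet_publish_rx_slot(priv, slot);      /* make the slot visible to VM2 */
        vpnet_kick_peer(priv);                  /* notify VM2, e.g. via a doorbell */

        dev_kfree_skb_any(skb);
        return NETDEV_TX_OK;
}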
Actually, there's a major difference here. You do the copy in the guest, which
consumes a time slice of the vCPU thread on the host. Vhost_net does this in its
own thread. So I feel vhost_net may even be faster here, but maybe I'm wrong.
Yes, but only if you have enough CPUs. The point of vhost-pci
is to put the switch in a VM and scale better with the number of VMs.

Does the overall performance really increase? I suspect the only thing vhost-pci saves here is scheduling cost, and copying in the guest should be slower than doing it on the host.


That being said, we compared against vhost-user instead of vhost_net,
because vhost-user is the one
used in NFV, which we think is a major use case for vhost-pci.
If this is true, why not draft a PMD driver instead of a kernel one? And do
you use the virtio-net kernel driver to compare the performance? If yes, has OVS
DPDK been optimized for the kernel driver (I think not)?

More importantly, if vhost-pci is faster, its kernel driver
should also be faster than virtio-net, no?
If you have a vhost CPU per vCPU and can give a host CPU to each, then using
that will be faster.  But not everyone has that many host CPUs.

If the major use case is NFV, we should have sufficient CPU resources, I believe?

Thanks




- make sure zerocopy is enabled for vhost_net
- comment out skb_orphan_frags() in tun_net_xmit() (a rough sketch of this follows below)
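
For reference, my understanding of what these two items involve (treat this as a
sketch; the exact code and its surroundings differ across kernel versions):
zerocopy in vhost_net is controlled by its experimental_zcopytx module
parameter, i.e. load the module with experimental_zcopytx=1, and the tun change
simply disables the skb_orphan_frags() check in tun_net_xmit() so zerocopy
frags are not copied on the way into the second VM's tap device:

/*
 * drivers/net/tun.c, tun_net_xmit() - rough sketch only.
 * skb_orphan_frags() copies userspace (zerocopy) frags, so disabling it
 * here keeps the vhost_net zerocopy path intact for VM2VM traffic
 * during the comparison.
 */
#if 0   /* commented out for the test suggested above */
        if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC)))
                goto drop;
#endif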

Thanks

You can even enable tx batching for tun with "ethtool -C tap0 rx-frames
N". This will greatly improve the performance according to my tests.

Thanks, but would this hurt latency?

Best,
Wei
I don't see this in my test.

Thanks



