


From: Anthony Liguori
Subject: Re: [Qemu-devel] snabbswitch integration with QEMU for userspace ethernet I/O
Date: Mon, 27 May 2013 12:01:07 -0500
User-agent: Notmuch/0.15.2+77~g661dcf8 (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu)

Paolo Bonzini <address@hidden> writes:

> On 27/05/2013 18:18, Anthony Liguori wrote:
>> Paolo Bonzini <address@hidden> writes:
>> 
>>> On 27/05/2013 11:34, Stefan Hajnoczi wrote:
>>>> On Sun, May 26, 2013 at 11:32:49AM +0200, Luke Gorrie wrote:
>>>>> Stefan put us onto the highly promising track of vhost/virtio. We have
>>>>> implemented this between Snabb Switch and the Linux kernel, but not
>>>>> directly between Snabb Switch and QEMU guests. The "roadblock" we have hit
>>>>> is embarrassingly basic: QEMU is using user-to-kernel system calls to set up
>>>>> vhost (open /dev/net/tun and /dev/vhost-net, ioctl()s) and I haven't found
>>>>> a good way to map these towards Snabb Switch instead of the kernel.
>>>>
>>>> vhost_net is about connecting a virtio-net speaking process to a
>>>> tun-like device.  The problem you are trying to solve is connecting a
>>>> virtio-net speaking process to Snabb Switch.
>>>>
>>>> Either you need to replace vhost or you need a tun-like device
>>>> interface.
>>>>
>>>> How does your switch talk to hardware?
>>>
>>> And also, is your switch monolithic or does it consist of different
>>> processes?
>>>
>>> If you already have processes talking to each other, the first thing
>>> that came to my mind was a new network backend, similar to net/vde.c but
>>> more featureful (so that you support the virtio headers for offloading,
>>> for example).  Then you would use "-netdev snabb,id=net0 -device
>>> e1000,netdev=net0".
>> 
>> It would be very interesting to combine this with vmsplice/splice.
>
> Was zero-copy vmsplice/splice actually ever implemented?  I thought it
> was reverted.

Not sure what context you're talking about re: zero copy...  a pipe can
store references to pages instead of having a buffer that stores data.
That certainly is there today--otherwise the interface is pointless.

When splicing from pipe to pipe, you can move those references without
copying the data.

When vmsplicing from a userspace region to a pipe, the kernel just
stores references to the pages.  vmsplicing from a pipe to userspace
OTOH will copy the data.  This is fixable at least when dealing with
GIFT'd pages.  For guest-to-guest traffic, you wouldn't be gifting the
pages I don't think.
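
A rough sketch of the userspace-to-pipe direction, assuming Linux.  Python
has no vmsplice() wrapper, so this calls libc through ctypes; IoVec and
vmsplice_to_pipe() are illustrative names, and flags=0 means the pages are
not gifted.

```python
import ctypes
import os


class IoVec(ctypes.Structure):
    """Mirror of struct iovec from <sys/uio.h>."""
    _fields_ = [("iov_base", ctypes.c_void_p),
                ("iov_len", ctypes.c_size_t)]


libc = ctypes.CDLL(None, use_errno=True)
libc.vmsplice.argtypes = [ctypes.c_int, ctypes.POINTER(IoVec),
                          ctypes.c_size_t, ctypes.c_uint]
libc.vmsplice.restype = ctypes.c_ssize_t


def vmsplice_to_pipe(pipe_w: int, buf) -> int:
    """Map the memory behind a ctypes buffer into a pipe with vmsplice(2).

    Without SPLICE_F_GIFT the caller still owns the pages, so buf must be
    left unmodified until the reader has consumed the data.
    """
    iov = IoVec(ctypes.addressof(buf), ctypes.sizeof(buf))
    n = libc.vmsplice(pipe_w, ctypes.byref(iov), 1, 0)
    if n < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return n  # may be short if the pipe is nearly full


# Demo: a 64-byte buffer stands in for a guest packet.
pipe_r, pipe_w = os.pipe()
frame = ctypes.create_string_buffer(b"\xab" * 64, 64)
sent = vmsplice_to_pipe(pipe_w, frame)
```

Reading the pipe with os.read() afterwards copies the bytes out, which
matches the one copy on the receiving side described above.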

For implementing guest-to-guest traffic, the source QEMU can vmsplice
the packet to a pipe that is shared with the vswitch.  The vswitch can
tee(2) the first N bytes to a second pipe such that it can read the
info needed for routing decisions.
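
The peek step might look like the following sketch, assuming Linux and
calling tee(2) through ctypes since Python does not wrap it.
peek_header() and the choice of a 14-byte Ethernet header for N are
assumptions for illustration only.

```python
import ctypes
import os

libc = ctypes.CDLL(None, use_errno=True)
libc.tee.argtypes = [ctypes.c_int, ctypes.c_int,
                     ctypes.c_size_t, ctypes.c_uint]
libc.tee.restype = ctypes.c_ssize_t

ETH_HEADER_LEN = 14  # dst MAC (6) + src MAC (6) + EtherType (2)


def peek_header(packet_r: int, scratch_r: int, scratch_w: int,
                nbytes: int = ETH_HEADER_LEN) -> bytes:
    """Duplicate the first nbytes of a pipe into a scratch pipe and read them.

    tee(2) does not consume its input, so the whole packet is still queued
    in packet_r afterwards and can be spliced onward.  Only the header
    bytes are copied into this process.
    """
    n = libc.tee(packet_r, scratch_w, nbytes, 0)
    if n < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return os.read(scratch_r, n)


# Demo: queue a fake 64-byte frame, then look at its header.
packet_r, packet_w = os.pipe()
scratch_r, scratch_w = os.pipe()
frame = bytes(range(64))
os.write(packet_w, frame)
header = peek_header(packet_r, scratch_r, scratch_w)
dst_mac = header[:6]
```

A switch would look up dst_mac to choose the output pipe, then splice()
the untouched packet from packet_r to it.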

Once the decision is made, if it's a local guest, it can splice() the
packet to the appropriate destination QEMU process or another vswitch
daemon (no data copy here).

Finally, the destination QEMU process can vmsplice() from the pipe which
will copy the data (this is the only copy).

If vswitch needs to route externally, then it would need to splice() to
a macvtap.

macvtap should be able to send the packet without copying the data.  Not
sure that this last step will work as expected, but if it doesn't, that's
a bug that can/should be fixed.

The kernel cannot do better than the above modulo any overhead from
userspace context switching[*].  Guest-to-guest requires a copy.
Normally macvtap is undesirable because it's tightly connected to a
network adapter but that is a desirable trait in this case.

N.B., I'm not advocating making all switching decisions in
userspace. Just pointing out how it can be done efficiently.

[*] in theory the kernel could do zero copy receive but I'm not sure
it's feasible in practice.

Regards,

Anthony Liguori

>
> Paolo
>
>>> It would be slower than vhost-net, for example no zero-copy
>>> transmission.
>> 
>> With splice, I think you could at least get single copy guest-to-guest
>> networking which is about as good as can be done.
>> 
>> Regards,
>> 
>> Anthony Liguori
>> 
>>>> 3. Use the kernel as a middle-man. Create a double-ended "veth"
>>>> interface and have Snabb Switch and QEMU each open a PF_PACKET
>>>> socket and accelerate it with VHOST_NET.
>>>
>>> As Michael mentioned, this could be macvtap on the interface that you
>>> have already created in the switch and passed to vhost-net.  Then you do
>>> not have to do anything in QEMU.
>>>
>>> Paolo
>>>
>>>> If you are using the Linux network stack then it might be better to
>>>> integrate with vhost, maybe as a tun-like device driver.
>>>>
>>>> Stefan
>>>>
>>>>


