Re: [Qemu-devel] [PATCH v2] netmap backend (revised)


From: Luigi Rizzo
Subject: Re: [Qemu-devel] [PATCH v2] netmap backend (revised)
Date: Wed, 23 Jan 2013 12:50:26 +0100
User-agent: Mutt/1.4.2.3i

On Wed, Jan 23, 2013 at 12:10:55PM +0100, Stefan Hajnoczi wrote:
> On Tue, Jan 22, 2013 at 08:12:15AM +0100, Luigi Rizzo wrote:
> > reposting a version without changes that implement bounded
> > queues in net/queue.c
> > 
> > Hi,
> > the attached patch implements a qemu backend for the "netmap" API
> > thus allowing machines to attach to the VALE software switch as
> > well as netmap-supported cards (links below).
> > 
> > http://info.iet.unipi.it/~luigi/netmap/
> > http://info.iet.unipi.it/~luigi/vale/
> > 
> > This is a cleaned up version of code written last summer.
> 
> Looks like a clean software approach to low-level packet I/O.  I guess
> the biggest competitor would be a userspace library with NIC drivers
> using modern IOMMUs to avoid the security/reliability problem that
> previous userspace approaches suffered.  Pretty cool that netmap reuses
> kernel NIC driver implementations to avoid duplicating all that code.
> 
> I wonder how/if netmap handles advanced NIC features like multiqueue
> and offloads?  But I've only read the webpage, not the papers or source
> code :).

there are definitely more details in the papers :)

An IOMMU is not strictly necessary because userspace only sees packet
buffers and a device-independent replica of the descriptor ring.
NIC registers and rings are only manipulated by the kernel within
system calls.
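
Roughly, the userspace side looks like this (an untested sketch
against the public netmap API; error handling omitted):

    /* minimal sketch of the userspace view of a netmap port;
     * error handling omitted */
    #include <fcntl.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <net/netmap.h>
    #include <net/netmap_user.h>

    static struct netmap_ring *
    open_tx_ring(const char *ifname)
    {
        struct nmreq req;
        int fd = open("/dev/netmap", O_RDWR);
        char *mem;
        struct netmap_if *nifp;

        memset(&req, 0, sizeof(req));
        req.nr_version = NETMAP_API;
        strncpy(req.nr_name, ifname, sizeof(req.nr_name));
        ioctl(fd, NIOCREGIF, &req);  /* attach to the interface */
        /* map the shared region: packet buffers + shadow rings */
        mem = mmap(NULL, req.nr_memsize, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
        nifp = NETMAP_IF(mem, req.nr_offset);
        /* from here on we only touch slots and buffers; the kernel
         * talks to the NIC inside ioctl()/poll() */
        return NETMAP_TXRING(nifp, 0);
    }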

Multiqueue is completely straightforward -- netmap exposes as many
queues as the hardware implements; you can attach file descriptors to
individual queues and bind threads to queues simply by using
pthread_setaffinity().
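
For example (untested sketch; nr_ringid/NETMAP_HW_RING are from the
netmap API, and glibc spells the affinity call
pthread_setaffinity_np()):

    /* per-queue thread setup sketch (needs _GNU_SOURCE and
     * <pthread.h> for pthread_setaffinity_np()) */
    req.nr_ringid = NETMAP_HW_RING | i;   /* bind this fd to hw ring i */
    ioctl(fd, NIOCREGIF, &req);

    cpu_set_t cpuset;
    CPU_ZERO(&cpuset);
    CPU_SET(i, &cpuset);                  /* e.g. ring i -> core i */
    pthread_setaffinity_np(pthread_self(), sizeof(cpuset), &cpuset);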

Offloading is not supported so far, and that is part of the design:
checking the various possible combinations of offloads at various
places in the stack adds complexity at runtime.
A single packet format keeps the driver extremely efficient.

The one thing that may make a difference is TCP segmentation and
reassembly; we will look at how to support it at some point.

I'll apply the other changes you suggest below, thanks.
Only some comments:

The debugging macros (D(), RD()) are taken from our existing code:
> 
> > +#define ND(fd, ... )       // debugging
> > +#define D(format, ...)                                          \
...
> > +/* rate limited, lps indicates how many per second */
> > +#define RD(lps, format, ...)                                    \
...

> Have you seen docs/tracing.txt?  It can do fprintf(stderr) but also
> SystemTap/DTrace and a simple built-in binary tracer.

Will look at it. These debugging macros are the same ones we use in
other netmap code, so I'd prefer to keep them.

> > +// a fast copy routine only for multiples of 64 bytes, non overlapped.
> > +static inline void
> > +pkt_copy(const void *_src, void *_dst, int l)
...
> > +                *dst++ = *src++;
> > +        }
> > +}
> 
> I wonder how different FreeBSD bcopy() is from glibc memcpy() and if the
> optimization is even a win.  The glibc code is probably hand-written
> assembly that CPU vendors have contributed for specific CPU model
> families.
> 
> Did you compare glibc memcpy() against pkt_copy()?

I haven't tried glibc in detail but will run some tests.  In any
case not all systems have glibc, and on FreeBSD this pkt_copy was
a significant win for small packets (saving some 20 ns per packet;
of course this only counts when you approach the 10 Mpps range,
which is what you get with netmap, and when data is in the cache).

One reason pkt_copy gains something is that, since it can assume
there is extra space in the buffer, it can work in large chunks and
avoid the extra jumps and instructions needed for the remaining
1-2-4 bytes.
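
Concretely, the loop boils down to this (simplified from the patch;
the quoted version above is authoritative):

    /* copy in 64-byte chunks through 8-byte words; the caller
     * guarantees slack at the end of the (2 KB) buffer, so we can
     * round the length up instead of handling a 1-2-4 byte tail */
    #include <stdint.h>

    static inline void
    pkt_copy(const void *_src, void *_dst, int l)
    {
        const uint64_t *src = _src;
        uint64_t *dst = _dst;

        for (; l > 0; l -= 64) {
            *dst++ = *src++; *dst++ = *src++;
            *dst++ = *src++; *dst++ = *src++;
            *dst++ = *src++; *dst++ = *src++;
            *dst++ = *src++; *dst++ = *src++;
        }
    }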

> > +   ring->slot[i].len = size;
> > +   pkt_copy(buf, dst, size);
> 
> How does netmap guarantee that the buffer is large enough for this
> packet?

The netmap buffers are 2 KB; I'll make sure there is a check that
the size is not exceeded.
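
Something like this should do (sketch; nr_buf_size is the per-slot
buffer size exported in the netmap ring):

    /* drop packets that do not fit in a netmap slot */
    if (size > ring->nr_buf_size) {
        RD(5, "packet too big, %d > %u, dropped",
           size, ring->nr_buf_size);
        return size;   /* consume (drop) it */
    }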

> > +    close(s->me.fd);
> 
> Missing munmap?

Yes, you are correct.
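
The teardown should become something like this (sketch; the
mem/memsize field names are assumptions, whatever holds the mmap()
result and length in the patch):

    /* unmap the shared region, then close the fd */
    munmap(s->me.mem, s->me.memsize);   /* field names assumed */
    close(s->me.fd);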

        cheers
        luigi


