[Qemu-devel] memcpy speed (Re: [PATCH v2] netmap backend (revised))
From: Luigi Rizzo
Subject: [Qemu-devel] memcpy speed (Re: [PATCH v2] netmap backend (revised))
Date: Wed, 23 Jan 2013 17:03:48 +0100
User-agent: Mutt/1.4.2.3i
On Wed, Jan 23, 2013 at 02:03:17PM +0100, Stefan Hajnoczi wrote:
> On Wed, Jan 23, 2013 at 12:50:26PM +0100, Luigi Rizzo wrote:
> > On Wed, Jan 23, 2013 at 12:10:55PM +0100, Stefan Hajnoczi wrote:
> > > On Tue, Jan 22, 2013 at 08:12:15AM +0100, Luigi Rizzo wrote:
...
> > > > +// a fast copy routine only for multiples of 64 bytes, non overlapped.
> > > > +static inline void
> > > > +pkt_copy(const void *_src, void *_dst, int l)
> > ...
> > > > + *dst++ = *src++;
> > > > + }
> > > > +}
> > >
> > > I wonder how different FreeBSD bcopy() is from glibc memcpy() and if the
> > > optimization is even a win. The glibc code is probably hand-written
> > > assembly that CPU vendors have contributed for specific CPU model
> > > families.
> > >
> > > Did you compare glibc memcpy() against pkt_copy()?
> >
> > I haven't tried in detail on glibc but will run some tests. In any
> > case not all systems have glibc, and on FreeBSD this pkt_copy was
> > a significant win for small packets (saving some 20ns each; of
> > course this counts only when you approach the 10 Mpps range, which
> > is what you get with netmap, and of course when data is in cache).
> >
> > One reason pkt_copy gains something is that if it can assume there
> > is extra space in the buffer, it can work on large chunks avoiding the extra
> > jumps and instructions for the remaining 1-2-4 bytes.
>
> I'd like to drop this code or at least make it FreeBSD-specific since
> there's no guarantee that this is a good idea on any other libc.
>
> I'm even doubtful that it's always a win on FreeBSD. You have a
> threshold to fall back to bcopy() and who knows what the "best" value
> for various CPUs is.
Indeed.
With the attached program (which, however, might be affected by the
fact that the data is not used after copying) it seems that on a recent
Linux (using gcc 4.6.2) the fastest method is __builtin_memcpy()
    ./testlock -m __builtin_memcpy -l 64
(by a factor of 2 or more), whereas all the other methods have
approximately the same speed.
On FreeBSD (with clang, gcc 4.2.1 and gcc 4.6.4) the pkt_copy() above
    ./testlock -m fastcopy -l 64
is largely better than the other methods. I am a bit puzzled why
the builtin method is not effective on FreeBSD, but I will check
on some other forum...
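
The caveat above could probably be reduced by folding the destination
into a volatile sink after each copy, so that the compiler cannot treat
the stores as dead and the copied data is actually touched again. A
minimal sketch (not part of the attached program) would be something
like this, called as consume_dst(&huge[m & HU], len) right after each
copy in the test loops:

	/* hypothetical helper: consume the destination after each copy */
	static volatile uint64_t copy_sink;

	static void
	consume_dst(const void *p, int len)
	{
		const unsigned char *q = p;
		uint64_t sum = 0;
		int i;

		for (i = 0; i < len; i++)
			sum += q[i];
		copy_sink += sum;	/* volatile store keeps the result observable */
	}
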
cheers
luigi
/*
* Copyright (C) 2012 Luigi Rizzo. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
/*
* $Id: testlock.c 12015 2013-01-23 15:51:17Z luigi $
*
* Test program to study various ops and concurrency issues.
* Create multiple threads, possibly bind to cpus, and run a workload.
*
* cc -O2 -Werror -Wall testlock.c -o testlock -lpthread
* you might need -lrt
*/
#include <inttypes.h>
#include <sys/types.h>
#include <pthread.h> /* pthread_* */
#if defined(__APPLE__)
#include <libkern/OSAtomic.h>
#define atomic_add_int(p, n) OSAtomicAdd32(n, (int *)p)
#define atomic_cmpset_32(p, o, n) OSAtomicCompareAndSwap32(o, n, (int *)p)
#elif defined(linux)
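/*
 * Not a real compare-and-swap: a plain stand-in so the benchmark builds
 * on Linux (good enough here, where it is only used to time the call).
 */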
int atomic_cmpset_32(volatile uint32_t *p, uint32_t old, uint32_t new)
{
int ret = *p == old;
*p = new;
return ret;
}
#if defined(HAVE_GCC_ATOMICS)
int atomic_add_int(volatile int *p, int v)
{
return __sync_fetch_and_add(p, v);
}
#else
inline
uint32_t atomic_add_int(uint32_t *p, int v)
{
__asm __volatile (
" lock xaddl %0, %1 ; "
: "+r" (v), /* 0 (result) */
"=m" (*p) /* 1 */
: "m" (*p)); /* 2 */
return (v);
}
#endif
#else /* FreeBSD */
#include <sys/param.h>
#include <machine/atomic.h>
#include <pthread_np.h> /* pthread w/ affinity */
#if __FreeBSD_version > 500000
#include <sys/cpuset.h> /* cpu_set */
#if __FreeBSD_version > 800000
#define HAVE_AFFINITY
#endif
inline void prefetch (const void *x)
{
__asm volatile("prefetcht0 %0" :: "m" (*(const unsigned long *)x));
}
#else /* FreeBSD 4.x */
int atomic_cmpset_32(volatile uint32_t *p, uint32_t old, uint32_t new)
{
int ret = *p == old;
*p = new;
return ret;
}
#define PRIu64 "llu"
#endif /* FreeBSD 4.x */
#endif /* FreeBSD */
#include <signal.h> /* signal */
#include <stdlib.h>
#include <stdio.h>
#include <poll.h>
#include <inttypes.h> /* PRI* macros */
#include <string.h> /* strcmp */
#include <fcntl.h> /* open */
#include <unistd.h> /* getopt */
#include <sys/sysctl.h> /* sysctl */
#include <sys/time.h> /* timersub */
static inline int min(int a, int b) { return a < b ? a : b; }
#define ONE_MILLION 1000000
/* debug support */
#define ND(format, ...)
#define D(format, ...) \
fprintf(stderr, "%s [%d] " format "\n", \
__FUNCTION__, __LINE__, ##__VA_ARGS__)
int verbose = 0;
#if 1 /* def MY_RDTSC */
/* Wrapper around `rdtsc' to take reliable timestamps flushing the pipeline */
#define my_rdtsc(t) \
do { \
u_int __regs[4]; \
\
do_cpuid(0, __regs); \
(t) = rdtsc(); \
} while (0)
static __inline void
do_cpuid(u_int ax, u_int *p)
{
__asm __volatile("cpuid"
: "=a" (p[0]), "=b" (p[1]), "=c" (p[2]), "=d" (p[3])
: "0" (ax) );
}
static __inline uint64_t
rdtsc(void)
{
	uint32_t lo, hi;

	/* rdtscp returns the counter in edx:eax and clobbers ecx; combining
	 * the halves explicitly also works on 64-bit, where the old "=A"
	 * constraint only returned the low 32 bits */
	__asm __volatile("rdtscp" : "=a" (lo), "=d" (hi) : : "%ecx");
	return ((uint64_t)hi << 32) | lo;
}
#endif /* 1 */
struct targ;
/*** global arguments for all threads ***/
struct glob_arg {
struct {
uint32_t ctr[1024];
} v __attribute__ ((aligned(256) ));
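	/* per-thread counters; the -A option controls their spacing (cache contention) */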
int64_t m_cycles; /* total cycles */
int nthreads;
int cpus;
int privs; // 1 if has IO privileges
int arg; // microseconds in usleep
char *test_name;
void (*fn)(struct targ *);
uint64_t scale; // scaling factor
char *scale_name; // scaling factor
};
/*
* Arguments for a new thread.
*/
struct targ {
struct glob_arg *g;
int completed;
u_int *glob_ctr;
uint64_t volatile count;
struct timeval tic, toc;
int me;
pthread_t thread;
int affinity;
};
static struct targ *ta;
static int global_nthreads;
/* control-C handler */
static void
sigint_h(int sig)
{
int i;
(void)sig; /* UNUSED */
for (i = 0; i < global_nthreads; i++) {
/* cancel active threads. */
if (ta[i].completed)
continue;
D("Cancelling thread #%d\n", i);
pthread_cancel(ta[i].thread);
ta[i].completed = 0;
}
signal(SIGINT, SIG_DFL);
}
/* sysctl wrapper to return the number of active CPUs */
static int
system_ncpus(void)
{
#ifdef linux
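	/* XXX hard-coded; sysconf(_SC_NPROCESSORS_ONLN) would give the real count */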
return 1;
#else
	int mib[2] = { CTL_HW, HW_NCPU }, ncpus;
	size_t len = sizeof(ncpus);	/* size of the result buffer, not of mib */
	sysctl(mib, sizeof(mib) / sizeof(mib[0]), &ncpus, &len, NULL, 0);
D("system had %d cpus", ncpus);
return (ncpus);
#endif
}
/*
* try to get I/O privileges so we can execute cli/sti etc.
*/
int
getprivs(void)
{
int fd = open("/dev/io", O_RDWR);
if (fd < 0) {
D("cannot open /dev/io, fd %d", fd);
return 0;
}
return 1;
}
/* set the thread affinity. */
/* ARGSUSED */
#ifdef HAVE_AFFINITY
static int
setaffinity(pthread_t me, int i)
{
cpuset_t cpumask;
if (i == -1)
return 0;
	/* Set thread affinity. */
CPU_ZERO(&cpumask);
CPU_SET(i, &cpumask);
if (pthread_setaffinity_np(me, sizeof(cpuset_t), &cpumask) != 0) {
D("Unable to set affinity");
return 1;
}
return 0;
}
#endif
static void *
td_body(void *data)
{
struct targ *t = (struct targ *) data;
#ifdef HAVE_AFFINITY
if (0 == setaffinity(t->thread, t->affinity))
#endif
{
/* main loop.*/
D("testing %ld cycles", t->g->m_cycles);
gettimeofday(&t->tic, NULL);
t->g->fn(t);
gettimeofday(&t->toc, NULL);
}
t->completed = 1;
return (NULL);
}
void
test_sel(struct targ *t)
{
int64_t m;
for (m = 0; m < t->g->m_cycles; m++) {
fd_set r;
struct timeval to = { 0, t->g->arg};
FD_ZERO(&r);
FD_SET(0,&r);
// FD_SET(1,&r);
select(1, &r, NULL, NULL, &to);
t->count++;
}
}
void
test_poll(struct targ *t)
{
int64_t m, ms = t->g->arg/1000;
for (m = 0; m < t->g->m_cycles; m++) {
struct pollfd x;
x.fd = 0;
x.events = POLLIN;
poll(&x, 1, ms);
t->count++;
}
}
void
test_usleep(struct targ *t)
{
int64_t m;
for (m = 0; m < t->g->m_cycles; m++) {
usleep(t->g->arg);
t->count++;
}
}
void
test_cli(struct targ *t)
{
int64_t m, i;
if (!t->g->privs) {
D("%s", "privileged instructions not available");
return;
}
for (m = 0; m < t->g->m_cycles; m++) {
for (i = 0; i < ONE_MILLION; i++) {
__asm __volatile("cli;");
__asm __volatile("and %eax, %eax;");
__asm __volatile("sti;");
t->count++;
}
}
}
void
test_nop(struct targ *t)
{
int64_t m, i;
for (m = 0; m < t->g->m_cycles; m++) {
for (i = 0; i < ONE_MILLION; i++) {
__asm __volatile("nop;");
__asm __volatile("nop; nop; nop; nop; nop;");
//__asm __volatile("nop; nop; nop; nop; nop;");
t->count++;
}
}
}
void
test_rdtsc1(struct targ *t)
{
int64_t m, i;
uint64_t v;
(void)v;
for (m = 0; m < t->g->m_cycles; m++) {
for (i = 0; i < ONE_MILLION; i++) {
my_rdtsc(v);
t->count++;
}
}
}
void
test_rdtsc(struct targ *t)
{
int64_t m, i;
volatile uint64_t v;
(void)v;
for (m = 0; m < t->g->m_cycles; m++) {
for (i = 0; i < ONE_MILLION; i++) {
v = rdtsc();
t->count++;
}
}
}
void
test_add(struct targ *t)
{
int64_t m, i;
for (m = 0; m < t->g->m_cycles; m++) {
for (i = 0; i < ONE_MILLION; i++) {
t->glob_ctr[0] ++;
t->count++;
}
}
}
void
test_atomic_add(struct targ *t)
{
int64_t m, i;
for (m = 0; m < t->g->m_cycles; m++) {
for (i = 0; i < ONE_MILLION; i++) {
atomic_add_int(t->glob_ctr, 1);
t->count++;
}
}
}
void
test_atomic_cmpset(struct targ *t)
{
int64_t m, i;
for (m = 0; m < t->g->m_cycles; m++) {
for (i = 0; i < ONE_MILLION; i++) {
atomic_cmpset_32(t->glob_ctr, m, i);
t->count++;
}
}
}
void
test_time(struct targ *t)
{
int64_t m;
for (m = 0; m < t->g->m_cycles; m++) {
#ifndef __APPLE__
struct timespec ts;
clock_gettime(t->g->arg, &ts);
#endif
t->count++;
}
}
void
test_gettimeofday(struct targ *t)
{
int64_t m;
struct timeval ts;
for (m = 0; m < t->g->m_cycles; m++) {
gettimeofday(&ts, NULL);
t->count++;
}
}
/*
* getppid is the simplest system call (getpid is cached by glibc
* so it would not be a good test)
*/
void
test_getpid(struct targ *t)
{
int64_t m;
for (m = 0; m < t->g->m_cycles; m++) {
getppid();
t->count++;
}
}
#define likely(x) __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
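/*
 * Same idea as the pkt_copy() routine discussed above: copy in unrolled
 * 64-byte chunks of 64-bit words (rounding the length up, so the buffers
 * must have some slack) and fall back to bcopy() for blocks >= 1 KiB.
 */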
static void
fast_bcopy(void *_src, void *_dst, int l)
{
uint64_t *src = _src;
uint64_t *dst = _dst;
if (unlikely(l >= 1024)) {
bcopy(src, dst, l);
return;
}
for (; likely(l > 0); l-=64) {
*dst++ = *src++;
*dst++ = *src++;
*dst++ = *src++;
*dst++ = *src++;
*dst++ = *src++;
*dst++ = *src++;
*dst++ = *src++;
*dst++ = *src++;
}
}
// XXX if you want to make sure there is no inlining...
// static void (*fp)(void *_src, void *_dst, int l) = fast_bcopy;
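/* spread destinations over a large static array so that successive
 * copies do not always land in the same, already-hot cache lines */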
#define HU 0x3ffff
static struct glob_arg huge[HU+1];
void
test_fastcopy(struct targ *t)
{
int64_t m;
int len = t->g->arg;
if (len > (int)sizeof(struct glob_arg))
len = sizeof(struct glob_arg);
D("fast copying %d bytes", len);
for (m = 0; m < t->g->m_cycles; m++) {
fast_bcopy(t->g, (void *)&huge[m & HU], len);
t->count+=1;
}
}
void
test_bcopy(struct targ *t)
{
int64_t m;
int len = t->g->arg;
if (len > (int)sizeof(struct glob_arg))
len = sizeof(struct glob_arg);
D("bcopying %d bytes", len);
for (m = 0; m < t->g->m_cycles; m++) {
bcopy(t->g, (void *)&huge[m & HU], len);
t->count+=1;
}
}
void
test_builtin_memcpy(struct targ *t)
{
int64_t m;
int len = t->g->arg;
if (len > (int)sizeof(struct glob_arg))
len = sizeof(struct glob_arg);
D("bcopying %d bytes", len);
for (m = 0; m < t->g->m_cycles; m++) {
		__builtin_memcpy((void *)&huge[m & HU], t->g, len); /* dst, src order as in test_memcpy() */
t->count+=1;
}
}
void
test_memcpy(struct targ *t)
{
int64_t m;
int len = t->g->arg;
if (len > (int)sizeof(struct glob_arg))
len = sizeof(struct glob_arg);
D("memcopying %d bytes", len);
for (m = 0; m < t->g->m_cycles; m++) {
memcpy((void *)&huge[m & HU], t->g, len);
t->count+=1;
}
}
struct entry {
void (*fn)(struct targ *);
char *name;
uint64_t scale;
uint64_t m_cycles;
};
struct entry tests[] = {
{ test_sel, "select", 1, 1000 },
{ test_poll, "poll", 1, 1000 },
{ test_usleep, "usleep", 1, 1000 },
{ test_time, "time", 1, 1000 },
{ test_gettimeofday, "gettimeofday", 1, 1000000 },
{ test_getpid, "getpid", 1, 1000000 },
{ test_bcopy, "bcopy", 1000, 100000000 },
{ test_builtin_memcpy, "__builtin_memcpy", 1000, 100000000 },
{ test_memcpy, "memcpy", 1000, 100000000 },
{ test_fastcopy, "fastcopy", 1000, 100000000 },
{ test_add, "add", ONE_MILLION, 100000000 },
{ test_nop, "nop", ONE_MILLION, 100000000 },
{ test_atomic_add, "atomic-add", ONE_MILLION, 100000000 },
{ test_cli, "cli", ONE_MILLION, 100000000 },
{ test_rdtsc, "rdtsc", ONE_MILLION, 100000000 }, // unserialized
{ test_rdtsc1, "rdtsc1", ONE_MILLION, 100000000 }, // serialized
{ test_atomic_cmpset, "cmpset", ONE_MILLION, 100000000 },
{ NULL, NULL, 0, 0 }
};
static void
usage(void)
{
const char *cmd = "test";
int i;
fprintf(stderr,
"Usage:\n"
"%s arguments\n"
"\t-m name test name\n"
"\t-n cycles (millions) of cycles\n"
"\t-l arg bytes, usec, ... \n"
"\t-t threads total threads\n"
"\t-c cores cores to use\n"
"\t-a n force affinity every n cores\n"
"\t-A n cache contention every n bytes\n"
"\t-w report_ms milliseconds between reports\n"
"",
cmd);
fprintf(stderr, "Available tests:\n");
for (i = 0; tests[i].name; i++) {
fprintf(stderr, "%12s\n", tests[i].name);
}
exit(0);
}
static int64_t
getnum(const char *s)
{
int64_t n;
char *e;
n = strtol(s, &e, 0);
switch (e ? *e : '\0') {
case 'k':
case 'K':
return n*1000;
case 'm':
case 'M':
return n*1000*1000;
case 'g':
case 'G':
return n*1000*1000*1000;
case 't':
case 'T':
return n*1000*1000*1000*1000;
default:
return n;
}
}
struct glob_arg g;
int
main(int argc, char **argv)
{
int i, ch, report_interval, affinity, align;
ND("g has size %d", (int)sizeof(g));
report_interval = 250; /* ms */
affinity = 0; /* no affinity */
align = 0; /* global variable */
bzero(&g, sizeof(g));
g.privs = getprivs();
g.nthreads = 1;
g.cpus = 1;
g.m_cycles = 0;
while ( (ch = getopt(argc, argv, "A:a:m:n:w:c:t:vl:")) != -1) {
switch(ch) {
default:
D("bad option %c %s", ch, optarg);
usage();
break;
case 'A': /* align */
align = atoi(optarg);
break;
case 'a': /* force affinity */
affinity = atoi(optarg);
break;
case 'n': /* cycles */
g.m_cycles = getnum(optarg);
break;
case 'w': /* report interval */
report_interval = atoi(optarg);
break;
case 'c':
g.cpus = atoi(optarg);
break;
case 't':
g.nthreads = atoi(optarg);
break;
case 'm':
g.test_name = optarg;
break;
case 'l':
g.arg = getnum(optarg);
break;
case 'v':
verbose++;
break;
}
}
argc -= optind;
argv += optind;
if (!g.test_name && argc > 0)
g.test_name = argv[0];
if (g.test_name) {
for (i = 0; tests[i].name; i++) {
if (!strcmp(g.test_name, tests[i].name)) {
g.fn = tests[i].fn;
g.scale = tests[i].scale;
if (g.m_cycles == 0)
g.m_cycles = tests[i].m_cycles;
if (g.scale == ONE_MILLION)
g.scale_name = "M";
else if (g.scale == 1000)
g.scale_name = "K";
else {
g.scale = 1;
g.scale_name = "";
}
break;
}
}
}
if (!g.fn) {
D("%s", "missing/unknown test name");
usage();
}
i = system_ncpus();
if (g.cpus < 0 || g.cpus > i) {
D("%d cpus is too high, have only %d cpus", g.cpus, i);
usage();
}
if (g.cpus == 0)
g.cpus = i;
if (g.nthreads < 1) {
D("bad nthreads %d, using 1", g.nthreads);
g.nthreads = 1;
}
i = sizeof(g.v.ctr) / g.nthreads*sizeof(g.v.ctr[0]);
if (align < 0 || align > i) {
D("bad align %d, max is %d", align, i);
align = i;
}
/* Install ^C handler. */
global_nthreads = g.nthreads;
signal(SIGINT, sigint_h);
ta = calloc(g.nthreads, sizeof(*ta));
/*
* Now create the desired number of threads, each one
* using a single descriptor.
*/
D("start %d threads on %d cores", g.nthreads, g.cpus);
for (i = 0; i < g.nthreads; i++) {
struct targ *t = &ta[i];
bzero(t, sizeof(*t));
t->g = &g;
t->me = i;
t->glob_ctr = &g.v.ctr[(i*align)/sizeof(g.v.ctr[0])];
D("thread %d ptr %p", i, t->glob_ctr);
t->affinity = affinity ? (affinity*i) % g.cpus : -1;
if (pthread_create(&t->thread, NULL, td_body, t) == -1) {
D("Unable to create thread %d", i);
t->completed = 1;
}
}
/* the main loop */
{
uint64_t my_count = 0, prev = 0;
uint64_t count = 0;
double delta_t;
struct timeval tic, toc;
gettimeofday(&toc, NULL);
for (;;) {
struct timeval now, delta;
uint64_t pps;
int done = 0;
delta.tv_sec = report_interval/1000;
delta.tv_usec = (report_interval%1000)*1000;
select(0, NULL, NULL, NULL, &delta);
gettimeofday(&now, NULL);
timersub(&now, &toc, &toc);
my_count = 0;
for (i = 0; i < g.nthreads; i++) {
my_count += ta[i].count;
if (ta[i].completed)
done++;
}
pps = toc.tv_sec* ONE_MILLION + toc.tv_usec;
if (pps < 10000)
continue;
pps = (my_count - prev)*ONE_MILLION / pps;
D("%" PRIu64 " %scycles/s scale %" PRIu64 " in %dus",
pps/g.scale,
g.scale_name, g.scale, (int)(toc.tv_sec* ONE_MILLION +
toc.tv_usec));
prev = my_count;
toc = now;
if (done == g.nthreads)
break;
}
D("total %" PRIu64 " cycles", prev);
timerclear(&tic);
timerclear(&toc);
for (i = 0; i < g.nthreads; i++) {
pthread_join(ta[i].thread, NULL);
if (ta[i].completed == 0)
continue;
/*
 * Collect the threads' output and extract information about
 * how long it took to run the test.
*/
count += ta[i].count;
if (!timerisset(&tic) || timercmp(&ta[i].tic, &tic, <))
tic = ta[i].tic;
if (!timerisset(&toc) || timercmp(&ta[i].toc, &toc, >))
toc = ta[i].toc;
}
/* print output. */
timersub(&toc, &tic, &toc);
delta_t = toc.tv_sec + 1e-6* toc.tv_usec;
D("total %8.6f seconds", delta_t);
}
return (0);
}
/* end of file */