From: Luigi Rizzo
Subject: Re: [Qemu-devel] Proposed patch: huge RX speedup for hw/e1000.c
Date: Thu, 31 May 2012 00:53:52 +0200



On Thu, May 31, 2012 at 12:11 AM, Jan Kiszka <address@hidden> wrote:
On 2012-05-30 23:55, Luigi Rizzo wrote:
> you can take the FreeBSD image from the netmap page in my link and run it
> in qemu, and then run the pkt-gen program in the image in either
> send or receive mode. But this is overkill, as you have described the
> problem exactly in your post: when the guest reads the packets from
> the emulated device (e1000 in my case, but most of them have the
> problem) it fails to wake up the thread blocked in main_loop_wait().

OK, so there are no QEMU code changes involved? Then, what is your
command line and what is the QEMU version?

# x86_64-softmmu/qemu-system-x86_64 -version
QEMU emulator version 1.0.1, Copyright (c) 2003-2008 Fabrice Bellard

The command line for my test is (this is on FreeBSD 9, with a pure userland
QEMU, without any kqemu support):

    x86_64-softmmu/qemu-system-x86_64 -m 512 -hda picobsd.bin -net nic,model=e1000 -net netmap,ifname=valeXX

-net netmap is my fast bridge; the code will be available in a couple of days.
In the meantime you can have a look at the netmap link in my original post to
see how qemu accesses the host adapter and how fast it is.
The problem also exists if you use -net tap, ... just at a lower speed, and
maybe too low to notice.

The image contains my fast packet generator "pkt-gen" (a stock traffic generator
such as netperf is too slow to show the problem). pkt-gen can send about 1 Mpps
in this configuration with -net netmap as the backend. The qemu process in this
case takes 100% CPU.
On the receive side, I cannot receive more than 50 Kpps, even if I flood the
bridge with a huge amount of traffic. The qemu process stays at 5% CPU or less.

Then I read the docs in main-loop.h, which say that one case where
qemu_notify_event() is needed is when using qemu_set_fd_handler2(),
which is exactly what my backend uses (similar to tap.c).

Once I add the notify, the receiver can do 1 Mpps and can use 100% CPU
(and it is not in a busy wait: if the offered traffic goes down, the qemu
process uses less and less CPU).
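
In essence the patch is a one-line kick in the MMIO write path. A sketch of
the idea (not the literal diff; the signature follows the 1.0-era memory API):

    /* hw/e1000.c -- sketch of the kick, not the literal patch.
     * Kicking on every register write is the "untargeted" part
     * discussed below. */
    static void
    e1000_mmio_write(void *opaque, target_phys_addr_t addr,
                     uint64_t val, unsigned size)
    {
        /* ... existing dispatch to the per-register handlers ... */

        /* Wake the thread blocked in main_loop_wait() so it re-runs
         * the read-poll handlers, now that the guest may have freed
         * RX buffers. */
        qemu_notify_event();
    }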

 
>
> I am unclear on the terminology (what is the frontend and what is the backend?)

Frontend is the emulated NIC here, backend the host-side interface, i.e.
slirp ("user"), tap, socket.

Thanks for the clarification.

> but it is the guest side that has to wake up the qemu process: the file
> descriptor talking to the host side (tap, socket, bpf ...) has already
> fired its events and the only thing it could do is cause a busy wait
> if it keeps passing a readable file descriptor to select.

Can't follow. How did you analyze your delays?

See above.
>
> I thought your slirp.c patch was also on the same "side" as e1000.c

It was only in the backend, not any frontend driver. It's generally no
business of the frontend driver to kick (virtio has some exceptional path).

BTW, your patch is rather untargeted, as it kicks on every MMIO write of
the e1000. Hard to assess what actually makes the difference for you.

In my case it is easily explained: the process that reads from the interface
in the guest does only one read and one write access to the RDT register for
each batch of packets. The RDT write is the one that frees the RX buffers.

Surely the patch could be made more specific, but that requires a lot of
investigation into which register accesses may need attention (and there are
so many of them, e.g. if the guest decides to reset the ring, the card, etc.,
that I don't think it is worth the trouble).
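
For illustration, the targeted version would kick only from the RDT write
handler, along these lines (assuming the 1.0-era set_rdt() helper; a sketch,
not a tested patch):

    /* hw/e1000.c -- kick only when the guest moves the RX descriptor
     * tail, i.e. when it returns buffers to the ring. */
    static void
    set_rdt(E1000State *s, int index, uint32_t val)
    {
        s->check_rxov = 0;
        s->mac_reg[index] = val & 0xffff;
        /* RX buffers were just freed: wake main_loop_wait() so the
         * backend fd is polled again. */
        qemu_notify_event();
    }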

cheers
luigi
