qemu-devel

Re: [Qemu-devel] Proposed patch: huge RX speedup for hw/e1000.c


From: Paolo Bonzini
Subject: Re: [Qemu-devel] Proposed patch: huge RX speedup for hw/e1000.c
Date: Thu, 31 May 2012 09:38:19 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:12.0) Gecko/20120430 Thunderbird/12.0.1

On 31/05/2012 00:53, Luigi Rizzo wrote:
> The image contains my fast packet generator "pkt-gen" (a stock
> traffic generator such as netperf is too slow to show the
> problem). pkt-gen can send about 1Mpps in this configuration using
> -net netmap in the backend. The qemu process in this case takes 100%
> CPU. On the receive side, I cannot receive more than 50Kpps, even if I
> flood the bridge with a huge amount of traffic. The qemu process stays
> at 5% CPU or less.
> 
> Then I read the docs in main-loop.h, which say that one case where
> qemu_notify_event() is needed is when using
> qemu_set_fd_handler2(), which is exactly what my backend uses
> (similar to tap.c).

The path is a bit involved, but I think Luigi is right.  The docs say
"Remember to call qemu_notify_event whenever the [return value of the
fd_read_poll callback] may change from false to true."  Now net/tap.c has

    static int tap_can_send(void *opaque)
    {
        TAPState *s = opaque;

        return qemu_can_send_packet(&s->nc);
    }

and (ignoring VLANs) qemu_can_send_packet is

    int qemu_can_send_packet(VLANClientState *sender)
    {
        if (sender->peer->receive_disabled) {
            return 0;
        } else if (sender->peer->info->can_receive &&
                   !sender->peer->info->can_receive(sender->peer)) {
            return 0;
        } else {
            return 1;
        }
    }

So whenever receive_disabled goes from 1 to 0 or can_receive goes from 0 to 1,
the _peer_ has to call qemu_notify_event.  In e1000.c we have

    static bool e1000_has_rxbufs(E1000State *s, size_t total_size)
    {
        int bufs;
        /* Fast-path short packets */
        if (total_size <= s->rxbuf_size) {
            return s->mac_reg[RDH] != s->mac_reg[RDT] || !s->check_rxov;
        }
        if (s->mac_reg[RDH] < s->mac_reg[RDT]) {
            bufs = s->mac_reg[RDT] - s->mac_reg[RDH];
        } else if (s->mac_reg[RDH] > s->mac_reg[RDT] || !s->check_rxov) {
            bufs = s->mac_reg[RDLEN] /  sizeof(struct e1000_rx_desc) +
                s->mac_reg[RDT] - s->mac_reg[RDH];
        } else {
            return false;
        }
        return total_size <= bufs * s->rxbuf_size;
    }

    static int
    e1000_can_receive(VLANClientState *nc)
    {
        E1000State *s = DO_UPCAST(NICState, nc, nc)->opaque;
    
        return (s->mac_reg[RCTL] & E1000_RCTL_EN) && e1000_has_rxbufs(s, 1);
    }

So as a conservative approximation, you need to fire qemu_notify_event
whenever you write to RDH, RDT, RDLEN and RCTL, or when check_rxov becomes
zero.  In practice, only RDT, RCTL and check_rxov matter.  Luigi, does this
patch work for you?

diff --git a/hw/e1000.c b/hw/e1000.c
index 4573f13..0069103 100644
--- a/hw/e1000.c
+++ b/hw/e1000.c
@@ -295,6 +295,7 @@ set_rx_control(E1000State *s, int index, uint32_t val)
     s->rxbuf_min_shift = ((val / E1000_RCTL_RDMTS_QUAT) & 3) + 1;
     DBGOUT(RX, "RCTL: %d, mac_reg[RCTL] = 0x%x\n", s->mac_reg[RDT],
            s->mac_reg[RCTL]);
+    qemu_notify_event();
 }
 
 static void
@@ -922,6 +923,7 @@ set_rdt(E1000State *s, int index, uint32_t val)
 {
     s->check_rxov = 0;
     s->mac_reg[index] = val & 0xffff;
+    qemu_notify_event();
 }
 
 static void


RDT is indeed written in the ISR.  In the Linux driver, e1000_clean_rx_irq
calls adapter->alloc_rx_buf, which is e1000_alloc_rx_buffers.  There you
see this:

        if (likely(rx_ring->next_to_use != i)) {
                rx_ring->next_to_use = i;
                if (unlikely(i-- == 0))
                        i = (rx_ring->count - 1);

                /* Force memory writes to complete before letting h/w
                 * know there are new descriptors to fetch.  (Only
                 * applicable for weak-ordered memory model archs,
                 * such as IA-64). */
                wmb();
                writel(i, hw->hw_addr + rx_ring->rdt);
        }

Similarly, for the other devices the relevant registers or functions are:
- cadence_gem -> GEM_NWCTRL
- dp8393x -> SONIC_CR, SONIC_ISR
- eepro100 -> set_ru_state
- mcf_fec -> mcf_fec_enable_rx
- milkymist-minimac2 -> R_STATE0, R_STATE1
- mipsnet -> MIPSNET_INT_CTL, MIPSNET_RX_DATA_BUFFER
- ne2000 -> EN0_STARTPG, EN0_STOPPG, E8390_CMD
- opencores_eth -> TX_BD_NUM, MODER, rx_desc
- pcnet -> pcnet_start, csr[5]
- rtl8139 -> RxBufPtr and Cfg9346
- smc91c111 -> RCR, smc91c111_release_packet
- spapr_llan -> h_add_logical_lan_buffer
- stellaris_enet -> RCTL, DATA
- xgmac -> DMA_CONTROL
- xilinx_axienet -> rcw[1]
- xilinx_ethlite -> R_RX_CTRL0

For Xen, I think this is not possible at the moment because it doesn't
implement RX notification.

Paolo


