qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] e1000: Fix the unexpected assumption that the receive buffer


From: Yong Huang
Subject: Re: [PATCH] e1000: Fix the unexpected assumption that the receive buffer is full
Date: Tue, 9 Jul 2024 10:56:04 +0800



On Tue, Jul 9, 2024 at 10:41 AM Jason Wang <jasowang@redhat.com> wrote:
On Mon, Jul 8, 2024 at 1:17 PM Yong Huang <yong.huang@smartx.com> wrote:
>
>
>
> On Mon, Jul 8, 2024 at 11:21 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On Sat, Jul 6, 2024 at 4:30 AM Hyman Huang <yong.huang@smartx.com> wrote:
>> >
>> > Unexpected work by certain Windows guests equipped with the e1000
>> > interface can cause the network to go down and never come back up
>> > again unless the guest's interface is reset.
>> >
>> > To reproduce the failure:
>> > 1. Set up two guests with a Windows 2016 or 2019 server operating
>> >    system.
>>
>> I vaguely remember e1000 support for Windows has been deprecated for
>> several years...
>>
>> That's why e1000e or igb is implemented in Qemu.
>>
>> > 2. Set up the e1000 interface for the guests.
>> > 3. Pressurize the network slightly between two guests using the iPerf tool.
>> >
>> > The network goes down after a few days (2-5days), and the issue
>> > is the result of not adhering to the e1000 specification. Refer
>> > to the details of the specification at the following link:
>> > https://www.intel.com/content/dam/doc/manual/pci-pci-x-family-gbe-controllers-software-dev-manual.pdf
>> >
>> > Chapter 3.2.6 describe the Receive Descriptor Tail register(RDT)
>> > as following:
>> > This register holds a value that is an offset from the base, and
>> > identifies the location beyond the last descriptor hardware can
>> > process. Note that tail should still point to an area in the
>> > descriptor ring (somewhere between RDBA and RDBA + RDLEN).
>> > This is because tail points to the location where software writes
>> > the first new descriptor.
>> >
>> > This means that if the provider—in this case, QEMU—has not yet
>> > loaded the packet,
>>
>> What do you mean by "load" here?
>
>
> Sorry for failing to describe the details.
>
> The guest driver retrieves the packet from the receive ring buffer
> after QEMU forwards it from the tun/tap interface in the e1000
> emulation.
>
> I used "load" to express "putting packets into the receive ring buffer."
>
>>
>>
>> > RDT should never point to that place.
>>
>> And "that place"?
>
> If a descriptor in the receive ring buffer has not been filled with a
> packet address by QEMU, the descriptor therefore doesn't have any
> available packets. The location of the descriptor should not be referred
> to by RDT because the location is in the range that "hardware" handles.
>
> "that place" means the location of the descriptor in the ring buffer
> that QEMU hasn't set any available packets related to.
>
>>
>>
>> > When
>> > implementing the emulation of the e1000 interface, QEMU evaluates
>> > if the receive ring buffer is full once the RDT equals the RDH,
>> > based on the assumption that guest drivers adhere to this
>> > criterion strictly.
>> >
>> > We applied the following log patch to assist in analyzing the
>> > issue and eventually obtained the unexpected information.
>> >
>> > Log patch:
>> > -----------------------------------------------------------------
>> > |--- a/hw/net/e1000.c
>> > |+++ b/hw/net/e1000.c
>> > |@@ -836,6 +836,9 @@ e1000_set_link_status(NetClientState *nc)
>> > | static bool e1000_has_rxbufs(E1000State *s, size_t total_size)
>> > | {
>> > |     int bufs;
>> > |+    DBGOUT(RX, "rxbuf_size = %u, s->mac_reg[RDLEN] = %u, s->mac_reg[RDH] = %u, s->mac_reg[RDT] = %u\n",
>> > |+           s->rxbuf_size, s->mac_reg[RDLEN], s->mac_reg[RDH], s->mac_reg[RDT]);
>> > |+
>> > |     /* Fast-path short packets */
>> > |     if (total_size <= s->rxbuf_size) {
>> > |         if (s->mac_reg[RDH] == s->mac_reg[RDT] && s->last_overrun)
>> > |@@ -1022,6 +1025,9 @@ e1000_receive_iov(NetClientState *nc, const struct iovec *iov, int iovcnt)
>> > |         s->rxbuf_min_shift)
>> > |         n |= E1000_ICS_RXDMT0;
>> > |
>> > |+    DBGOUT(RX, "rxbuf_size = %u, s->mac_reg[RDLEN] = %u, s->mac_reg[RDH] = %u, s->mac_reg[RDT] = %u\n",
>> > |+           s->rxbuf_size, s->mac_reg[RDLEN], s->mac_reg[RDH], s->mac_reg[RDT]);
>> > |+
>> > -----------------------------------------------------------------
>> >
>> > The last few logs of information when the network is down:
>> >
>> > e1000: total_size = 1, rxbuf_size = 2048, s->mac_reg[RDLEN] = 16384, s->mac_reg[RDH] = 897, s->mac_reg[RDT] = 885
>> > <- the receive ring buffer is checked for fullness in the
>> > e1000_has_rxbufs function, not full.
>> >
>> > e1000: total_size = 64, rxbuf_size = 2048, s->mac_reg[RDLEN] = 16384, s->mac_reg[RDH] = 898, s->mac_reg[RDT] = 885
>> > <- RDT stays the same, RDH updates to 898, and 1 descriptor
>> > utilized after putting the packet to ring buffer.
>> >
>> > e1000: total_size = 1, rxbuf_size = 2048, s->mac_reg[RDLEN] = 16384, s->mac_reg[RDH] = 898, s->mac_reg[RDT] = 885
>> > <- the receive ring buffer is checked for fullness in the
>> > e1000_has_rxbufs function, not full.
>> >
>> > e1000: total_size = 64, rxbuf_size = 2048, s->mac_reg[RDLEN] = 16384, s->mac_reg[RDH] = 899, s->mac_reg[RDT] = 885
>> > <- RDT stays the same, RDH updates to 899, and 1 descriptor
>> > utilized after putting the packet to ring buffer.
>> >
>> > e1000: total_size = 1, rxbuf_size = 2048, s->mac_reg[RDLEN] = 16384, s->mac_reg[RDH] = 899, s->mac_reg[RDT] = 885
>> > <- the receive ring buffer is checked for fullness in the
>> > e1000_has_rxbufs function, not full.
>> >
>> > e1000: total_size = 64, rxbuf_size = 2048, s->mac_reg[RDLEN] = 16384, s->mac_reg[RDH] = 900, s->mac_reg[RDT] = 885
>> > <- RDT stays the same, RDH updates to 900 , and 1 descriptor
>> > utilized after putting the packet to ring buffer.
>> >
>> > e1000: total_size = 1, rxbuf_size = 2048, s->mac_reg[RDLEN] = 16384, s->mac_reg[RDH] = 900, s->mac_reg[RDT] = 900
>> > <- The ring is full, according to e1000_has_rxbufs, because
>> > of the RDT update to 900 and equals RDH !
>>
>> Just to make sure I understand this, RDT==RDH means the ring is empty I think?
>>
>>
>> See commit:
>>
>> commit e5b8b0d4ba29fe1268ba049519a1b0cf8552a21a
>> Author: Dmitry Fleytman <dmitry@daynix.com>
>> Date:   Fri Oct 19 07:56:55 2012 +0200
>>
>>     e1000: drop check_rxov, always treat RX ring with RDH == RDT as empty
>>
>>     Real HW always treats RX ring with RDH == RDT as empty.
>>     Emulation is supposed to behave the same.
>
>
> Indeed, I'm confused :(,  the description in the comment claims that RX
> rings with RDH == RDT as empty, but in implementation, it treats that as
> overrun.
>
> See the following 2 contexts:
>
> 1. e1000_can_receive:
> static bool e1000_can_receive(NetClientState *nc)
> {
>     E1000State *s = qemu_get_nic_opaque(nc);
>     // e1000_has_rxbufs return true means ring buffer has
>     // available descriptors to use for QEMU.
>     // false means ring buffer overrun and QEMU should queue the packet
>     // and wait for the RDT update and available descriptors can be used.
>
>     return e1000x_rx_ready(&s->parent_obj, s->mac_reg) &&
>         e1000_has_rxbufs(s, 1) && !timer_pending(s->flush_queue_timer);
> }

Well we had in e1000_has_rx_bufs

    if (total_size <= s->rxbuf_size) {
        return s->mac_reg[RDH] != s->mac_reg[RDT];
    }

RDT!=RDH means RX ring has available descriptors for hardware?

IMHO, Yes.


Adding more people.

Thanks



--
Best regards

reply via email to

[Prev in Thread] Current Thread [Next in Thread]