qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] e1000: Fix the unexpected assumption that the receive buffer


From: Jason Wang
Subject: Re: [PATCH] e1000: Fix the unexpected assumption that the receive buffer is full
Date: Wed, 10 Jul 2024 11:43:26 +0800

On Tue, Jul 9, 2024 at 10:56 AM Yong Huang <yong.huang@smartx.com> wrote:
>
>
>
> On Tue, Jul 9, 2024 at 10:41 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On Mon, Jul 8, 2024 at 1:17 PM Yong Huang <yong.huang@smartx.com> wrote:
>> >
>> >
>> >
>> > On Mon, Jul 8, 2024 at 11:21 AM Jason Wang <jasowang@redhat.com> wrote:
>> >>
>> >> On Sat, Jul 6, 2024 at 4:30 AM Hyman Huang <yong.huang@smartx.com> wrote:
>> >> >
>> >> > Unexpected work by certain Windows guests equipped with the e1000
>> >> > interface can cause the network to go down and never come back up
>> >> > again unless the guest's interface is reset.
>> >> >
>> >> > To reproduce the failure:
>> >> > 1. Set up two guests with a Windows 2016 or 2019 server operating
>> >> >    system.
>> >>
>> >> I vaguely remember e1000 support for Windows has been deprecated for
>> >> several years...
>> >>
>> >> That's why e1000e or igb is implemented in Qemu.
>> >>
>> >> > 2. Set up the e1000 interface for the guests.
>> >> > 3. Pressurize the network slightly between two guests using the iPerf 
>> >> > tool.
>> >> >
>> >> > The network goes down after a few days (2-5days), and the issue
>> >> > is the result of not adhering to the e1000 specification. Refer
>> >> > to the details of the specification at the following link:
>> >> > https://www.intel.com/content/dam/doc/manual/pci-pci-x-family-gbe-controllers-software-dev-manual.pdf
>> >> >
>> >> > Chapter 3.2.6 describe the Receive Descriptor Tail register(RDT)
>> >> > as following:
>> >> > This register holds a value that is an offset from the base, and
>> >> > identifies the location beyond the last descriptor hardware can
>> >> > process. Note that tail should still point to an area in the
>> >> > descriptor ring (somewhere between RDBA and RDBA + RDLEN).
>> >> > This is because tail points to the location where software writes
>> >> > the first new descriptor.
>> >> >
>> >> > This means that if the provider—in this case, QEMU—has not yet
>> >> > loaded the packet,
>> >>
>> >> What do you mean by "load" here?
>> >
>> >
>> > Sorry for failing to describe the details.
>> >
>> > The guest driver retrieves the packet from the receive ring buffer
>> > after QEMU forwards it from the tun/tap interface in the e1000
>> > emulation.
>> >
>> > I used "load" to express "putting packets into the receive ring buffer."
>> >
>> >>
>> >>
>> >> > RDT should never point to that place.
>> >>
>> >> And "that place"?
>> >
>> > If a descriptor in the receive ring buffer has not been filled with a
>> > packet address by QEMU, the descriptor therefore doesn't have any
>> > available packets. The location of the descriptor should not be referred
>> > to by RDT because the location is in the range that "hardware" handles.
>> >
>> > "that place" means the location of the descriptor in the ring buffer
>> > that QEMU hasn't set any available packets related to.
>> >
>> >>
>> >>
>> >> > When
>> >> > implementing the emulation of the e1000 interface, QEMU evaluates
>> >> > if the receive ring buffer is full once the RDT equals the RDH,
>> >> > based on the assumption that guest drivers adhere to this
>> >> > criterion strictly.
>> >> >
>> >> > We applied the following log patch to assist in analyzing the
>> >> > issue and eventually obtained the unexpected information.
>> >> >
>> >> > Log patch:
>> >> > -----------------------------------------------------------------
>> >> > |--- a/hw/net/e1000.c
>> >> > |+++ b/hw/net/e1000.c
>> >> > |@@ -836,6 +836,9 @@ e1000_set_link_status(NetClientState *nc)
>> >> > | static bool e1000_has_rxbufs(E1000State *s, size_t total_size)
>> >> > | {
>> >> > |     int bufs;
>> >> > |+    DBGOUT(RX, "rxbuf_size = %u, s->mac_reg[RDLEN] = %u, 
>> >> > s->mac_reg[RDH] = %u, s->mac_reg[RDT] = %u\n",
>> >> > |+           s->rxbuf_size, s->mac_reg[RDLEN], s->mac_reg[RDH], 
>> >> > s->mac_reg[RDT]);
>> >> > |+
>> >> > |     /* Fast-path short packets */
>> >> > |     if (total_size <= s->rxbuf_size) {
>> >> > |         if (s->mac_reg[RDH] == s->mac_reg[RDT] && s->last_overrun)
>> >> > |@@ -1022,6 +1025,9 @@ e1000_receive_iov(NetClientState *nc, const 
>> >> > struct iovec *iov, int iovcnt)
>> >> > |         s->rxbuf_min_shift)
>> >> > |         n |= E1000_ICS_RXDMT0;
>> >> > |
>> >> > |+    DBGOUT(RX, "rxbuf_size = %u, s->mac_reg[RDLEN] = %u, 
>> >> > s->mac_reg[RDH] = %u, s->mac_reg[RDT] = %u\n",
>> >> > |+           s->rxbuf_size, s->mac_reg[RDLEN], s->mac_reg[RDH], 
>> >> > s->mac_reg[RDT]);
>> >> > |+
>> >> > -----------------------------------------------------------------
>> >> >
>> >> > The last few logs of information when the network is down:
>> >> >
>> >> > e1000: total_size = 1, rxbuf_size = 2048, s->mac_reg[RDLEN] = 16384, 
>> >> > s->mac_reg[RDH] = 897, s->mac_reg[RDT] = 885
>> >> > <- the receive ring buffer is checked for fullness in the
>> >> > e1000_has_rxbufs function, not full.
>> >> >
>> >> > e1000: total_size = 64, rxbuf_size = 2048, s->mac_reg[RDLEN] = 16384, 
>> >> > s->mac_reg[RDH] = 898, s->mac_reg[RDT] = 885
>> >> > <- RDT stays the same, RDH updates to 898, and 1 descriptor
>> >> > utilized after putting the packet to ring buffer.
>> >> >
>> >> > e1000: total_size = 1, rxbuf_size = 2048, s->mac_reg[RDLEN] = 16384, 
>> >> > s->mac_reg[RDH] = 898, s->mac_reg[RDT] = 885
>> >> > <- the receive ring buffer is checked for fullness in the
>> >> > e1000_has_rxbufs function, not full.
>> >> >
>> >> > e1000: total_size = 64, rxbuf_size = 2048, s->mac_reg[RDLEN] = 16384, 
>> >> > s->mac_reg[RDH] = 899, s->mac_reg[RDT] = 885
>> >> > <- RDT stays the same, RDH updates to 899, and 1 descriptor
>> >> > utilized after putting the packet to ring buffer.
>> >> >
>> >> > e1000: total_size = 1, rxbuf_size = 2048, s->mac_reg[RDLEN] = 16384, 
>> >> > s->mac_reg[RDH] = 899, s->mac_reg[RDT] = 885
>> >> > <- the receive ring buffer is checked for fullness in the
>> >> > e1000_has_rxbufs function, not full.
>> >> >
>> >> > e1000: total_size = 64, rxbuf_size = 2048, s->mac_reg[RDLEN] = 16384, 
>> >> > s->mac_reg[RDH] = 900, s->mac_reg[RDT] = 885
>> >> > <- RDT stays the same, RDH updates to 900 , and 1 descriptor
>> >> > utilized after putting the packet to ring buffer.
>> >> >
>> >> > e1000: total_size = 1, rxbuf_size = 2048, s->mac_reg[RDLEN] = 16384, 
>> >> > s->mac_reg[RDH] = 900, s->mac_reg[RDT] = 900
>> >> > <- The ring is full, according to e1000_has_rxbufs, because
>> >> > of the RDT update to 900 and equals RDH !
>> >>
>> >> Just to make sure I understand this, RDT==RDH means the ring is empty I 
>> >> think?
>> >>
>> >>
>> >> See commit:
>> >>
>> >> commit e5b8b0d4ba29fe1268ba049519a1b0cf8552a21a
>> >> Author: Dmitry Fleytman <dmitry@daynix.com>
>> >> Date:   Fri Oct 19 07:56:55 2012 +0200
>> >>
>> >>     e1000: drop check_rxov, always treat RX ring with RDH == RDT as empty
>> >>
>> >>     Real HW always treats RX ring with RDH == RDT as empty.
>> >>     Emulation is supposed to behave the same.
>> >
>> >
>> > Indeed, I'm confused :(,  the description in the comment claims that RX
>> > rings with RDH == RDT as empty, but in implementation, it treats that as
>> > overrun.
>> >
>> > See the following 2 contexts:
>> >
>> > 1. e1000_can_receive:
>> > static bool e1000_can_receive(NetClientState *nc)
>> > {
>> >     E1000State *s = qemu_get_nic_opaque(nc);
>> >     // e1000_has_rxbufs return true means ring buffer has
>> >     // available descriptors to use for QEMU.
>> >     // false means ring buffer overrun and QEMU should queue the packet
>> >     // and wait for the RDT update and available descriptors can be used.
>> >
>> >     return e1000x_rx_ready(&s->parent_obj, s->mac_reg) &&
>> >         e1000_has_rxbufs(s, 1) && !timer_pending(s->flush_queue_timer);
>> > }
>>
>> Well we had in e1000_has_rx_bufs
>>
>>     if (total_size <= s->rxbuf_size) {
>>         return s->mac_reg[RDH] != s->mac_reg[RDT];
>>     }
>>
>> RDT!=RDH means RX ring has available descriptors for hardware?
>
>
> IMHO, Yes.

Just to make sure we are on the same page, so

RDT!=RDH, descriptors available for hardware
RDT==RDH, descriptor ring is empty for hardware

That is currently what the code did. Seems nothing wrong, or anything
I missed here?

Thanks

>
>>
>> Adding more people.
>>
>> Thanks
>>
>
>
> --
> Best regards




reply via email to

[Prev in Thread] Current Thread [Next in Thread]