qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 1/2] Ignore RX tail kicks when RX disabled.


From: Dmitry Fleytman
Subject: Re: [Qemu-devel] [PATCH 1/2] Ignore RX tail kicks when RX disabled.
Date: Thu, 18 Oct 2012 18:12:50 +0200

Great! Thanks, Alex.

I'll prepare a new changeset that drops check_rxov completely.
Also migration-related patch becomes unneeded with this solution.

On Thu, Oct 18, 2012 at 6:06 PM, Alexander Duyck
<address@hidden> wrote:
> On 10/18/2012 07:31 AM, Stefan Hajnoczi wrote:
>> On Thu, Oct 18, 2012 at 10:34 AM, Dmitry Fleytman <address@hidden> wrote:
>>> The real purpose of check_rxov it a bit confusing indeed, mainly
>>> because of unclear name (rename?),
>>> however it works as following:
>>>
>>> There are 2 possible when RDT == RDH for RX ring:
>>>     1. Device used all the buffers from ring, no empty buffers available
>>>     2. Driver fully refilled the ring and all buffers are empty and ready 
>>> to use
>
> The 2nd case is not true.  We should only have RDT == RDH when the ring
> is empty.  If RDT == RDH and the ring is full then we have a bug in the
> driver.  The driver should only ever allow RDT to be one less than head,
> or ring size - 1 if head is 0.
>
>>> check_rxov is used to distinguish these 2 cases:
>>>     1. It must be 1 initially (init, reset, etc.)
>>>     2. It must be set to one when device uses buffer
>>>     3. It must be set to 0 when driver adds buffer to the ring
>>> check_rxov == 1 - ring is empty
>>> check_rxov == 0 - ring is full
>>>
>>> Indeed, RX init sequence doesn't look logical, however this is the way
>>> all Intel driver behave from e1000 and up to ixgbe.
>>> Also see some explanation here:
>>> http://permalink.gmane.org/gmane.linux.kernel/1375917
>>>
>>> If we drop check_rxov and always treat RDH == RDT as empty ring we'll
>>> probably get correct behavior for current Linux driver's code (needs
>>> testing of course),
>>> however we have no idea how Windows drivers work.
>
> The windows driver should work the same way.  If RDH == RDT the hardware
> will treat that as a empty ring and will hang.  If there is a driver
> that is setting RDH == RDT to indicate the ring is full please let us
> know as that is likely a buggy driver.
>
>> Thanks, for the great explanation, Dmitry.
>>
>> Alexander: I CCed you because I hope you might be able to explain what
>> the 82540EM card does when a driver sets RDT to the value of RDH.  The
>> QEMU NIC emulation code treats this as a full ring (i.e. the
>> descriptors are valid and will be filled in by the hardware).  Does
>> the real hardware act like this or will it treat this condition as
>> ring empty (i.e. if the driver sets RDT to the value of RDH then the
>> hardware stops receive because there are no descriptors available)?
>>
>> I can't find a statement in the Intel datasheet about what happens
>> when the driver sets RDT = RDH.  The QEMU check_rxov variable is
>> trying to distinguish between ring empty (RDH has moved to RDT) and
>> ring full (driver has set RDH = RDT because the full descriptor ring
>> is available).
>
> If RDT == RDH then we should stop receiving traffic.  As far as I know
> all of our e1000 hardware treat RDT == RDH as an empty ring state.  All
> of the drivers should have code in place to stop it.  For example the
> E1000_DESC_UNUSED macro should be returning ring size - 1 in the case of
> RDT == RDH which will result in the head being 0 and the tail being ring
> size - 2.
>
>> Dmitry: At this point we'd need to test what happens on real hardware
>> when RDH = RDT in order to be able to remove check_rxov.  As you
>> mentioned, with the Linux e1000 driver we don't see ring full RDH =
>> RDT:
>>
>>         /* call E1000_DESC_UNUSED which always leaves
>>          * at least 1 descriptor unused to make sure
>>          * next_to_use != next_to_clean */
>>         for (i = 0; i < adapter->num_rx_queues; i++) {
>>                 struct e1000_rx_ring *ring = &adapter->rx_ring[i];
>>                 adapter->alloc_rx_buf(adapter, ring,
>>                                       E1000_DESC_UNUSED(ring));
>>         }
>>
>> Here some sample output from a QEMU printf, notice how RDH is never
>> the same as RDT once rx begins:
>>
>> set_rdt rdh=0 rdt_old=0 rdt_new=0
>> set_rdt rdh=0 rdt_old=0 rdt_new=254
>> set_rdt rdh=1 rdt_old=254 rdt_new=255
>> set_rdt rdh=2 rdt_old=255 rdt_new=0
>> set_rdt rdh=3 rdt_old=0 rdt_new=1
>> set_rdt rdh=4 rdt_old=1 rdt_new=2
>> set_rdt rdh=5 rdt_old=2 rdt_new=3
>> set_rdt rdh=6 rdt_old=3 rdt_new=4
>> set_rdt rdh=7 rdt_old=4 rdt_new=5
>> set_rdt rdh=9 rdt_old=5 rdt_new=7
>> set_rdt rdh=10 rdt_old=7 rdt_new=8
>> set_rdt rdh=11 rdt_old=8 rdt_new=9
>> set_rdt rdh=12 rdt_old=9 rdt_new=10
>> set_rdt rdh=13 rdt_old=10 rdt_new=11
>> set_rdt rdh=14 rdt_old=11 rdt_new=12
>>
>> The iPXE 'intel' driver (supports e1000 cards) also does not set RDH =
>> RDT for full rx ring, instead it only uses 4 out of 8 descriptors at a
>> time.
>>
>> The reason I'm digging into the need for check_rxov is because it's a
>> dangerous piece of code to have.  If check_rxov logic is ever out of
>> sync we risk memory corruption.  I'd really like to drop it
>> completely.
>>
>> Stefan
>
> There should be no need for check_rxov.  As far as I know none of our
> drivers will ever set RDT == RDH if there are descriptors available on
> the ring.
>
> Thanks,
>
> Alex



-- 
Dmitry Fleytman
Technology Expert and Consultant,

Daynix Computing Ltd.

Cell: +972-54-2819481

Skype: dmitry.fleytman



reply via email to

[Prev in Thread] Current Thread [Next in Thread]