From: address@hidden
Subject: Re: [lwip-devel] in-place overwriting of payload via static "tcphdr" pointer.
Date: Wed, 10 Jan 2018 19:31:42 +0100
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.5.2

Bernhart Pelger wrote:
> [..] It can't get any clearer than that: it's the write access to the tcp header
You don't need to be clearer, it's clear enough that in your precise setup, this seems to be the problem.
> As for the other bug in the Xilinx driver (freeing the DMA buffers too early):
There is no "other" bug. This *is* the bug. By flushing the cache line in tcp_input, you hide this bug!
> With my current network configuration I should run into overwrite-problems after about 6ms to 10ms (at max rate RX traffic). However since I'm not using an operating system, it *is* possible to guarantee that the netif-function is called frequently enough.
Well, this works OK until you start queueing rx pbufs at some point. A simple way to trigger this is to enable TCP_QUEUE_OOSEQ and lose an rx segment (in the network, not in your device). A retransmission takes more than 10ms in most cases. The tcp rx code then holds references to the queued pbufs long enough for a packet to be overwritten.
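To make the failure mode concrete, here is a minimal, self-contained sketch of a receive ring whose buffers are recycled while the stack still references one of them. It is a toy model, not the Xilinx driver and not lwIP code: `rx_ring`, `rx_receive`, `rx_release`, the slot count and the buffer size are all hypothetical names and values chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define RING_SLOTS 4   /* hypothetical number of DMA RX buffers */
#define SLOT_SIZE  16  /* hypothetical buffer size, tiny for the demo */

/* One DMA receive buffer plus a count of stack-side references to it. */
struct rx_slot {
    char data[SLOT_SIZE];
    int  refs;
};

struct rx_ring {
    struct rx_slot slot[RING_SLOTS];
    size_t next;  /* slot the "hardware" fills next */
};

/* Store a frame in the next slot. With check_refs == false the slot is
 * reused unconditionally (the behaviour criticised in this thread).
 * With check_refs == true a slot that is still referenced is left alone
 * and the frame is dropped; a real driver would more likely switch to a
 * fresh buffer instead of stalling the ring. */
struct rx_slot *rx_receive(struct rx_ring *r, const char *frame, bool check_refs)
{
    struct rx_slot *s = &r->slot[r->next];
    if (check_refs && s->refs != 0) {
        return NULL;
    }
    strncpy(s->data, frame, SLOT_SIZE - 1);
    s->data[SLOT_SIZE - 1] = '\0';
    s->refs = 1;
    r->next = (r->next + 1) % RING_SLOTS;
    return s;
}

/* The stack is done with the buffer. */
void rx_release(struct rx_slot *s)
{
    assert(s->refs > 0);
    s->refs--;
}

/* Queue one out-of-sequence segment, then receive a full ring's worth
 * of further frames that are processed and released immediately.
 * Returns true if the queued segment is still intact afterwards. */
bool queued_segment_survives(bool check_refs)
{
    struct rx_ring ring;
    memset(&ring, 0, sizeof ring);

    /* This segment stays queued (not released), as with TCP_QUEUE_OOSEQ. */
    struct rx_slot *queued = rx_receive(&ring, "seg-2", check_refs);

    for (int i = 0; i < RING_SLOTS; i++) {
        struct rx_slot *s = rx_receive(&ring, "later-frame", check_refs);
        if (s != NULL) {
            rx_release(s);
        }
    }
    return strcmp(queued->data, "seg-2") == 0;
}
```

In this model `queued_segment_survives(false)` returns false, because the ring wraps around and overwrites the slot that is still queued, while `queued_segment_survives(true)` returns true. Calling the netif function often enough only helps as long as every buffer is released before the ring wraps; it does nothing for a buffer the stack deliberately keeps.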
I don't expect this to happen in your tests though. This is the kind of problem that makes connections drop sporadically some time in the future, when you've had your device out there somewhere running OK for years. The TCP checksum might not help you here, since the queued packet already had its checksum checked when it was queued. The overwritten data is accepted as being correct later!
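The checksum point can be sketched in the same toy style. The code below assumes a simplified model in which a plain 16-bit ones' complement checksum over the payload is verified once at enqueue time; real TCP checksums also cover a pseudo-header and the TCP header, and `queued_seg`, `seg_enqueue`, `seg_deliver_accepts` and `seg_recheck` are hypothetical names, not lwIP functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 16-bit ones' complement checksum in the style of RFC 1071. */
uint16_t inet_checksum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    while (len > 1) {
        sum += (uint32_t)((p[0] << 8) | p[1]);
        p += 2;
        len -= 2;
    }
    if (len == 1) {
        sum += (uint32_t)(p[0] << 8);
    }
    while (sum >> 16) {
        sum = (sum & 0xFFFFu) + (sum >> 16);
    }
    return (uint16_t)~sum;
}

/* Hypothetical queued segment that points into a driver-owned buffer. */
struct queued_seg {
    const uint8_t *payload;
    size_t   len;
    uint16_t checksum;  /* value that arrived with the segment */
    bool     verified;  /* set once, when the segment is queued */
};

void seg_enqueue(struct queued_seg *q, const uint8_t *buf, size_t len,
                 uint16_t checksum)
{
    q->payload  = buf;
    q->len      = len;
    q->checksum = checksum;
    q->verified = (inet_checksum(buf, len) == checksum);
}

/* Delivery trusts the verification done at enqueue time. */
bool seg_deliver_accepts(const struct queued_seg *q)
{
    return q->verified;
}

/* What a second verification at delivery time would report. */
bool seg_recheck(const struct queued_seg *q)
{
    return inet_checksum(q->payload, q->len) == q->checksum;
}

/* Returns true if overwritten data is accepted although a re-check
 * would have rejected it. */
bool demo_stale_checksum(void)
{
    uint8_t dma_buf[8] = "payload";
    struct queued_seg q;

    seg_enqueue(&q, dma_buf, sizeof dma_buf,
                inet_checksum(dma_buf, sizeof dma_buf));
    assert(q.verified);

    /* The buffer is reused for another frame while still queued. */
    memcpy(dma_buf, "garbage", sizeof dma_buf);

    return seg_deliver_accepts(&q) && !seg_recheck(&q);
}
```

Here `demo_stale_checksum()` returns true: the segment passed verification when it was queued, so the later overwrite goes unnoticed at delivery.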
After all, there's nothing to argue about any more. The driver is broken, and it's your choice whether to stay with it and keep it like that. I just want people reading this list to know. Just in case someone else is using this driver too...
Cheers, Simon