[lwip-users] strange TCP behavior, connection stalls
From: M.H. ten Berge
Subject: [lwip-users] strange TCP behavior, connection stalls
Date: Sat, 9 Apr 2016 17:50:10 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Icedove/38.6.0
Hi all,
I'm encountering a strange situation when I try to download a file
(about 40 kB) from my local HTTP server. The first few kB go well, but
then the transfer stalls. It picks up a few times, each time getting a
few more kB downloaded, but then stalls again. The stalls become longer
and longer, until the web server times out and aborts the transfer.
After the server times out, LWIP keeps on trying (which is also odd) and
gets stuck in lwip_read().
The behavior is reproducible every time; the download has not succeeded
even once.
I assume that the TCP protocol in LWIP has been tested in practice
numerous times, so I must be doing something wrong...
I was unsure whether to report this as a bug against esp-open-rtos, but
because there is much more LWIP knowledge on this list, I'll try here
first ;-)
Sorry for the long mail.
Setup and versions:
- DHCP/DNS/router at 192.168.101.1
- webserver: a local Debian stable box, running Apache 2.4.10. Its
external IP is 217.19.31.195, internally it is at 192.168.102.22. The
router takes care of this, other devices such as my phone can access it
correctly.
- hardware: esp8266 (esp-03 module) at 192.168.101.237. It is about 1.5
meters away from the nearest Wifi access point. This AP is on channel 1,
the ESP also reports that it connects on channel 1. I checked with Wifi
Analyzer (Android app) that channel 1 is relatively free (the next AP is
20-25dB weaker). Other wireless devices can communicate absolutely fine
via this access point.
- the esp8266 is running esp-open-rtos (master branch from
https://github.com/Superhouse/esp-open-rtos). Two weeks ago, when I
first encountered the problem, I was using the version from March 22nd.
Today I tried again with HEAD from today:
https://github.com/SuperHouse/esp-open-rtos/commit/83c5f91bc09168c584be9d62966c069cdfcfa2d9.
- LWIP is integrated in esp-open-rtos (as a git submodule). It uses the
version from
https://github.com/SuperHouse/esp-lwip/tree/3cf8d514bd76e6ef77e6fa514d0ec6d96da7fd9a
According to the description on github, this is LWIP 1.4.1, with some
modifications to get it running on the esp8266 (mainly the low-level
network driver).
- local modifications (by me):
+ #define LWIP_POSIX_SOCKETS_IO_NAMES 0 (because the serial port
functions are also called read/write, so they were colliding)
+ #define MEMP_OVERFLOW_CHECK 1
+ #define MEMP_SANITY_CHECK 1
+ #define LWIP_STATS 1
+ #define LWIP_DEBUG
+ enabled some debug categories (couldn't enable them all due to size
constraints)
+ tried some different values for TCP_MSS, TCP_MAXRTX, TCP_SYNMAXRTX,
etc. The packet capture and log were made with TCP_MSS=536,
TCP_MAXRTX=12 and TCP_SYNMAXRTX=6.
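For reference, the local modifications listed above would look roughly
like this as option overrides. This is only a sketch: placing them in
lwipopts.h is an assumption, and the debug categories that were enabled
are not listed here because the list above does not name them.

```c
/* Sketch of the local overrides (assumed to live in lwipopts.h). */

/* Avoid clashing with the serial port's read()/write() functions. */
#define LWIP_POSIX_SOCKETS_IO_NAMES 0

/* Extra memory-pool checking and statistics. */
#define MEMP_OVERFLOW_CHECK 1
#define MEMP_SANITY_CHECK   1
#define LWIP_STATS          1

/* Enable debug output; the individual debug categories are not shown. */
#define LWIP_DEBUG

/* Values in effect when the packet capture and log were made. */
#define TCP_MSS       536
#define TCP_MAXRTX    12
#define TCP_SYNMAXRTX 6
```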
What works:
- associate with wifi network
- get IP via DHCP
- DNS lookups
- HTTP GET request for a small PHP-script, which returns about 100 bytes
of JSON.
- receiving this JSON reply and closing the socket
- I can also ping the esp8266. Ping times are high (>90 msec), but I
assume this is caused by the enabled LWIP_DEBUG options, which have to
be pushed through a serial console at 115200 bps.
What does not work:
- downloading a 40 kB static file from the same HTTP server. The file is
to be relayed to a serial port, but for now I have commented that out
and the data is simply discarded.
I've captured the console/debug messages, see lwiplog.txt.
Furthermore, I have run tcpdump on the webserver (this means the capture
does not contain the DHCP or DNS traffic). Because other HTTP requests
were being handled at the same time, I have filtered out everything
unrelated to 192.168.101.237.
What happens in the log:
- Everything starts fine (associate with wifi, get an IP, etc).
- Two odd things in the log/pcap so far:
- the log contains two occurrences of 'DNS lookup found
44.131.255.63'. I have no idea what this IP is, or why it should have
been looked up.
- the first socket connection to the http server does not work
(packets 1-7 in the pcap). The next try (packet 8) succeeds almost
instantly.
- The HTTP GET request is done on lines 374-392 (my code has some extra
printfs, which include the HTTP request contents).
- The user program parses the HTTP reply header byte-by-byte (quite
ugly code, but it works). These are the 'API messages' in lines 506-764
(repeatedly calling lwip_read to request 1 byte).
- The actual content is downloaded using a loop:
1. printf('r\n')
2. call lwip_read() with a buffer of 512 bytes (also tried 16, this
made no difference)
3. printf('R\n')
4. for each received byte, printf a dot
5. go back to step 1
- the first read returns at line 783. The data is consumed in lines
784-844. lwip_read is immediately called again, and returns immediately
(lines 844-846).
- this continues. However, new calls to lwip_read are taking longer and
longer. For example: the function is called at line 1573, and does not
return until line 1689.
- Finally, the web server gives up and closes the connection. LWIP does
not return from the lwip_read function anymore. I had already stopped
logging before this happened, but if anyone is interested I could make
new, longer logs.
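A minimal C sketch of the download loop described above, in case it is
easier to follow as code. The names read_fn, download_body and fake_read
are hypothetical and exist only so the sketch compiles without LWIP; on
the target, the callback would wrap lwip_read(). Unlike the loop as
described, the sketch stops when the read returns 0 (peer closed) or a
negative value (error). That exit condition is an assumption and is not
what produced the attached log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback with the same shape as lwip_read(). */
typedef int (*read_fn)(int fd, void *buf, size_t len);

/* Read and discard the body in chunks of at most `chunk` bytes.
 * Returns the number of bytes received, or -1 on a read error. */
long download_body(int fd, read_fn do_read, size_t chunk)
{
    unsigned char buf[512];
    long total = 0;

    assert(chunk > 0 && chunk <= sizeof buf);
    for (;;) {
        printf("r\n");                    /* step 1: about to read   */
        int n = do_read(fd, buf, chunk);  /* step 2: blocking read   */
        printf("R\n");                    /* step 3: read returned   */
        if (n < 0)
            return -1;                    /* error reported by read  */
        if (n == 0)
            break;                        /* peer closed connection  */
        for (int i = 0; i < n; i++)
            putchar('.');                 /* step 4: one dot per byte */
        putchar('\n');
        total += n;                       /* step 5: loop again      */
    }
    return total;
}

/* Stand-in for the socket: hands out fake_remaining bytes, then EOF. */
long fake_remaining = 1300;

int fake_read(int fd, void *buf, size_t len)
{
    (void)fd;
    size_t n = (size_t)fake_remaining < len ? (size_t)fake_remaining : len;
    memset(buf, 'x', n);
    fake_remaining -= (long)n;
    return (int)n;
}
```

Calling download_body(0, fake_read, 512) exercises the loop on a host
machine without any network. On the target, if LWIP is built with
LWIP_SO_RCVTIMEO, a receive timeout set via lwip_setsockopt() with
SO_RCVTIMEO might keep the read from blocking forever after the server
has gone away; that is untested here.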
I'll try to attach the console log (lwiplog.txt) and the packet capture
(download_try005_filter.pcap). If that doesn't work, please find them here:
log: http://famtenberge.nl/dl/?t=2a7f5b49846ac565d131abed825528af
pcap: http://famtenberge.nl/dl/?t=0b65ff76dfd827225b3c238260af56ef
Does this problem sound familiar to anyone? What is going wrong here?
Any help would be much appreciated. Thanks in advance!
Kind regards,
Matthijs
lwiplog.txt
Description: Text document
download_try005_filter.pcap
Description: application/vnd.tcpdump.pcap