From: Tazzari Davide
Subject: Re: [lwip-users] lwip lock
Date: Mon, 18 Apr 2011 17:18:01 +0200
I agree with you, Kieran, but the problem is that I don't know where to look.
I used the lwIP 1.3.2 port for AVR32 and changed almost nothing.
In one of my long (and boring) previous posts I described all the tasks that use lwIP through the netconn API. I can repost it if you wish.
As for the rest: I looked for timers and saw that the lwIP core has a lot of them, which I suppose are correct. I say "I suppose" because I don't really know how to investigate them.
Can you please suggest where to look?
Test 1:
I connected the device to my computer with a crossover Ethernet cable, so that there is no wireless link, switch, etc. in between.
The situation is much the same, except that the lock is harder to trigger: it takes a lot of F5 reloads before everything locks, while in the normal setup I need only 5-10 fast reloads.
This could suggest that the heavy traffic handled by lwIP itself interferes with normal operation. I don't know if it is really a timer; it is probably something related to the MAC itself but, as you said, at interrupt level. But I don't know where.
What I have seen in this test is that the key is really TCP_SEG: as long as there is at least one free block, communication is possible even if the lfree RAM pointer is not at the top of the area; otherwise there is the lock.
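One way to watch the TCP_SEG pool and the heap while reproducing the lock is lwIP's built-in statistics. The fragment below is only a sketch of the relevant lwipopts.h switches; whether the AVR32 port already sets some of them, and where its diagnostic output goes, is an assumption to check against the port.

```c
/* lwipopts.h fragment (sketch): enable lwIP's built-in statistics.
 * Assumption: the port does not already define these options elsewhere. */
#define LWIP_STATS          1  /* master switch for the statistics module */
#define LWIP_STATS_DISPLAY  1  /* compile in stats_display() */
#define MEM_STATS           1  /* heap usage (the area the lfree pointer walks) */
#define MEMP_STATS          1  /* per-pool counters, e.g. TCP_SEG and SYS_TIMEOUT */
#define TCP_STATS           1  /* TCP protocol counters */
```

With these enabled, calling stats_display() while the device is idle and again after a burst of reloads should show whether the TCP_SEG "used" count returns to its idle value. The output relies on the port's LWIP_PLATFORM_DIAG macro being mapped to something visible.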
About SYS_TIMEOUT: every time I request a page (or at least open a connection) a timeout is created. I have configured 6 SYS_TIMEOUT. If I reload the page 5 times and wait, no error occurs. With 6 or more, the error counter increases. This seems to have no relationship with TCP_SEG. Anyway, even after a lot of errors, lwIP continues to function. So let's forget it for the moment.
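The two pool behaviours described above (an error counter that increases when a fixed pool runs dry, and a block that never comes back) can be told apart with very little bookkeeping. The following is a minimal, self-contained model, not lwIP code: `pool_model` and its functions are hypothetical names, and the fields only loosely mirror the avail/used/max/err counters that lwIP keeps per pool.

```c
#include <assert.h>

/* Hypothetical stand-in for one fixed-size pool (e.g. TCP_SEG or SYS_TIMEOUT). */
struct pool_model {
    unsigned avail; /* number of blocks configured for the pool */
    unsigned used;  /* blocks currently allocated */
    unsigned max;   /* high-water mark of 'used' */
    unsigned err;   /* allocation attempts that found the pool empty */
};

void pool_init(struct pool_model *p, unsigned avail)
{
    p->avail = avail;
    p->used = 0;
    p->max = 0;
    p->err = 0;
}

/* Returns 1 on success; returns 0 and counts an error if the pool is empty. */
int pool_alloc(struct pool_model *p)
{
    if (p->used >= p->avail) {
        p->err++;
        return 0;
    }
    p->used++;
    if (p->used > p->max) {
        p->max = p->used;
    }
    return 1;
}

/* Gives one block back to the pool. */
void pool_free(struct pool_model *p)
{
    assert(p->used > 0);
    p->used--;
}

/* Blocks still held above the count seen at idle; non-zero hints at a leak. */
unsigned pool_leaked(const struct pool_model *p, unsigned idle_baseline)
{
    return p->used > idle_baseline ? p->used - idle_baseline : 0;
}
```

In this model a transient overload only raises `err` and `max`, and `used` falls back to the idle baseline afterwards. A lost block is different: `pool_leaked()` stays non-zero once the traffic has stopped, which matches the TCP_SEG symptom rather than the SYS_TIMEOUT one.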
Test 2:
I have put a relay toggle in the web server task:
WebServer task
...
for (;;)
{
    iRestartBinding = 0;
    pxHTTPListener = netconn_new( NETCONN_TCP );
    netconn_bind(pxHTTPListener, NULL, webHTTP_PORT );
    netconn_listen( pxHTTPListener );
    int iTimeout = 1000;
    //for( ; (iRestartBinding < 10) && (gucRestartWebServer == FALSE); iRestartBinding++)
    for( ; ; ) // <<-- for this test purpose; In the real case the above line is present
    {
        REL_TGL; // <<-- for this test purpose
        xLastFocusTime = xTaskGetTickCount();
        vTaskDelayUntil( &xLastFocusTime, xDelayLength );
        if (iGlobalWtdBomb == FALSE) // TRUE I am waiting for a WDT suicide
        {
            // Wait for a first connection.
#if LWIP_SO_RCVTIMEO
            pxHTTPListener->recv_timeout = iTimeout;
#endif
            pxNewConnection = netconn_accept(pxHTTPListener);
            if (xTaskCreate(WebServerAnswerTask,
                            ( signed portCHAR * ) "WebServerAnswer",
                            WEB_SERVER_STACK_SIZE,
                            pxNewConnection,
                            ethWEBSERVER_PRIORITY,
                            ( xTaskHandle * ) NULL ) != pdPASS)
            {
                // Task not correctly created!!!
                netconn_write( pxNewConnection, (char *) webHTTP_HTM_INTERNAL_ERROR, (u16_t) strlen( webHTTP_HTM_INTERNAL_ERROR ), NETCONN_COPY ); // error HTTP 500
                netconn_close( pxNewConnection );
                netconn_delete( pxNewConnection );
            }
            iRestartBinding = 0;
            iTimeout = 5000;
        }
    } // end acquisition loop
    gucRestartWebServer = FALSE;
    netconn_close(pxHTTPListener);
    while(netconn_delete(pxHTTPListener) != 0)
    {
        vTaskDelay(20);
    }
    pxHTTPListener = NULL;
}
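One detail of the loop as posted may be worth ruling out: whatever netconn_accept returns is handed straight to xTaskCreate. Assuming that netconn_accept returns NULL when the receive timeout expires (which is how the 1.3.x sources read with LWIP_SO_RCVTIMEO enabled), a timed-out accept would also start an answer task. The sketch below only models the control flow; `accept_ops`, `accept_once` and the `demo_*` stubs are hypothetical names, and in the real task the three callbacks would wrap netconn_accept, xTaskCreate and netconn_close plus netconn_delete.

```c
#include <assert.h>
#include <stddef.h>

typedef void *conn_t; /* stands in for struct netconn * */

/* Hypothetical callbacks so the control flow can be exercised off-target. */
struct accept_ops {
    conn_t (*accept)(conn_t listener); /* NULL means "timed out" */
    int    (*spawn)(conn_t conn);      /* non-zero means the task was created */
    void   (*drop)(conn_t conn);       /* close and delete the connection */
};

enum accept_result { ACCEPT_TIMEOUT, ACCEPT_SPAWNED, ACCEPT_DROPPED };

/* One pass of the accept loop with an explicit check for the timeout case. */
enum accept_result accept_once(conn_t listener, const struct accept_ops *ops)
{
    conn_t conn;

    assert(ops != NULL);
    conn = ops->accept(listener);
    if (conn == NULL) {
        return ACCEPT_TIMEOUT;  /* nothing to hand over, nothing to free */
    }
    if (ops->spawn(conn)) {
        return ACCEPT_SPAWNED;  /* the answer task now owns the connection */
    }
    ops->drop(conn);            /* no task was created: release it here */
    return ACCEPT_DROPPED;
}

/* Demo stubs, only for running the sketch on a PC. */
static int demo_token;
int demo_drop_count;
conn_t demo_accept_timeout(conn_t l) { (void)l; return NULL; }
conn_t demo_accept_ok(conn_t l)      { (void)l; return &demo_token; }
int    demo_spawn_ok(conn_t c)       { (void)c; return 1; }
int    demo_spawn_fail(conn_t c)     { (void)c; return 0; }
void   demo_drop(conn_t c)           { (void)c; demo_drop_count++; }
```

The point of the split is ownership: every accepted connection ends up either with an answer task or released on the spot, and a timeout touches nothing. Whether this is related to the lost TCP_SEG block is not established; it only removes one unknown from the task side.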
...
Result...
When I reload the page slowly, everything is OK almost forever.
When I reload the page faster, I see that both Firefox and Explorer set up the TCP connection, send the GET request and immediately afterwards send [RST, ACK] to close the connection, except for the last one, which waits for the device's answer. I suppose that, since the browser hasn't received any answer and the user has requested a reload, it wants only the last request to be processed.
At every netconn_accept (timed out or not) I can hear the relay toggle. If I press F5 5 times I hear 5 toggles. That's what I expect.
Sometimes one toggle is missing (5 presses of F5, 4 toggles!). Exactly in this case I lose a TCP_SEG block and a portion of the mem area.
One lost toggle also means that netconn_accept doesn't see the connection, so, from the web server task's point of view, I cannot see the problem.
Again, this happens if there are lots of requests (connection, GET, [RST, ACK] from the browser, close connection) before a (connection, GET, answer, [RST, ACK], close connection).
Sometimes I have seen this exchange in the middle of a reload:
(Firefox) Connection [SYN]
(device) Connection [SYN, ACK]
(Firefox) Connection [ACK]
(Firefox) GET request
(Firefox) [TCP Retransmission] of the GET request
(device) [ACK] of the HTTP
(Firefox) [RST, ACK] without any answer from the device
It seems that this is one case of TCP_SEG loss. It is not easy to say, because I don't know exactly when the loss happens or how to relate it to the Wireshark capture.
It seems also that the loss often (but not always) happens when a [TCP Retransmission] is present.
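To relate the capture to the pool counters more systematically, each connection in the Wireshark trace could be transcribed as a short event list and summarized. This is only a sketch for off-line analysis on a PC: the `tcp_event` values and `summarize_trace` are hypothetical names, not anything provided by lwIP or Wireshark.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-transcribed events of one TCP connection, in capture order. */
enum tcp_event {
    EV_SYN, EV_SYN_ACK, EV_ACK,
    EV_GET, EV_GET_RETRANS, /* request and its [TCP Retransmission] */
    EV_RESPONSE,            /* HTTP answer from the device */
    EV_RST                  /* [RST, ACK] from the browser */
};

struct trace_summary {
    int saw_get;          /* a GET (or its retransmission) was seen */
    int retransmitted;    /* the GET was retransmitted */
    int answered;         /* the device sent an HTTP answer */
    int reset_unanswered; /* reset arrived after a GET but before any answer */
};

struct trace_summary summarize_trace(const enum tcp_event *ev, size_t n)
{
    struct trace_summary s = { 0, 0, 0, 0 };
    size_t i;

    assert(ev != NULL || n == 0);
    for (i = 0; i < n; i++) {
        switch (ev[i]) {
        case EV_GET:         s.saw_get = 1; break;
        case EV_GET_RETRANS: s.saw_get = 1; s.retransmitted = 1; break;
        case EV_RESPONSE:    s.answered = 1; break;
        case EV_RST:
            if (s.saw_get && !s.answered) {
                s.reset_unanswered = 1;
            }
            break;
        default:
            break; /* handshake and plain ACKs do not change the summary */
        }
    }
    return s;
}
```

The exchange listed above would come out as retransmitted and reset before any answer. Counting how many connections of that kind precede each lost TCP_SEG block would show whether the retransmission or the unanswered reset is the better predictor.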
Anyway, it seems there is something in the internal handling of the [RST, ACK], the retransmission or something like that, which is probably not related to the code I have written.
How can I handle this? Where should I look? I have no idea at the moment.
My working assumption was that the lwIP port is correct, but at this point I am not so sure. I still hope that the error is in my own code but, as I have said, I have no idea where to look.
I hope my new analysis can help.
Best regards
Davide