Re: [lwip-users] lwip lock

Hi Kieran,

I have made this improvement to my web server.

WebServer task (old version)

...

for (;;)
{
    iRestartBinding = 0;
    pxHTTPListener = netconn_new( NETCONN_TCP );
    netconn_bind( pxHTTPListener, NULL, webHTTP_PORT );
    netconn_listen( pxHTTPListener );
    int iTimeout = 1000;

    for( ; (iRestartBinding < 10) && (gucRestartWebServer == FALSE); iRestartBinding++)
    {
        xLastFocusTime = xTaskGetTickCount();
        vTaskDelayUntil( &xLastFocusTime, xDelayLength );

        if (iGlobalWtdBomb == FALSE) // TRUE means I am waiting for a WDT suicide
        {
            // Wait for a first connection.
#if LWIP_SO_RCVTIMEO
            pxHTTPListener->recv_timeout = iTimeout;
#endif
            pxNewConnection = netconn_accept(pxHTTPListener);
            if(pxNewConnection != NULL)
            {
                prvweb_ParseHTMLRequest( pxNewConnection );
                netconn_close( pxNewConnection );
                netconn_delete( pxNewConnection );
                iRestartBinding = 0;
                iTimeout = 5000;
            }// end if new connection
            else
            {
                iTimeout = 1000;
            }
        } // end if not waiting for the WDT
    } // end acquisition loop

    gucRestartWebServer = FALSE;
    netconn_close(pxHTTPListener);
    while(netconn_delete(pxHTTPListener) != 0)
    {
        vTaskDelay(20);
    }
    pxHTTPListener = NULL;
}

...

static unsigned char prvweb_ParseHTMLRequest( struct netconn *pxNetCon )
{
    struct netbuf *pxRxBuffer;
    portCHAR *pcRxString;
    unsigned portSHORT usLength;

    /* We expect to immediately get data. */
    pxNetCon->recv_timeout = 1000;
    pxRxBuffer = netconn_recv( pxNetCon );
    if( pxRxBuffer != NULL )
    {
        /* Where is the data? */
        netbuf_data( pxRxBuffer, ( void * ) &pcRxString, &usLength );
        ...
        netbuf_delete( pxRxBuffer );
        return 0;
    }
    else
    {
        return -1;
    }
}

This was my first implementation. Why these two loops? Because this way, when the Ethernet cable is unplugged and then plugged back in, I detect it and create the listener again. Otherwise I lose the Ethernet connection! I don't know if this is THE solution but, at least, it is a solution.

After your comments I changed the web server task into a more flexible structure: for each accepted connection, I create a task to serve it, like this:

portTASK_FUNCTION( WebServerAnswerTask, pvParameters )
{
    struct netconn * pxNewConnection = (struct netconn *) pvParameters;

    prvweb_ParseHTMLRequest( pxNewConnection );
    netconn_close( pxNewConnection );
    netconn_delete( pxNewConnection );
    vTaskDelete( NULL );
}

And ...

if(pxNewConnection != NULL)
{
    if (xTaskCreate(WebServerAnswerTask,
                    ( signed portCHAR * ) "WebServerAnswer",
                    WEB_SERVER_STACK_SIZE,
                    pxNewConnection,
                    ethWEBSERVER_PRIORITY,
                    ( xTaskHandle * ) NULL ) != pdPASS)
    {
        // Task not correctly created!!!
        netconn_write( pxNewConnection, (char *) webHTTP_HTM_INTERNAL_ERROR, (u16_t) strlen( webHTTP_HTM_INTERNAL_ERROR ), NETCONN_COPY ); // error HTTP 500
        netconn_close( pxNewConnection );
        netconn_delete( pxNewConnection );
    }
    iRestartBinding = 0;
    iTimeout = 5000;
}// end if new connection
else
{
    iTimeout = 1000;
}

instead of

    prvweb_ParseHTMLRequest( pxNewConnection );
    netconn_close( pxNewConnection );
    netconn_delete( pxNewConnection );

directly in the main web server task

Results:

The web server is faster.

The MBOX-full condition has not happened again.

The stuck mem area happens less often but, unfortunately, it has not gone away completely. A very few times I have seen the lfree pointer differ from the ram pointer and stay stuck there. In those few cases the web server remains inaccessible until a reset. I monitor this value and reset the machine (via the WDT) if it occurs. I dislike this very much but, anyway, it doesn't happen often.
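The monitoring described above could be sketched roughly as follows. This is only a minimal, self-contained illustration: `heap_monitor_t`, `heap_monitor_init` and `heap_monitor_sample` are hypothetical names, not lwIP or FreeRTOS functions, and the sketch assumes the two pointer values are sampled while the stack is expected to be idle (a busy heap legitimately has lfree different from ram).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical monitor state; none of these names come from lwIP. */
typedef struct {
    uintptr_t last_lfree;   /* lfree value seen at the previous sample */
    unsigned  stuck_count;  /* consecutive samples with lfree stuck away from ram */
    unsigned  threshold;    /* stuck samples to tolerate before asking for a reset */
} heap_monitor_t;

void heap_monitor_init(heap_monitor_t *m, unsigned threshold)
{
    assert(m != NULL && threshold > 0);
    m->last_lfree = 0;
    m->stuck_count = 0;
    m->threshold = threshold;
}

/* Feed one sample of the two pointers. Returns true once lfree has been
 * different from ram AND unchanged for `threshold` consecutive samples,
 * i.e. when the caller may decide to let the watchdog fire. */
bool heap_monitor_sample(heap_monitor_t *m, uintptr_t lfree, uintptr_t ram)
{
    assert(m != NULL);
    if (lfree != ram && lfree == m->last_lfree) {
        m->stuck_count++;
    } else {
        m->stuck_count = 0;   /* heap moved, or is fully free again */
    }
    m->last_lfree = lfree;
    return m->stuck_count >= m->threshold;
}
```

Requiring several consecutive stuck samples, rather than resetting on the first mismatch, avoids a reset while a request is simply in progress.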

Let's consider memp.

Now TCP_SEG looks correct, and it seems that no blocks are lost.

TCP_PCB, instead, goes to full usage almost immediately. I set the limit to 12 and then to 30, but every time a connection arrives this number increases until it reaches the limit. My home page contains 8 images, 1 CSS file and 1 JS file so, in a couple of reloads, I reach the limit (whatever it is set to).

I have read somewhere that even after the connection is closed the PCB remains in a wait state (waiting for connection synchronisation packets lost in the network) for a couple of minutes, and that the rule is to use the "not used" PCBs first and then the "wait" PCBs, so at the beginning I didn't worry about it. But after waiting 10 minutes, the relevant lwip_stats.memp[i].used is still equal to the limit or, at best, one less (limit 12, used 11); "used" never goes back to zero.
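The allocation rule described above (free PCBs first, then PCBs in the wait state) can be modelled with a toy pool. This is a sketch under stated assumptions, not lwIP code: `pcb_pool_t`, `pool_alloc`, `pool_close` and `pool_used` are hypothetical names, every request is assumed to use its own connection (page + 8 images + CSS + JS = 11 per load), and the expiry timer of the wait state is left out on purpose.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 12  /* mirrors the limit of 12 mentioned above */

typedef enum { SLOT_FREE, SLOT_ACTIVE, SLOT_TIME_WAIT } slot_state_t;

typedef struct {
    slot_state_t state;
    unsigned     closed_at;  /* toy tick at which the slot entered the wait state */
} slot_t;

typedef struct {
    slot_t   slots[POOL_SIZE];
    unsigned now;            /* toy clock, advanced on every close */
} pcb_pool_t;

void pool_init(pcb_pool_t *p)
{
    assert(p != NULL);
    for (int i = 0; i < POOL_SIZE; i++) {
        p->slots[i].state = SLOT_FREE;
        p->slots[i].closed_at = 0;
    }
    p->now = 0;
}

/* Slots that are not free: roughly what a "used" counter would report. */
int pool_used(const pcb_pool_t *p)
{
    int used = 0;
    assert(p != NULL);
    for (int i = 0; i < POOL_SIZE; i++) {
        if (p->slots[i].state != SLOT_FREE) {
            used++;
        }
    }
    return used;
}

/* Take a free slot if there is one; otherwise recycle the slot that has
 * been in the wait state longest. Returns the slot index, or -1 if every
 * slot is still active. */
int pool_alloc(pcb_pool_t *p)
{
    int oldest = -1;
    assert(p != NULL);
    for (int i = 0; i < POOL_SIZE; i++) {
        if (p->slots[i].state == SLOT_FREE) {
            p->slots[i].state = SLOT_ACTIVE;
            return i;
        }
        if (p->slots[i].state == SLOT_TIME_WAIT &&
            (oldest < 0 || p->slots[i].closed_at < p->slots[oldest].closed_at)) {
            oldest = i;
        }
    }
    if (oldest >= 0) {
        p->slots[oldest].state = SLOT_ACTIVE;  /* recycle a waiting slot */
    }
    return oldest;
}

/* Closing a connection does not free its slot; it only starts the wait. */
void pool_close(pcb_pool_t *p, int idx)
{
    assert(p != NULL && idx >= 0 && idx < POOL_SIZE);
    assert(p->slots[idx].state == SLOT_ACTIVE);
    p->slots[idx].state = SLOT_TIME_WAIT;
    p->slots[idx].closed_at = p->now++;
}
```

In this model one page load of 11 connections already leaves 11 slots in use after everything is closed, and the count stays at the limit from then on while allocation still succeeds by recycling. What the model does not explain is a count that stays at the limit after the wait period should have expired.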

What I see is that when the number of used PCBs is well below the limit the web server works very well; when it is near the limit the web server is slower and (this is the bad part) sometimes locks up. In those cases the lfree pointer of the mem area is stuck at a value different from the ram pointer.

Moreover, because of a code memory problem I moved all the code to SDRAM. Of course I see a speed reduction, but I expected that. The problem is that after a few web server connections the web server sometimes locks up, i.e. connections are refused ([RST, ACK] immediately after a [SYN] request), and it cannot be restarted without a reset. It seems that this happens when the TCP_PCB limit is reached, no matter the value of this limit. But sometimes everything works regardless of these values.

The code is exactly the same; the only difference is where it is fetched from.

About the lwip interface.

I used only the netconn API in all the code (or at least that is my intention!). I suppose either I have made some mistake, or somewhere in the code (FreeRTOS? lwIP itself? My own code? ...) something uses a low-level lwIP access that I haven't found.

Here are the lwIP connections I have:

1) WebServer (shown above)

2) PortalConnection

... // Send and receive function

* pps_Connection = netconn_new(NETCONN_TCP);
error_get_web = netconn_connect(* pps_Connection, &ipaddr, gs_EthernetParameters.siPort);
if(error_get_web == 0)
{
    if ((* pps_Connection)->pcb.tcp->state != ESTABLISHED) // if the portal doesn't respond I don't receive any error at all!
    {
        DestroyConnection(pps_Connection);
        return ERR_CONN;
    }
    else
    {
        error_get_web = netconn_write(* pps_Connection, pcBuffer, iSize, NETCONN_COPY );
        if ((* pps_Connection)->state != NETCONN_NONE)
        {
            // error code but connection not destroyed. I don't know what to do here and if I have to do something!!!
        }
#if LWIP_SO_RCVTIMEO
        ps_Connection->recv_timeout = 10000; // 10 sec max
#endif
        unsigned char ucFlagFirstPage = TRUE;
        while( (nb = netconn_recv( ps_Connection ) ) != NULL )
        {
            netbuf_data( nb, (void *) & pcPageData, & usLength );
            ... // transfer data to a temporary file to be analyzed later.
            netbuf_delete(nb);
        }
#if LWIP_SO_RCVTIMEO
        if (ps_Connection->err == ERR_TIMEOUT)
        {
            DestroyConnection(& ps_Connection);
            return ERR_TIMEOUT;
        }
#endif
        DestroyConnection(& ps_Connection);

And

void DestroyConnection(struct netconn ** pps_Connection)
/// \brief Destroy active connection
/// \param pps_Connection pointer to pointer to connection
{
    // Nothing to do if there is no pointer or no connection behind it
    if (pps_Connection == NULL || * pps_Connection == NULL)
        return;
    netconn_close(* pps_Connection);
    while(netconn_delete(* pps_Connection) != 0)
    {
        vTaskDelay(DELAY_TO_WAIT_DISPOSE_CONNECTION);
    }
    * pps_Connection = NULL;
}

3) UDP Debug Client (this task sends data to a remote client).

conn = netconn_new(NETCONN_UDP);
if (conn != NULL)
{
    nb = netbuf_new();
    netconn_connect(conn, &ipaddr, ti_UdpPortDebug.i);
    while (xQueueReceive(xQueueUdpDebug, & s_Block, 0) == pdTRUE)
    {
        sprintf(pcBitmaskCode, "####### Code: %02x - %02x #######\r\n", s_Block.ucClassCode, s_Block.ucSubClassCode);
        netbuf_ref(nb, pcBitmaskCode, strlen(pcBitmaskCode));
        cError = netconn_send(conn, nb);
        vTaskDelay(10);

        int len = strlen(s_Block.pcTextBloc);
        int i = 0;
        while ((len - i) > 1000)
        {
            netbuf_ref(nb, (char *) & s_Block.pcTextBloc[i], (unsigned short)1000);
            cError = netconn_send(conn, nb);
            i += 1000;
            vTaskDelay(20);
        }
        if (len - i)
        {
            netbuf_ref(nb, (char *) & s_Block.pcTextBloc[i], (unsigned short)(len - i));
            cError = netconn_send(conn, nb);
            vTaskDelay(20);
        }
        netbuf_ref(nb, "\r\n\r\n", 4);
        cError = netconn_send(conn, nb);
        vTaskDelay(5);
        vPortFree(s_Block.pcTextBloc);
        vTaskDelay(5);
    }
    netconn_disconnect(conn);
    netbuf_free(nb);
    netbuf_delete(nb);
}

Forget for the moment the s_Block data; it is a structure to enqueue a debug message text

4) UDP Configuration Server

portTASK_FUNCTION( vBasicUDPCOMSERVER, pvParameters )
{
    struct udp_pcb *connUdp;
    err_t myError;

    connUdp = udp_new();
    myError = udp_bind(connUdp, IP_ADDR_ANY, UDPCOMNET_PORT);
    udp_recv(connUdp, Server_udp_recv, NULL);
    cUDPTxBuffer[0] = myError;

    // Loop forever
    for( ;; )
    {
        vTaskDelay(1000);
        __asm__ __volatile__("nop");
    }
}

void Server_udp_recv(void *_args, struct udp_pcb *upcb, struct pbuf *pBuffUdp, struct ip_addr *Remoteaddr, u16_t Remote_port_udp)
{
    int uiUdpLenMessage= 0;

    if(pBuffUdp!= NULL)
    {
        .. // message analyzed
        udp_sendto(...); // send the answer
        pbuf_free(pBuffUdp);
    }
}

5) UDP Management Server

Exactly as the UDP Configuration Server

In any case, during the web server lock the UDP servers and client were not in use.

To my knowledge, nothing else is using lwIP. Where should I look for unknown low-level lwIP access?

Any further clever ideas about all these problems?

Sorry for boring you with such a huge e-mail.

Best regards

Davide

-----Original Message-----
From: address@hidden [mailto:address@hidden] On Behalf Of Kieran Mansley
Sent: Tuesday, 22 March 2011 16:56
To: Mailing list for lwIP users
Subject: Re: [lwip-users] lwip lock

On Tue, 2011-03-22 at 16:41 +0100, Tazzari Davide wrote:

> Anyone has ever seen such a problem?

It sounds like you're corrupting internal stack state by having more
than one thread active in lwIP's core at the same time. This would also
explain tcpip_thread being stalled as it is probably stuck in a loop
iterating a corrupt list.

> Any suggestion on how to solve it?

Make sure that only one thread is active in lwIP at once. This should
in your case be the tcpip_thread. All other threads (including
interrupts) should make sure they're not calling directly into lwIP and
are instead queueing work for the tcpip_thread to perform for them. If
you're using the sockets API then most of this will be done for you but
you still need to be careful; you can't use one socket in two different
threads for example. Make sure your driver is interfacing to lwIP
correctly as that was a common source of porting errors.

Kieran
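The "queue work for one thread" pattern described above can be illustrated with a toy work queue. This is a simplified, single-threaded model with hypothetical names (`work_queue_t`, `wq_post`, `wq_drain`, `example_bind`); it has no locking, so on a real RTOS the queue itself would have to be a thread-safe mailbox. As far as I know, lwIP exposes this idea through the tcpip_thread mailbox and `tcpip_callback()`, but nothing below calls lwIP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_DEPTH 8

typedef void (*work_fn_t)(void *ctx);

typedef struct {
    work_fn_t fn;
    void     *ctx;
} work_item_t;

typedef struct {
    work_item_t items[QUEUE_DEPTH];
    size_t      head;   /* next item to run */
    size_t      tail;   /* next free position */
    size_t      count;  /* items currently queued */
} work_queue_t;

void wq_init(work_queue_t *q)
{
    assert(q != NULL);
    q->head = 0;
    q->tail = 0;
    q->count = 0;
}

/* Called by application threads: records the request and returns without
 * touching the stack. Returns false if the queue is full. */
bool wq_post(work_queue_t *q, work_fn_t fn, void *ctx)
{
    assert(q != NULL && fn != NULL);
    if (q->count == QUEUE_DEPTH) {
        return false;
    }
    q->items[q->tail].fn = fn;
    q->items[q->tail].ctx = ctx;
    q->tail = (q->tail + 1) % QUEUE_DEPTH;
    q->count++;
    return true;
}

/* Called only by the single thread that owns the stack: runs every queued
 * item in order and returns how many were run. */
size_t wq_drain(work_queue_t *q)
{
    size_t ran = 0;
    assert(q != NULL);
    while (q->count > 0) {
        work_item_t item = q->items[q->head];
        q->head = (q->head + 1) % QUEUE_DEPTH;
        q->count--;
        item.fn(item.ctx);
        ran++;
    }
    return ran;
}

/* Stand-in for a low-level call (for example creating and binding a PCB);
 * here it only counts how many times it ran. */
void example_bind(void *ctx)
{
    assert(ctx != NULL);
    *(int *)ctx += 1;
}
```

The point of the design is that `example_bind` never runs in the caller's context: posting is cheap and safe from any task, while the work itself executes only when the owning thread drains the queue.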


From:	Tazzari Davide
Subject:	Re: [lwip-users] lwip lock
Date:	Tue, 12 Apr 2011 17:36:17 +0200