[lwip-users] Possible deadlock in LWIP 1.3.1 RC1


From: Eran Rundstein
Subject: [lwip-users] Possible deadlock in LWIP 1.3.1 RC1
Date: Sun, 23 Aug 2009 15:35:29 +0300

Hello,

I am using the following Python script on a Linux host to send data to an ARM AT91SAM9260-EK evaluation board running a proprietary OS, including a port of LWIP 1.3.1 RC1:

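import socket

# Python 3 version of the test client
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('192.168.0.35', 2030))

while True:
    s.send(b'x' * 100000)  # send() may transmit fewer bytes than requested
    b = s.recv(10000)      # read back at most 10000 echoed bytes
    print(len(b))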

On the ARM board, I have a thread that loops, calling lwip_read for up to 128 bytes and then lwip_write with the data it received.
What happens is this:
 - Incoming data is read by the 'echo' thread
 - The 'echo' thread calls lwip_write, which then blocks on the op_completed semaphore until the data is written
 - Repeat.
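
For reference, the echo thread's loop is roughly equivalent to the following Python sketch. The real code is C and uses lwip_read and lwip_write; the function name echo_loop and the use of Python's socket module here are only stand-ins for illustration.

```python
import socket
import threading


def echo_loop(conn, chunk=128):
    """Read up to `chunk` bytes, write them back, repeat until the peer closes.

    Returns the total number of bytes echoed.
    """
    total = 0
    while True:
        data = conn.recv(chunk)   # stands in for lwip_read(fd, buf, 128)
        if not data:              # empty read: peer closed its side
            break
        conn.sendall(data)        # stands in for lwip_write(); blocks until written
        total += len(data)
    return total


if __name__ == "__main__":
    # Exercise the loop over a local socket pair instead of a real board.
    host_side, board_side = socket.socketpair()
    worker = threading.Thread(target=echo_loop, args=(board_side,))
    worker.start()
    host_side.sendall(b"x" * 300)
    echoed = b""
    while len(echoed) < 300:
        echoed += host_side.recv(1024)
    host_side.shutdown(socket.SHUT_WR)  # lets echo_loop see end-of-stream
    worker.join()
    print(len(echoed))
```

The write call is the point where the thread can block: it does not return until all of the data has been accepted for sending.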

This flow goes on for a while, and then suddenly lwip_write does not return - looking at the thread's stack, it is indeed blocking on the op_completed semaphore. Nothing seems to signal it - what causes this is still a mystery to me.
At this stage, I forcefully close the connection from the Linux side. This causes err_tcp() to be called from within the context of the LWIP thread (tcpip_thread()). err_tcp() first attempts to post to the connection's receive mbox with:
  if (conn->recvmbox != SYS_MBOX_NULL) {
    /* Register event with callback */
    API_EVENT(conn, NETCONN_EVT_RCVPLUS, 0);
    sys_mbox_post(conn->recvmbox, NULL);
  }

And afterwards it may or may not signal the completion semaphore. Now, sys_mbox_post does not return until the message is posted to the queue. Assuming the queue is full, it will not return until a message is read from the queue and space is made available.
However, no one will be touching that queue, as the thread that deals with the socket (the 'echo' thread) is blocked waiting for the write to complete.
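
The ordering problem described above can be modelled on a host with a bounded queue and a semaphore. This is only a sketch under assumptions: run_err_path and the local names are hypothetical, Python's queue.Queue stands in for the mbox, and the timeout exists only so the demonstration terminates (the real sys_mbox_post would simply never return).

```python
import queue
import threading


def run_err_path(post_first, timeout=0.5):
    """Model the error path with a full receive mailbox.

    Returns True if the post to the mailbox completed, False if it was
    still blocked when the (demo-only) timeout expired.
    """
    recvmbox = queue.Queue(maxsize=1)
    recvmbox.put("pending segment")        # mailbox is already full
    op_completed = threading.Semaphore(0)  # the write-completion semaphore

    def echo_thread():
        op_completed.acquire()             # blocked inside the write call
        recvmbox.get()                     # only then does it read again

    worker = threading.Thread(target=echo_thread, daemon=True)
    worker.start()

    def post():
        try:
            recvmbox.put(None, timeout=timeout)  # blocking post
            return True
        except queue.Full:
            return False

    if post_first:
        posted = post()                    # nobody drains the mailbox yet
        op_completed.release()
    else:
        op_completed.release()             # wake the writer first
        posted = post()
    worker.join(timeout=2)
    return posted


if __name__ == "__main__":
    print(run_err_path(post_first=True))
    print(run_err_path(post_first=False))
```

In this model the post only completes when the semaphore is signalled first, which is the reasoning behind moving the post to the end of the function. It says nothing about why the write stalled in the first place.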

With that said, I have moved this code to the end of the function - but the problem remains and it is beyond my present knowledge to tell what is causing this.

If anyone has any insights regarding this potential problem, or requires more information, please let me know.

Many thanks,
Eran

--
Eran Rundstein
CTO
RTC Embedded Consulting

+972-54-5811085
address@hidden
