lwip-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[lwip-devel] [bug #56531] Missing locking in tcp_input()


From: Andrew
Subject: [lwip-devel] [bug #56531] Missing locking in tcp_input()
Date: Thu, 20 Jun 2019 17:28:58 -0400 (EDT)
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:67.0) Gecko/20100101 Firefox/67.0

URL:
  <https://savannah.nongnu.org/bugs/?56531>

                 Summary: Missing locking in tcp_input()
                 Project: lwIP - A Lightweight TCP/IP stack
            Submitted by: andrew_parlane
            Submitted on: Thu 20 Jun 2019 09:28:56 PM UTC
                Category: TCP
                Severity: 3 - Normal
              Item Group: Crash Error
                  Status: None
                 Privacy: Public
             Assigned to: None
             Open/Closed: Open
         Discussion Lock: Any
         Planned Release: None
            lwIP version: Other

    _______________________________________________________

Details:

I've been debugging an issue in our application where if the devices is
sending TCP packets and at the same time receiving lots of UDP packets (I
think this is due to the number of interrupts more than anything else), then
LWIP crashes with an assert.

The assert it hits is in lwip_netconn_do_writemore(). Specifically:

LWIP_ASSERT("conn->current_msg->msg.w.offset < conn->current_msg->msg.w.len",
              conn->current_msg->msg.w.offset <
conn->current_msg->msg.w.len);
  
At this point msg.len == msg.offset == 512.

The call stack at that time is from my ethernetif rx thread:

        7 lwip_netconn_do_writemore() api_msg.c:1657 0x00009b14 
        6 sent_tcp() api_msg.c:407 0x00007dd8   
        5 tcp_input() tcp_in.c:468 0x0002481c   
        4 ip4_input() ip4.c:709 0x00019914      
        3 ethernet_input() ethernet.c:186 0x0002c580    
        2 ethernetif_input() altera_tse_ethernetif.c:214 0x0002ca44     
        1 <symbol is not available> 0xa5a5a5a5  

It appears to be an ACK to a TCP message that was sent.

After adding some debug messages in, I find that on calling lwip_send() (with
a size of 512 bytes) from the main thread, we end up in
lwip_netconn_do_write() where msg->conn->current_msg is set. We then end up in
lwip_netconn_do_writemore() where tcp_write() is called with the full packet.
Offset is updated:

conn->current_msg->msg.w.offset += len;

Now msg.w.offset is 512, and msg.w.len is 512. Because we wrote everything
write_more is set to 0, and we exit the loop. Further down we check if
msg.w.offset == msg.w.len and if so, we set write_finished, and then clear
conn->current_msg.

The call stack at this point is:

        11 lwip_netconn_do_writemore() api_msg.c:1756 0x00009f60        
        10 lwip_netconn_do_write() api_msg.c:1844 0x0000a1bc    
        9 tcpip_send_msg_wait_sem() tcpip.c:442 0x00011010      
        8 netconn_apimsg() api_lib.c:131 0x0000630c     
        7 netconn_write_vectors_partly() api_lib.c:1064 0x000074ec      
        6 netconn_write_partly() api_lib.c:980 0x0000733c       
        5 lwip_send() sockets.c:1412 0x0000cf30 

However, a context switch occurs between updating msg.w.offset, and clearing
conn->current_msg. If it switches to the ethernetif_rx thread and that
receives a TCP ACK, we end up back in lwip_netconn_do_writemore(), but at this
point we fail the assert, because msg.w.offset == msg.w.len.

In the second callstack (from lwip_send()) we end up in
tcpip_send_msg_wait_sem() which has the following code:

#if LWIP_TCPIP_CORE_LOCKING
  LWIP_UNUSED_ARG(sem);
  LOCK_TCPIP_CORE();
  fn(apimsg);
  UNLOCK_TCPIP_CORE();
  return ERR_OK;
#else /* LWIP_TCPIP_CORE_LOCKING */
  ...

So it locks the TCPIP core.

However nothing locks the tpcip core on the other path (in the ethernetif_rx)
thread.

If in ip4.c I lock the TCPIP core around the call to tcp_input():

    case IP_PROTO_TCP:
        MIB2_STATS_INC(mib2.ipindelivers);
        LOCK_TCPIP_CORE();
        tcp_input(p, inp);
        UNLOCK_TCPIP_CORE();
        break;

Then the assert no longer occurs and everything seems happy.

However that's probably way to coarse a lock.

I tried adding the locking to tcp_in.c around TCP_EVENT_SENT(pcb,
(u16_t)acked16, err); which fixed that assert, but another assert occurred
elswhere, specifically:

LWIP_ASSERT("tcp_write: no pbufs on queue => both queues empty",
                pcb->unacked == NULL && pcb->unsent == NULL);

in tcp_write_checks().

I think the fix should be relatively simple, but I'm not sure of the best
place to put the locking.

Info:
    Board: custom
    Processor: NIOSII softcore processor in a CycloneV FPGA.
    ETH MAC: Custom but based around Altera's TSE MAC.
    LWIP version: 2.1.2
    OS: FreeRTOS v10.1.1

See attached file for lwipopt.h

Thanks,
Andrew



    _______________________________________________________

File Attachments:


-------------------------------------------------------
Date: Thu 20 Jun 2019 09:28:56 PM UTC  Name: lwipopts.h  Size: 9KiB   By:
andrew_parlane

<http://savannah.nongnu.org/bugs/download.php?file_id=47113>

    _______________________________________________________

Reply to this item at:

  <https://savannah.nongnu.org/bugs/?56531>

_______________________________________________
  Message sent via Savannah
  https://savannah.nongnu.org/




reply via email to

[Prev in Thread] Current Thread [Next in Thread]