[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[lwip-devel] [bug #56531] Missing locking in tcp_input()
From: |
Andrew |
Subject: |
[lwip-devel] [bug #56531] Missing locking in tcp_input() |
Date: |
Thu, 20 Jun 2019 17:28:58 -0400 (EDT) |
User-agent: |
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:67.0) Gecko/20100101 Firefox/67.0 |
URL:
<https://savannah.nongnu.org/bugs/?56531>
Summary: Missing locking in tcp_input()
Project: lwIP - A Lightweight TCP/IP stack
Submitted by: andrew_parlane
Submitted on: Thu 20 Jun 2019 09:28:56 PM UTC
Category: TCP
Severity: 3 - Normal
Item Group: Crash Error
Status: None
Privacy: Public
Assigned to: None
Open/Closed: Open
Discussion Lock: Any
Planned Release: None
lwIP version: Other
_______________________________________________________
Details:
I've been debugging an issue in our application where if the devices is
sending TCP packets and at the same time receiving lots of UDP packets (I
think this is due to the number of interrupts more than anything else), then
LWIP crashes with an assert.
The assert it hits is in lwip_netconn_do_writemore(). Specifically:
LWIP_ASSERT("conn->current_msg->msg.w.offset < conn->current_msg->msg.w.len",
conn->current_msg->msg.w.offset <
conn->current_msg->msg.w.len);
At this point msg.len == msg.offset == 512.
The call stack at that time is from my ethernetif rx thread:
7 lwip_netconn_do_writemore() api_msg.c:1657 0x00009b14
6 sent_tcp() api_msg.c:407 0x00007dd8
5 tcp_input() tcp_in.c:468 0x0002481c
4 ip4_input() ip4.c:709 0x00019914
3 ethernet_input() ethernet.c:186 0x0002c580
2 ethernetif_input() altera_tse_ethernetif.c:214 0x0002ca44
1 <symbol is not available> 0xa5a5a5a5
It appears to be an ACK to a TCP message that was sent.
After adding some debug messages in, I find that on calling lwip_send() (with
a size of 512 bytes) from the main thread, we end up in
lwip_netconn_do_write() where msg->conn->current_msg is set. We then end up in
lwip_netconn_do_writemore() where tcp_write() is called with the full packet.
Offset is updated:
conn->current_msg->msg.w.offset += len;
Now msg.w.offset is 512, and msg.w.len is 512. Because we wrote everything
write_more is set to 0, and we exit the loop. Further down we check if
msg.w.offset == msg.w.len and if so, we set write_finished, and then clear
conn->current_msg.
The call stack at this point is:
11 lwip_netconn_do_writemore() api_msg.c:1756 0x00009f60
10 lwip_netconn_do_write() api_msg.c:1844 0x0000a1bc
9 tcpip_send_msg_wait_sem() tcpip.c:442 0x00011010
8 netconn_apimsg() api_lib.c:131 0x0000630c
7 netconn_write_vectors_partly() api_lib.c:1064 0x000074ec
6 netconn_write_partly() api_lib.c:980 0x0000733c
5 lwip_send() sockets.c:1412 0x0000cf30
However, a context switch occurs between updating msg.w.offset, and clearing
conn->current_msg. If it switches to the ethernetif_rx thread and that
receives a TCP ACK, we end up back in lwip_netconn_do_writemore(), but at this
point we fail the assert, because msg.w.offset == msg.w.len.
In the second callstack (from lwip_send()) we end up in
tcpip_send_msg_wait_sem() which has the following code:
#if LWIP_TCPIP_CORE_LOCKING
LWIP_UNUSED_ARG(sem);
LOCK_TCPIP_CORE();
fn(apimsg);
UNLOCK_TCPIP_CORE();
return ERR_OK;
#else /* LWIP_TCPIP_CORE_LOCKING */
...
So it locks the TCPIP core.
However nothing locks the tpcip core on the other path (in the ethernetif_rx)
thread.
If in ip4.c I lock the TCPIP core around the call to tcp_input():
case IP_PROTO_TCP:
MIB2_STATS_INC(mib2.ipindelivers);
LOCK_TCPIP_CORE();
tcp_input(p, inp);
UNLOCK_TCPIP_CORE();
break;
Then the assert no longer occurs and everything seems happy.
However that's probably way to coarse a lock.
I tried adding the locking to tcp_in.c around TCP_EVENT_SENT(pcb,
(u16_t)acked16, err); which fixed that assert, but another assert occurred
elswhere, specifically:
LWIP_ASSERT("tcp_write: no pbufs on queue => both queues empty",
pcb->unacked == NULL && pcb->unsent == NULL);
in tcp_write_checks().
I think the fix should be relatively simple, but I'm not sure of the best
place to put the locking.
Info:
Board: custom
Processor: NIOSII softcore processor in a CycloneV FPGA.
ETH MAC: Custom but based around Altera's TSE MAC.
LWIP version: 2.1.2
OS: FreeRTOS v10.1.1
See attached file for lwipopt.h
Thanks,
Andrew
_______________________________________________________
File Attachments:
-------------------------------------------------------
Date: Thu 20 Jun 2019 09:28:56 PM UTC Name: lwipopts.h Size: 9KiB By:
andrew_parlane
<http://savannah.nongnu.org/bugs/download.php?file_id=47113>
_______________________________________________________
Reply to this item at:
<https://savannah.nongnu.org/bugs/?56531>
_______________________________________________
Message sent via Savannah
https://savannah.nongnu.org/
- [lwip-devel] [bug #56531] Missing locking in tcp_input(),
Andrew <=