From: Fabian Koch
Subject: [lwip-devel] trying to make closing-from-different-task-while-in-accept/select/connect possible
Date: Tue, 1 Apr 2014 18:36:36 +0200
Hey all,
Disclaimer: I know that LwIP currently does not support what we want to achieve here, so if your answer would be "LwIP doesn't support this", save yourself the time. Still, I hope to get some ideas and feedback from you guys about this, because we really need it.
We now have a version running that kind of "works". We are not sure yet whether we have thought of everything, and that is part of the reason I am writing this mail.
1) In the sys_arch layer we had a very paranoid implementation that we had to loosen a bit to get the current working behavior:
* sys_arch_mbox_fetch() and sys_arch_sem_wait() return SYS_ARCH_TIMEOUT when the mailbox or semaphore they are waiting on is deleted while they wait. (This should really be a different return value so that callers can react differently, but it works for now.)
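To make the sys_arch change concrete, here is a minimal sketch on top of POSIX threads. It is not lwIP code and not the port described in this mail: `sketch_mbox_t` and all `sketch_*` functions are hypothetical, the mailbox holds a single message, and the fetch blocks forever instead of honouring a timeout. The idea it shows is a `deleted` flag plus a waiter count, so that freeing the mailbox wakes every blocked fetch (which then returns SYS_ARCH_TIMEOUT) and the primitives are only destroyed once all waiters have left.

```c
#include <pthread.h>
#include <sched.h>
#include <stddef.h>
#include <stdint.h>

/* Same value as lwIP's SYS_ARCH_TIMEOUT, redefined so the sketch stands alone. */
#define SYS_ARCH_TIMEOUT 0xffffffffUL

/* Hypothetical one-slot mailbox; NOT lwIP's sys_mbox_t. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    void *msg;
    int full;     /* 1 while msg holds an unread message */
    int deleted;  /* set by sketch_mbox_free() */
    int waiters;  /* tasks currently inside sketch_mbox_fetch() */
} sketch_mbox_t;

void sketch_mbox_new(sketch_mbox_t *mb) {
    pthread_mutex_init(&mb->lock, NULL);
    pthread_cond_init(&mb->cond, NULL);
    mb->msg = NULL;
    mb->full = mb->deleted = mb->waiters = 0;
}

/* Returns 0 on success, -1 if the box is full or being deleted. */
int sketch_mbox_post(sketch_mbox_t *mb, void *msg) {
    int rc = -1;
    pthread_mutex_lock(&mb->lock);
    if (!mb->full && !mb->deleted) {
        mb->msg = msg;
        mb->full = 1;
        rc = 0;
        pthread_cond_broadcast(&mb->cond);
    }
    pthread_mutex_unlock(&mb->lock);
    return rc;
}

/* Blocks without a timeout. Returns 0 with *msg set, or SYS_ARCH_TIMEOUT with
   *msg = NULL if the mailbox is deleted while we wait. (lwIP's real function
   returns the time waited in ms; this sketch does not measure it.) */
uint32_t sketch_mbox_fetch(sketch_mbox_t *mb, void **msg) {
    uint32_t rc;
    pthread_mutex_lock(&mb->lock);
    mb->waiters++;
    while (!mb->full && !mb->deleted)
        pthread_cond_wait(&mb->cond, &mb->lock);
    if (mb->deleted) {
        *msg = NULL;
        rc = SYS_ARCH_TIMEOUT;
    } else {
        *msg = mb->msg;
        mb->full = 0;
        rc = 0;
    }
    mb->waiters--;
    pthread_cond_broadcast(&mb->cond); /* lets a pending free re-check waiters */
    pthread_mutex_unlock(&mb->lock);
    return rc;
}

/* Wakes every blocked fetch, waits until all have returned, then destroys the
   primitives. No new fetch may be started after this is called. */
void sketch_mbox_free(sketch_mbox_t *mb) {
    pthread_mutex_lock(&mb->lock);
    mb->deleted = 1;
    pthread_cond_broadcast(&mb->cond);
    while (mb->waiters > 0)
        pthread_cond_wait(&mb->cond, &mb->lock);
    pthread_mutex_unlock(&mb->lock);
    pthread_cond_destroy(&mb->cond);
    pthread_mutex_destroy(&mb->lock);
}

/* Demo: one thread blocks in fetch while another frees the mailbox. */
struct sketch_demo { sketch_mbox_t mb; uint32_t result; };

static void *sketch_demo_fetcher(void *arg) {
    struct sketch_demo *d = arg;
    void *msg;
    d->result = sketch_mbox_fetch(&d->mb, &msg);
    return NULL;
}

uint32_t sketch_demo_close_while_fetching(void) {
    struct sketch_demo d;
    pthread_t t;
    int waiting = 0;
    sketch_mbox_new(&d.mb);
    d.result = 0;
    pthread_create(&t, NULL, sketch_demo_fetcher, &d);
    while (!waiting) { /* make sure the fetcher has registered as a waiter */
        pthread_mutex_lock(&d.mb.lock);
        waiting = d.mb.waiters;
        pthread_mutex_unlock(&d.mb.lock);
        if (!waiting) sched_yield();
    }
    sketch_mbox_free(&d.mb);
    pthread_join(t, NULL);
    return d.result; /* SYS_ARCH_TIMEOUT */
}
```

Waiting for `waiters` to reach zero before destroying the mutex and condition variable is what keeps a woken task from touching freed memory; POSIX leaves destroying a condition variable with threads still blocked on it undefined.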
2) Inside lwip_select(), where the socket->select_waiting counter is incremented, an assertion fires when the socket is suddenly gone (scenario: close() from one task while another task is still in select()).
We changed it like this:
- LWIP_ASSERT("sock != NULL", sock != NULL);
+ if(sock == NULL) {
+   continue; //socket has been closed while we were in the select(). Continue the loop to get out of the select() cleanly
+ }
We did this for both for-loops where select_waiting is supposed to be increased.
This is not a clean solution/reaction either, but at least select() returns.
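A minimal, single-threaded sketch of the skip-on-NULL pattern, applied the same way to the increment pass (on entry) and the decrement pass (on exit). The socket table, `struct sketch_sock`, `sketch_tryget_socket()` and the bitmask standing in for the fd_sets are hypothetical simplifications, and the SYS_ARCH_PROTECT locking that lwIP puts around the counter is left out.

```c
#include <stddef.h>

#define SKETCH_MAX_SOCKS 4

/* Hypothetical stand-in for lwIP's struct lwip_sock; only the counter. */
struct sketch_sock {
    int select_waiting;
};

/* Hypothetical socket table: a NULL slot models a socket that another task
   has closed while this task is inside select(). */
static struct sketch_sock *sketch_socks[SKETCH_MAX_SOCKS];

static struct sketch_sock *sketch_tryget_socket(int s) {
    if (s < 0 || s >= SKETCH_MAX_SOCKS) return NULL;
    return sketch_socks[s];
}

/* Adds delta (+1 on entry, -1 on exit) to select_waiting for every fd in the
   interest mask, skipping sockets that have disappeared instead of asserting.
   Returns the number of sockets actually adjusted. */
int sketch_adjust_select_waiting(unsigned interest, int maxfdp1, int delta) {
    int i, adjusted = 0;
    for (i = 0; i < maxfdp1; i++) {
        struct sketch_sock *sock;
        if (!(interest & (1u << i))) continue;
        sock = sketch_tryget_socket(i);
        if (sock == NULL) continue; /* closed while we were in select() */
        sock->select_waiting += delta;
        adjusted++;
    }
    return adjusted;
}
```

One case this pattern does not cover: if the descriptor is reused by a newly created socket between the two passes, the decrement lands on the new socket.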
3) We ran into a problem when two tasks both called lwip_close() on the same socket in quick succession.
When close() is called the second time while the first call is still in progress, free_socket() has not been called yet, so the protections in netconn_delete() and do_delconn() do not kick in and the close is executed a second time.
When it reaches netconn_free(), sys_sem_free() operates on the already invalid semaphore and memp_free() on the already invalid conn.
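The race comes down to both callers passing the same check before either has torn anything down. One generic way to rule it out (a sketch only, not lwIP code and not the netconn_free() change referred to in this mail) is to let each close() claim the socket atomically before tearing it down, keeping the claim in a slot that outlives the connection. `struct sketch_conn`, `sketch_table` and `sketch_close()` are hypothetical; C11 atomics are assumed.

```c
#include <stdatomic.h>
#include <stddef.h>

#define SKETCH_NUM_SOCKETS 4

/* Hypothetical connection object standing in for struct netconn. */
struct sketch_conn {
    int teardowns; /* counts how often teardown ran; must end up at 1 */
};

/* Hypothetical socket table. The slots are static, so they outlive the
   connections they point to, unlike a flag stored inside the connection. */
static _Atomic(struct sketch_conn *) sketch_table[SKETCH_NUM_SOCKETS];

/* Returns 0 for the caller that performs the teardown, -1 otherwise. */
int sketch_close(int s) {
    struct sketch_conn *conn;
    if (s < 0 || s >= SKETCH_NUM_SOCKETS) return -1;
    /* Atomically take the pointer out of the slot: exactly one caller sees
       the non-NULL value, a concurrent second close() gets NULL. */
    conn = atomic_exchange(&sketch_table[s], NULL);
    if (conn == NULL) return -1; /* already being closed (EBADF-like) */
    conn->teardowns++;           /* stands in for sys_sem_free() + memp_free() */
    return 0;
}
```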
To avoid this we made the following change in netconn_free():
4) In netconn_accept() we added an initial value for the newconn pointer. Not only because it is better style to have a known value there, but also to cope with point 1), where sys_arch_mbox_fetch() returns SYS_ARCH_TIMEOUT when the socket was closed while we were stuck in accept().
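Why the initialization matters fits in a few lines: if an accept path waits forever and only checks newconn afterwards, a fetch that returns early without writing the pointer would otherwise leave newconn indeterminate. In this sketch `sketch_fetch()` is a hypothetical stand-in for sys_arch_mbox_fetch() with the behavior from point 1), and `sketch_accept()` is a simplified illustration, not netconn_accept().

```c
#include <stddef.h>
#include <stdint.h>

#define SYS_ARCH_TIMEOUT 0xffffffffUL /* same value as lwIP's sys.h */

/* Hypothetical fetch: on a deleted mailbox it returns SYS_ARCH_TIMEOUT
   WITHOUT writing *msg, like the sys_arch layer described in point 1). */
static uint32_t sketch_fetch(void **msg, int mbox_deleted, void *queued) {
    if (mbox_deleted) return SYS_ARCH_TIMEOUT;
    *msg = queued;
    return 0;
}

/* Accept-like wrapper that ignores the fetch result and relies on a NULL
   check afterwards; the NULL initialization is what makes that check valid. */
int sketch_accept(void **out, int mbox_deleted, void *queued) {
    void *newconn = NULL; /* without this, newconn is indeterminate below */
    (void)sketch_fetch(&newconn, mbox_deleted, queued);
    *out = newconn;
    return newconn == NULL ? -1 : 0; /* -1: socket closed during accept */
}
```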
5) In the general tcpip_apimsg() call, we added a protection against returning a possibly wrong error code (e.g. ERR_OK) when sys_arch_sem_wait() returns because the connection was closed:
@@ -316,6 +316,10 @@
     msg.msg.apimsg = apimsg;
     sys_mbox_post(&mbox, &msg);
     sys_arch_sem_wait(&apimsg->msg.conn->op_completed, 0);
+    /* do a quick check if the connection is still valid after we return from sem_wait */
+    if(apimsg->msg.conn == NULL){
+      return ERR_VAL;
+    }
     return apimsg->msg.err;
   }
   return ERR_VAL;
We could have checked for SYS_ARCH_TIMEOUT here, but I feel that this hack has to go eventually, and sys_arch_sem_wait() and sys_arch_mbox_fetch() should return something else when their resources have been deleted...
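A sketch of what "something else" could look like: a second sentinel next to SYS_ARCH_TIMEOUT, mapped to distinct errors by the caller. `SKETCH_SYS_ARCH_DELETED`, `sketch_err_t` and `sketch_wait_result_to_err()` are hypothetical and not part of lwIP, which has its own err_t values.

```c
#include <stdint.h>

#define SYS_ARCH_TIMEOUT 0xffffffffUL /* value used by lwIP's sys.h */

/* Hypothetical, NOT in lwIP: returned by a sys_arch port when the semaphore
   or mailbox was freed while a task was waiting on it. */
#define SKETCH_SYS_ARCH_DELETED 0xfffffffeUL

/* Hypothetical error codes for this sketch only. */
typedef enum { SKETCH_OK = 0, SKETCH_TIMEOUT = -1, SKETCH_CLOSED = -2 } sketch_err_t;

/* Lets a caller such as tcpip_apimsg() tell "timed out" from "resource
   deleted" instead of overloading SYS_ARCH_TIMEOUT for both. */
sketch_err_t sketch_wait_result_to_err(uint32_t r) {
    if (r == SKETCH_SYS_ARCH_DELETED) return SKETCH_CLOSED;
    if (r == SYS_ARCH_TIMEOUT) return SKETCH_TIMEOUT;
    return SKETCH_OK; /* any other value is the time waited in ms */
}
```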
6) We are still looking for a good solution to the problem of closing a socket that is currently blocking in connect().
Of course there are several LWIP_ASSERT()s in do_delconn(), but even ignoring those, the netconn_drain() call is a problem with our sys_arch layer, and it gets pretty ugly from there.
I'll keep looking into that roadblock tomorrow (there is an "@todo TCP: abort running write/connect?" comment in there, so...).
7) Doing write and read from different tasks seems to work for now, as long as there is only one tx task and one rx task.
I would love any notes, hints, help, comments, anything!
kind regards,
Fabian