emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD


From: Eli Zaretskii
Subject: Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'.
Date: Wed, 20 Jan 2021 17:05:46 +0200

> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Tue, 19 Jan 2021 21:22:04 +0100
> Cc: Emacs developers <emacs-devel@gnu.org>, Philipp Stephani <phst@google.com>
> process.c:4729: Faccept_process_output: enter
> process.c:5139: wait_reading_process_output: enter
> process.c:5193: wait_reading_process_output: outer loop
> process.c:5322: wait_reading_process_output: update_tick = 261,
> process_tick = 261
> process.c:5554: wait_reading_process_output: before pselect; max_desc = 1019
> process.c:5601: wait_reading_process_output: after pselect: nfds = -1
> process.c:5641: wait_reading_process_output: EINTR
> process.c:5193: wait_reading_process_output: outer loop
> process.c:5322: wait_reading_process_output: update_tick = 261,
> process_tick = 261
> process.c:7189: handle_child_signal: enter
> process.c:7234: handle_child_signal: process test 5: change status to
> 0; new process_tick = 262
> process.c:144: handle_child_signal: leave
> process.c:5554: wait_reading_process_output: before pselect; max_desc = 1017
> 
> and then Emacs hangs.
> So wait_reading_process_output indeed first receives EINTR, loops
> around, and checks for the process_tick change before SIGCHLD is
> handled. By the time it reruns pselect it's too late.

OK, thanks.  So the scenario is that the SIGCHLD handler is called
_after_ pselect returns with EINTR, and by that time we already call
pselect again, whereas the logic in wait_reading_process_output is
based on the assumption that the signal handler will be called
_before_ pselect returns?  (We should have this in the comments to the
child_signal_* functions, so that we remember why we introduced this
machinery.)

First, is this a valid behavior, or is this a bug in the OS?  I
couldn't find any place where it is documented what Posix says about
the relative timing of calling the handler vs syscall returning with
EINTR -- is the behavior you see allowed?  I do see many places that
explicitly or implicitly assume that a syscall will return with EINTR
_after_ the signal handler returns.

If this _is_ valid behavior, I wonder whether we could somehow amend
the loop and its logic to handle this case without introducing
additional file descriptors to wait.  For example, when we get EINTR,
we could add another pselect call with a short timeout, so that the
handler has time to run and update the process tick before we loop
around.  Or maybe, when we get EINTR, we should loop over the
processes and see which one(s) have exited.  Such changes would
basically leave the overall logic intact, which is IMO better for more
complex use cases that involve other Lisp threads.

Or maybe the problem here is that more than one signal is involved, so
that one of them interrupts pselect before the other (our SIGCHLD)
gets delivered and calls the SIGCHLD handler?

In any case, did you consider what happens with the self-pipe method
when more than one Lisp thread is waiting in pselect?

> > The thing is, on Windows we can only wait on up to 64 handles (unless
> > we complicate the code with multilevel wait, that is), so every
> > unnecessary descriptor we need to wait on means we can support fewer
> > simultaneous subprocesses.  We are already limited to just 32
> > subprocesses, which is quite low a number.
> 
> OK, that's a good point, I didn't know about this limitation. I'll see
> that I can remove the pipe on Windows.

Just making those child_signal_* function be no-ops on MS-Windows, and
#ifedf'ing away that call to FD_SET of child_fd, should be okay, I
think.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]