emacs-devel

Re: master 64f2c96 2/2: Make a process test faster.


From: Philipp Stephani
Subject: Re: master 64f2c96 2/2: Make a process test faster.
Date: Sun, 10 Jan 2021 18:17:58 +0100

On Sun, 10 Jan 2021 at 17:39, Eli Zaretskii <eliz@gnu.org> wrote:
>
> > From: Philipp Stephani <p.stephani2@gmail.com>
> > Date: Sat, 9 Jan 2021 22:00:45 +0100
> > Cc: Philipp Stephani <phst@google.com>, Emacs developers 
> > <emacs-devel@gnu.org>
> > 2. The other one happens if SIGCHLD is signaled during
> > wait_reading_process_output. Then wait_reading_process_output will
> > wait forever, since the stdout FD never gets closed and it doesn't see
> > the process status update in time.
>
> Do you mean that wait_reading_process_output has this problem in
> general, or just in this particular scenario?  If the former, I'm
> surprised, as we are using this code for a very long time.  If the
> latter, can you elaborate on the situation, and what does SIGCHLD have
> to do with closing stdout?

I believe it is a rather general problem. Anecdotally I've heard
occasionally about problems with accept-process-output deadlocking,
which might be related. See e.g. some of the other tests in
process-tests.el. AIUI, the sequence of events is as follows:
1. (accept-process-output PROC)
2. Here the process isn't finished yet. accept-process-output waits
for something to become available on stdout.
3. If the process doesn't write anything to stdout,
accept-process-output will block.
4. The process exits without having written anything.
5. Stdout is closed.
6. pselect returns, but since the process hasn't written anything,
wait_reading_process_output doesn't return.
7. Emacs receives SIGCHLD.
8. Emacs tries to notify accept-process-output, but it's too late - we
are already within the pselect call, which now hangs forever.

To test that, I added some printf statements, and indeed I saw that
wait_reading_process_output was entered, then SIGCHLD was received,
but wait_reading_process_output continued to hang.

I remedied this on the scratch/sigchild-fd branch by having the
SIGCHLD handler signal a pipe that pselect watches. That fixes the
deadlock entirely for me on GNU/Linux and macOS.
(Going forward we might want to use pidfds if available, they seem
simpler and less error-prone than signals.)


