bug-gnu-emacs
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#17561: Emacs can forget processes


From: Paul Eggert
Subject: bug#17561: Emacs can forget processes
Date: Mon, 26 May 2014 16:58:45 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0

Jorgen Schaefer wrote:
the-bug.txt.gz ... shows the strace of a
running Emacs session which starts to exhibit the bug, so we now have a
trace that includes the time when the bug actually happens. It shows me
starting Python subprocesses (elpy RPC processes) and killing them, in
an attempt to reproduce the problem. Eventually, this succeeds, and
Emacs goes into the loop reading from fd 15.

Thanks for that trace; it's helpful. A couple of things. First, what platform are you running on, exactly? I'm running on an AMD Phenom II X4 910e (according to /proc/cpuinfo), and using Fedora 20 x86-64 (Linux kernel 3.14.4-200.fc20.x86_64, glibc 2.18), and compiling with GCC 4.9.0.

Second, looking at the-bug.txt I'm suspicious about the interaction between vfork, pthread_sigmask (aka rt_sigprocmask) and rt_sigreturn. Could you please try building Emacs with fork rather than vfork and see whether that fixes the problem? If it fixes things, great; if not I'd like to see another strace of the bug. To use fork instead of vfork, you can run "./configure ac_cv_func_fork_works=no", or (faster but flimsier) edit src/config.h to add "#define fork vfork".

Here are some details as to why I think vfork is related to the problem.

You see a bug on line 7717 of the trace in the-bug.txt, where read(15, ...) returns 0 and Emacs just keeps trying. I see bugs before that, including:

* On trace line 1043, file descriptor 15 is created, as part of the preparation for vforking and execing subprocess 27012. So far so good.

* On trace line 1048 vfork succeeds.  Still good.

* On trace line 1049 the parent process continues, and executes another system call! The child process has neither execed nor terminated. This is incorrect; if vfork is called, the Linux man page says "the calling thread is suspended until the child terminates (either normally, by calling _exit(2), or abnormally, after delivery of a fatal signal), or it makes a call to execve(2)". I'm reasonably sure that Emacs relies on this and will fail badly if the parent and child can stomp on the same memory in parallel. (Another possibility, alas, is that strace is buggy and its output unreliable....)

* On trace line 1049, the signal mask is cleared except for QUIT, ALRM, PROF.

* On trace line 5598, an INT is delivered just as the INT bit is removed from the signal mask. This strongly suggests that the INT bit was set in the signal mask -- but that bit was cleared on line 1049, so what gives? This suggests that the signal mask is messed up somehow. One possibility is that the buggy vfork is doing it somehow -- perhaps the signal mask is part of the shared memory?

* On trace line 7706, the subprocess exits but no CHLD signal is delivered to Emacs. Again, this suggests that the signal mask is messed up. This last problem most likely leads to the symptoms you're observing: Emacs isn't told that a child died, so it doesn't check for it.

FYI when I try your test, I get something like the following strace output near the vfork. This is the correct vfork behavior, i.e., the parent is suspended until the child execs:

6690 15:51:23.093780 rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0 <0.000025>
6690  15:51:23.093873 vfork( <unfinished ...>
6692  15:51:23.094078 setsid()          = 6692 <0.000070>
6692 15:51:23.094202 open("/dev/tty", O_RDWR|O_CLOEXEC) = -1 ENXIO (No such device or address) <0.000048>
6692  15:51:23.094310 open("/dev/pts/9", O_RDWR|O_CLOEXEC) = 9 <0.000041>
6692 15:51:23.094409 rt_sigaction(SIGINT, {SIG_DFL, [INT], SA_RESTORER|SA_RESTART, 0x7f9de6e2fcb0}, {0x4ea440, [INT QUIT ALRM CHLD PROF WINCH IO], SA_RESTORER, 0x7f9de73d7750}, 8) = 0 <0.000026> 6692 15:51:23.094507 rt_sigaction(SIGQUIT, {SIG_DFL, [QUIT], SA_RESTORER|SA_RESTART, 0x7f9de6e2fcb0}, {0x4ea440, [INT QUIT ALRM CHLD PROF WINCH IO], SA_RESTORER, 0x7f9de73d7750}, 8) = 0 <0.000026> 6692 15:51:23.094602 rt_sigaction(SIGPROF, {SIG_DFL, [PROF], SA_RESTORER|SA_RESTART, 0x7f9de6e2fcb0}, {SIG_IGN, [PROF], SA_RESTORER|SA_RESTART, 0x7f9de6e2fcb0}, 8) = 0 <0.000025> 6692 15:51:23.094694 rt_sigaction(SIGPIPE, {SIG_DFL, [PIPE], SA_RESTORER|SA_RESTART, 0x7f9de6e2fcb0}, {SIG_IGN, [PIPE], SA_RESTORER|SA_RESTART, 0x7f9de6e2fcb0}, 8) = 0 <0.000025> 6692 15:51:23.094793 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000026> 6692 15:51:23.094877 ioctl(9, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, {B38400 opost isig icanon echo ...}) = 0 <0.000029> 6692 15:51:23.094963 ioctl(9, SNDCTL_TMR_STOP or SNDRV_TIMER_IOCTL_GINFO or TCSETSW, {B38400 opost isig icanon -echo ...}) = 0 <0.000030> 6692 15:51:23.095080 ioctl(9, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, {B38400 opost isig icanon -echo ...}) = 0 <0.000029>
6692  15:51:23.095170 getpid()          = 6692 <0.000025>
6692 15:51:23.095258 chdir("/home/eggert/src/gnu/emacs/emacs-24-sc") = 0 <0.000038>
6692  15:51:23.095466 dup2(9, 0)        = 0 <0.000028>
6692  15:51:23.095545 dup2(9, 1)        = 1 <0.000027>
6692  15:51:23.095621 dup2(9, 2)        = 2 <0.000025>
6692 15:51:23.095696 setpgid(0, 0) = -1 EPERM (Operation not permitted) <0.000026> 6692 15:51:23.095776 ioctl(0, SNDRV_TIMER_IOCTL_SELECT or TIOCSPGRP, [6692]) = 0 <0.000027> 6692 15:51:23.095872 execve("/home/eggert/opt/Linux-x86_64/merged/bin/cat", ["/home/eggert/opt/Linux-x86_64/me"...], [/* 100 vars */] <unfinished ...>
6690  15:51:23.096361 <... vfork resumed> ) = 6692 <0.002462>
6690 15:51:23.096440 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000028>





reply via email to

[Prev in Thread] Current Thread [Next in Thread]