guix-patches

[bug#30637] [WIP] shepherd: Poll every 0.5s to find dead forked services


From: Carlo Zancanaro
Date: Fri, 02 Mar 2018 09:37:50 +1100
User-agent: mu4e 1.0; emacs 25.3.1

Hey Ludo,

On Wed, Feb 28 2018, Ludovic Courtès wrote:
>> The problem is that shepherd, when run as a user process, can "lose" services which fork away. Shepherd can still kill them, but a SIGCHLD won't be delivered if they die, so shepherd can't restart/disable them. My prime example is emacs, which I run with --daemon. If I then kill emacs, shepherd will still think that it is running.

> There are two issues here, I think.
>
> 1. shepherd cannot lose SIGCHLD: if a process dies immediately once it’s been spawned, as is the case with “emacs --daemon” or any other daemon-style program, it should receive SIGCHLD and process it.

Yeah, that's true, but the problem is that shepherd only acts on a SIGCHLD if some service has its `running` slot set to that PID. When emacs forks, shepherd may well handle the SIGCHLD for the original process, but that doesn't change any service's state (nor should it, because the service uses #:pid-file to track the forked process).
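The behaviour described above can be sketched in Python. Shepherd itself is written in Guile Scheme and this is not its code: `running` and `handle_sigchld` are hypothetical names standing in for shepherd's service table and its SIGCHLD handler. The sketch shows why a daemon tracked via #:pid-file is never noticed: `waitpid(-1, WNOHANG)` only ever returns PIDs of direct children, so the launcher is reaped and ignored, while the forked daemon's PID never comes back.

```python
import os

# Hypothetical stand-in for shepherd's service table:
# service name -> PID stored in that service's `running` slot.
running = {}

def handle_sigchld(signum=None, frame=None):
    """Reap every child that has exited; return the services this stops.

    A reaped PID only changes service state if some service's `running`
    slot holds exactly that PID.
    """
    stopped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break            # we have no children at all
        if pid == 0:
            break            # children exist, but none has exited yet
        for name, service_pid in list(running.items()):
            if service_pid == pid:
                del running[name]
                stopped.append(name)
        # Any other PID (e.g. the short-lived parent of a `--daemon`
        # process whose real PID came from a pid file) is reaped and ignored.
    return stopped
```

In a real program this would be installed with `signal.signal(signal.SIGCHLD, handle_sigchld)`. The loop matters because standard signals are not queued: several children can exit while only one SIGCHLD is delivered.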

> 2. shepherd currently can’t do much with real daemons. So what we do in GuixSD is to either start programs in non-daemon mode, when that’s an option, or pass #:pid-file to retrieve the forked process PID. I think you should do one of these as well.

I am doing that. The problem is that when a service dies (crashes, quits, etc.) the `respawn?` option cannot be honoured, because shepherd is never notified that the process has terminated: it never receives a SIGCHLD for the forked PID. My patch polls the processes we expect to be running, to make up for the missing notification. I would much rather receive an event or signal when the forked process dies, but I don't know how to do that in a robust, portable way, so I chose to poll instead.
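The polling idea can be sketched as follows, again in Python rather than the patch's Guile Scheme, and not taken from the patch. It assumes a hypothetical `running` table (service name to PID) and a hypothetical `on_death` callback standing in for shepherd's respawn/disable logic. Liveness is checked with `kill(pid, 0)`, one common way to test whether a PID exists without actually delivering a signal; the 0.5 s interval matches the patch's subject line.

```python
import os
import time

def pid_alive(pid):
    """True if a process with this PID exists; signal 0 checks without sending."""
    if pid <= 0:
        return False             # 0 and negative PIDs address process groups
    try:
        os.kill(pid, 0)
    except ProcessLookupError:   # ESRCH: no such process
        return False
    except PermissionError:      # EPERM: it exists but belongs to another user
        return True
    return True

def poll_once(running, on_death):
    """Check every recorded PID once; call on_death(name) for each dead one."""
    for name, pid in list(running.items()):
        if not pid_alive(pid):
            del running[name]
            on_death(name)       # e.g. respawn the service if `respawn?` is set

def poll_forever(running, on_death, interval=0.5):
    """Repeat poll_once every `interval` seconds."""
    while True:
        poll_once(running, on_death)
        time.sleep(interval)
```

Two caveats of this approach: `kill(pid, 0)` succeeds on zombies, which is fine here because an orphaned daemon is reparented to init (or a subreaper) and reaped there; and a PID recycled between two polls could be mistaken for the original service. On Linux, `prctl(PR_SET_CHILD_SUBREAPER)` is a non-portable alternative: orphaned descendants are reparented to the supervisor, which then does receive SIGCHLD for them.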

If you look at my test case in tests/respawn-service.sh (included in full in the diff attached to my previous email), you can see the problem this patch solves. The test fails without the rest of my patch and passes with it (the guix build container issue notwithstanding).

Carlo
