
Re: Alternate termination sequence option --term-seq


From: Rasmus Villemoes
Subject: Re: Alternate termination sequence option --term-seq
Date: Wed, 29 Apr 2015 14:07:01 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)

On Wed, Apr 29 2015, Ole Tange <ole@tange.dk> wrote:

> On Tue, Apr 28, 2015 at 8:31 PM, Martin d'Anjou
> <martin.danjou14@gmail.com> wrote:
>> On 15-04-27 05:06 PM, Ole Tange wrote:
>>>
>>> This you have to elaborate. Explain how this --termseq is executed
>>> (with special emphasis on process groups):
>>>
>>> --termseq HUP,2,TERM,10,TERM,20,INT,30,KILL
>
> Thanks for the elaboration.
>
>> Let's assume that the user calls GNU Parallel and wants two iterations:
>> parallel [options] --termseq HUP,2,TERM,10,TERM,20,INT,30,KILL cmd ::: 1 2
> :
>> So as soon as the cmd processes are gone, GNU Parallel looks at the whole
>> process tree, and sends the termination sequence to all of them.
>
> I can see at least one issue with this: When does GNU Parallel map the tree?
>
> If it does that before the first signal is sent, the risk is that the
> tree changes before the second round. And if a process takes 10
> seconds to shut down, it is pretty likely that it needs to run all
> sorts of commands to shut down - thus changing the tree. It also has
> the (albeit small) risk that a PID will be reused by an innocent
> process, and that this PID will be killed in the second round. This
> risk is biggest on systems that choose random PIDs (e.g. OpenBSD,
> MirOS).
>
> If the mapping is done after the last KILL, then there is no tree to map.
>
> Maybe the best you can do is:
>
> * Map the tree
> * Do first round of killing immediate children
> * Map the branches of every (grand*)children
> * Do a second round of killing of these branches
>
> This still has the risk of killing an innocent PID and its children.
>

Killing (in the sense of sending any signal whatsoever) an
innocent/unrelated PID is completely unacceptable, IMO. On a reasonably
busy system, PID reuse within 10 seconds is far from unlikely. Mapping
the tree even before signalling the immediate children is not enough;
some of the (grand*)children may vanish in the meantime and their PIDs
may be reused before one can use the gathered information.

No one but a process's immediate parent can safely use the PID for
signalling: only the parent knows whether the child still exists,
because the PID cannot be recycled until the parent has wait()ed on
the child (the child may be a zombie by then, but sending a signal is
still safe). After a successful wait(), the PID must be treated as an
arbitrary number with no meaning, much like a pointer after free().
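
To illustrate the point about un-reaped children, here is a minimal
sketch in Python (not the Perl that GNU Parallel is written in). The
function name signal_then_reap and the exit code 7 are made up for the
example. The parent waits until the child has exited, signals it while
it is still un-reaped, and only then calls waitpid().

```python
import os
import signal

def signal_then_reap(exit_code=7):
    """Signal a child that has exited but has not been waited for."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: exit at once. Its copy of the write end closes on exit.
        os._exit(exit_code)
    os.close(w)
    os.read(r, 1)        # blocks until EOF, i.e. until the child has exited
    os.close(r)
    # The child has not been waited for, so its PID is still reserved:
    # this kill() cannot reach an unrelated process.
    os.kill(pid, signal.SIGTERM)
    _, status = os.waitpid(pid, 0)
    # From here on, pid is just a number and must not be signalled again.
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else None
```

The signal is accepted but has no effect, and the child's own exit
status is what waitpid() reports.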

I think the only way to do this right is for GNU Parallel to make each
immediate child a process group leader (setpgrp 0,0 immediately after
fork). Sending a signal to that process group is then safe, regardless
of how the children have forked and exited. It's easy enough to first
send the signal to the child alone, but I don't know how to go about
checking if the child has actually done its duty and signalled its
descendants, or whether GNU Parallel would have to signal the whole
process group (and if so, with what signal and when).
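
The process-group approach could look roughly like the following Python
sketch. It is an illustration under assumptions, not GNU Parallel's
code: kill_group_demo is a hypothetical name, the sleeping grandchildren
stand in for whatever cmd forks, and SIGTERM is picked arbitrarily. The
pipe is there only so the demo can tell when the grandchildren exist and
when every member of the group is gone.

```python
import os
import signal
import time

def kill_group_demo(n_grandchildren=2):
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(r)
        os.setpgid(0, 0)              # become leader of a new process group
        for _ in range(n_grandchildren):
            if os.fork() == 0:
                time.sleep(60)        # grandchild: inherits the group
                os._exit(0)
        os.write(w, b"x")             # tell the parent the grandchildren exist
        time.sleep(60)
        os._exit(0)
    os.close(w)
    try:
        os.setpgid(pid, pid)          # also set it from the parent to close the race
    except OSError:
        pass                          # the child got there first
    os.read(r, 1)                     # wait for the "ready" byte
    # The leader has not been waited for, so its PID (and hence the
    # group ID) cannot have been recycled.
    os.killpg(pid, signal.SIGTERM)
    _, status = os.waitpid(pid, 0)
    rest = os.read(r, 1)              # EOF once every holder of w has died
    os.close(r)
    leader_killed = os.WIFSIGNALED(status) and os.WTERMSIG(status) == signal.SIGTERM
    return leader_killed, rest == b""
```

The sketch sidesteps the open question of whether to signal the child
alone first. It simply signals the whole group at once.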

Do note that one can never clean up all descendants that may have been
spawned: A dance consisting of double fork() and some setpgid/setsid
yoga will create a process which cannot be tied to GNU Parallel or any
of its immediate children. So one has to rely on the children not doing
such things.
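
A minimal Python sketch of such an escape, with the hypothetical name
escape_demo: the immediate child leads its own process group, but a
double fork() with setsid() in between yields a process whose process
group and session no longer match it. The escapee only reports its IDs
through a pipe and exits.

```python
import os

def escape_demo():
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:                      # immediate child: leader of its own group
        os.close(r)
        os.setpgid(0, 0)
        if os.fork() == 0:            # first fork: not a group leader,
            os.setsid()               # so it may start a new session and group
            if os.fork() == 0:        # second fork: the escapee
                os.write(w, f"{os.getpgid(0)} {os.getsid(0)}".encode())
            os._exit(0)
        os._exit(0)
    os.close(w)
    data = b""
    while chunk := os.read(r, 64):    # read until all write ends are closed
        data += chunk
    os.close(r)
    os.waitpid(pid, 0)
    pgid, sid = map(int, data.split())
    return pid, pgid, sid
```

The returned pgid and sid differ from the immediate child's PID, so a
signal sent to the child's process group would not reach the escapee.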

Rasmus





