bug-bash

Re: waiting for process substitutions


From: Robert Elz
Subject: Re: waiting for process substitutions
Date: Sat, 13 Jul 2024 07:40:42 +0700

    Date:        Fri, 12 Jul 2024 11:48:15 -0400
    From:        Chet Ramey <chet.ramey@case.edu>
    Message-ID:  <258bcd3a-a936-4751-8e24-916fbeb9c739@case.edu>


  | Not really, since the original intent was to wait for the *next* process
  | to terminate.

There are two issues with that.   The first is "next after what": one
interpretation would be "the next after the last which was waited upon"
(one way or another).   The other, and the one you seem to imply, is
"next which terminates after now" - ie: still running when the wait
command is executed.   But that's an obvious race condition, and that's
the second issue: there is no possible way to know (in the script
which is executing "wait -n") which processes have already terminated
at that instant.

Eg: let's assume I have two running bg jobs, one which is going to take
a very long time, the other which will finish fairly soon.

For this e-mail, I'll emulate those two with just "sleep", though one
of them might be a rebuild of firefox, and all its dependencies, from
sources (yes, including rust), which will take some time, and the other
is a rebuild of "true" (/bin/true not the builtin), which probably won't,
as an empty executable file is all that's required.

So, and assuming an implementation of sleep which accepts fractional
seconds:

        sleep $(( 5 * 24 * 60 * 60 )) & J1=$!
        sleep 0.01 & J2=$!

        printf 'Just so the shell is doing something: jobs are %s & %s\n' \
                "${J1}" "${J2}"

        wait -n

Now which of the two background jobs is that waiting for?  Which do you
expect the script writer intended to wait for?   You can make the 2nd
sleep be "sleep 0" if you want to do a more reasonable test; just make
sure, to get a valid result, that you don't interrupt that wait.

The current implementation is lunacy, and cannot possibly have any users,
since without doing a wait the script cannot possibly know what has
finished already, so it can't possibly be explicitly excluding jobs which
just happen to have finished after the last "wait -n" (or other wait).

Of course, in the above simple example, the wait -n could be replaced
by wait "${J2}" which would work just fine, but a real example would
probably have many running jobs, some of which are very quick, and
others which aren't, and some arbitrary ones of the quick ones might
be so quick that they are finished before the script is ready to wait.
Even a firefox build might be that quick, if the options passed to
the top level make happen to contain a make syntax error, and so all
that happens is an error message (Usage:...) and very quick exit.
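
A minimal sketch of the wait-by-pid alternative, for illustration only.
run_job is a hypothetical stand-in for real work, and the timings are
arbitrary.  Waiting on a saved pid still works when the job exited before
the wait was issued, but the script has to pick the order itself, which
is what "wait -n" is supposed to make unnecessary.

```shell
# Sketch: save each job's pid at launch, then wait on each pid by name.
# run_job is a hypothetical placeholder: sleep $1 seconds, exit with $2.
run_job() { sleep "$1"; return "$2"; }

run_job 0.3 0 & slow=$!
run_job 0   3 & quick=$!     # may well have exited before we wait

sleep 0.1                    # the script "doing something else"

# Each wait names its job, so it does not matter that "quick" has
# already terminated: its status is still delivered.
wait "$quick"; quick_status=$?
wait "$slow";  slow_status=$?

printf 'quick=%s slow=%s\n' "$quick_status" "$slow_status"
```

With many jobs of unknown duration this forces the script to wait in an
order it chose in advance, rather than in the order the jobs finish.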

Please just change this, use the first definition of "next job to
finish" - and in the case when there are already several of them,
pick one, any one - you could order them by the time that bash reaped
the jobs internally, but there's no real reason to do so, as that
isn't necessarily the order the actual processes terminated, just
the order the kernel picked to answer the wait() sys call, when
there are several child zombies ready to be reaped.
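
The "first definition" can be emulated in a script today with a small
completion queue, which may help show what is being asked for.  This is
only a sketch under stated assumptions, not how any shell implements
wait: launch, done_fifo and the job labels are hypothetical names, and
opening a FIFO read-write is assumed to work (it does on Linux, though
POSIX leaves it unspecified).

```shell
# Sketch: every job queues "label status" when it exits, so a job that
# finished before the script asked is still reported, never skipped.
tmpdir=$(mktemp -d)
done_fifo=$tmpdir/done
mkfifo "$done_fifo"
exec 3<>"$done_fifo"         # read-write open, so nothing blocks on open

# Run a command in the background; on exit, queue its label and status.
launch() {
    label=$1; shift
    ( "$@"; printf '%s %s\n' "$label" "$?" >&3 ) &
}

launch slow  sleep 0.5
launch quick sh -c 'exit 3'

sleep 0.2                    # by now "quick" has already finished

# First finished job, even though it ended before we asked.
read -r first first_status <&3
printf 'first to finish: %s (status %s)\n' "$first" "$first_status"

read -r second second_status <&3   # blocks until "slow" is done
wait                               # reap both children
exec 3<&-
rm -rf "$tmpdir"
```

The queue order here is the order in which the jobs wrote their records,
which, like the order in which the kernel answers wait(), need not be the
exact order in which the processes terminated.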


  | > Bash is already tracking the pids for all child processes not waited
  | > on, internally. So I imagine it wouldn't be too much work to make that
  | > information available to the script it's running.
  |
  | So an additional feature request.

If it helps, to perhaps provide some consistency, the NetBSD shell has
a builtin:

   jobid [-g|-j|-p] [job]
         With no flags, print the process identifiers of the processes in
         the job.

(-g instead gives the process group, -j the job identifier (%n), and
-p the lead pid (that which was $! when the job was started, which might
also be the process group, but also might not be).   The "job" arg (which
defaults to '%%') can identify the job by any of the methods that wait,
or kill, or "fg" (etc) allow, that is %% %- %+ %string or a pid ($!)).
Just one "job" arg, and only one option allowed, so there's no temptation
(nor requirement) to attempt to write sh code to parse the output and
work out what is what.  It's a builtin, running it multiple times is
cheaper than any parse attempt could possibly be.

            jobid exits with status 2 if there is an argument error, status 1,
            if with -g the job had no separate process group, or with -p there
            is no process group leader (should not happen), and otherwise
            exits with status 0.

("argument error" includes things like giving 2 options, or an
invalid (unknown) one, or giving a job arg that doesn't resolve to a
current (running, stopped, or terminated but unwaited) job.   Job
control needs to be enabled (rare in scripts) to get separate process
groups.   The "process group leader" is just $! - it has no particular
relationship with actual process groups (and yes, the wording could be
better).)

That command can be run after each job is created, using $! as the job
arg, and saving the pids, and/or job number (for later execution when
needed) however the script likes.

Much the same info is also available using jobs -l, but that is hard to
parse, and has the side effect of also waiting on any jobs which happen
to have already terminated, requiring even more parsing to extract the
status, so isn't a practical solution.
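
For comparison, a rough bash approximation of recording a job's ids at
creation time.  It does not use jobid, which is a NetBSD sh builtin, and
it assumes bash's "jobs -p" prints the pid of the first process in the
job's pipeline; last_pid and lead_pid are hypothetical variable names.

```shell
# Sketch (bash assumed): capture what the shell exposes right after the
# job is started, before any other job is created.
sleep 0.2 | sleep 0.3 | sleep 0.4 &

last_pid=$!                  # pid of the last process in the pipeline
lead_pid=$(jobs -p %%)       # bash: pid recorded as the job's lead process

printf 'last=%s lead=%s\n' "$last_pid" "$lead_pid"

wait "$last_pid"             # wait for the job by its saved pid
```

This yields only two of the pids; the full list of processes in the
pipeline, which jobid prints with no flags, would still have to be
parsed out of "jobs -l".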

In a shell with job control enabled:

sleep 3 | sleep 4 | sleep 5 & echo '$! is' $!; \
        jobid $!; jobid -g $!; jobid -p $!; jobid -j $!
$! is 6862
816 8140 6862
816
6862
%1

And yes, it works inside a sub-shell, including command substitutions,
until some other job (foreground or background, but not builtin) is
started in that subshell environment, so something like

        eval $( printf 'pids="'; jobid $!; printf '"; job='; jobid -j $! )

works:

sleep 3 | sleep 4 | sleep 5 &
eval $( printf 'pids="'; jobid $!; printf '"; job='; jobid -j $! )
echo "PID=$! JOB=${job} PIDS='${pids}'"
PID=8541 JOB=%2 PIDS='18420 27242 8541'

kre



