bug-bash

Re: waiting for process substitutions


From: Zachary Santer
Subject: Re: waiting for process substitutions
Date: Tue, 9 Jul 2024 06:12:07 -0400

On Fri, Jul 5, 2024 at 2:38 PM Chet Ramey <chet.ramey@case.edu> wrote:
>
> On 6/29/24 10:51 PM, Zachary Santer wrote:
>
> so you were then able to wait for each process substitution individually,
> as long as you saved $! after they were created. `wait' without arguments
> would still wait for all process substitutions (procsub_waitall()), but
> the man page continued to guarantee only waiting for the last one.
>
> This was unchanged in bash-5.2. I changed the code to match what the man
> page specified in 10/2022, after
>
> https://lists.gnu.org/archive/html/bug-bash/2022-10/msg00107.html

Is what's being reported there undesirable behavior? It seems like
what one would expect to happen, at least in hindsight, if it's
understood that 'wait' without arguments will in fact wait for all
procsubs. Furthermore, you can work around it pretty trivially:

$ SECONDS=0; ( sleep 2 & wait -- "${!}" ) > >( sleep 5 ); printf '%s\n' "${SECONDS}"
2

On the other hand, allowing 'wait' without arguments to wait on all
process substitutions would allow my original example to work, in the
case that there aren't other child processes expected to outlive this
pipeline.

command-1 | tee >( command-2 ) >( command-3 ) >( command-4 )
wait

The workaround for this not working would of course be named pipes,
which is somewhat less trivial.
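
For comparison, a minimal sketch of what the named-pipe version might look like. The commands are stand-ins chosen only for illustration (printf for command-1; wc, sort and tr for command-2 through command-4), and the temporary directory and variable names are likewise assumptions, not anything from the original scripts.

```shell
#!/usr/bin/env bash
# Stand-ins (assumptions): printf plays command-1; wc, sort and tr play
# command-2, command-3 and command-4.
tmpdir=$(mktemp -d)
mkfifo "$tmpdir/p2" "$tmpdir/p3" "$tmpdir/p4"

# Start each reader as its own background command so $! can be saved each time.
wc -l      < "$tmpdir/p2" > "$tmpdir/out2" & pid2=$!
sort -r    < "$tmpdir/p3" > "$tmpdir/out3" & pid3=$!
tr a-z A-Z < "$tmpdir/p4" > "$tmpdir/out4" & pid4=$!

# Feed all three readers through the named pipes.
printf '%s\n' a b c | tee "$tmpdir/p2" "$tmpdir/p3" "$tmpdir/p4" > /dev/null

# Block until every reader has finished processing its input.
wait "$pid2" "$pid3" "$pid4"

# Collect a little of each result, then clean up.
count=$(tr -d '[:space:]' < "$tmpdir/out2")
reversed=$(head -n 1 "$tmpdir/out3")
upper=$(head -n 1 "$tmpdir/out4")
rm -rf "$tmpdir"
```

The extra lines for mkfifo, pid bookkeeping and cleanup are the "less trivial" part compared with the two-line procsub version above.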

> and that is what is in bash-5.3-alpha.

> > c. If calling wait -n in the middle of all this, whether listing only
> > un-waited-on child process pids or all child process pids, it lists
> > all argument pids as "no such job" and terminates with code 127. This
> > is probably incorrect behavior.
>
> We've discussed this before. `wait -n' waits for the next process to
> terminate; it doesn't look back at processes that have already terminated
> and been added to the list of saved exit statuses. There is code tagged
> for bash-5.4 that allows `wait -n' to look at these exited processes as
> long as it's given an explicit set of pid arguments.

I read through some of that conversation at the time. Seemed like an
obvious goof. Kind of surprised the fix isn't coming to bash 5.3,
honestly.

And why "no such job" instead of "not a child of this shell"?
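
For contrast, a small sketch of what 'wait' with an explicit pid already does for an ordinary background child that has terminated before 'wait' is called: it returns the status saved for that pid. The variable names are illustrative only, and the sleep is just there to make sure the child is gone first.

```shell
#!/usr/bin/env bash
# An ordinary asynchronous child that exits right away with a known status.
( exit 3 ) &
pid=$!

# Give the child time to terminate before asking about it.
sleep 1

# 'wait' with an explicit pid reports the status saved for that child.
status=0
wait "$pid" || status=$?
printf 'child %s exited with status %s\n' "$pid" "$status"
```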

> So the upshot is that bash should probably manage process substitutions
> even more like other asynchronous processes in that the pid/status pair
> should be saved on the list of saved exit statuses for `wait' to find it,
> and clear that list when `wait' is called without arguments.

Sounds good to me.

On Fri, Jul 5, 2024 at 3:16 PM Chet Ramey <chet.ramey@case.edu> wrote:
>
> On 7/3/24 8:40 PM, Zachary Santer wrote:
> > On Wed, Jul 3, 2024 at 11:21 AM Chet Ramey <chet.ramey@case.edu> wrote:
> >>
> >> Process substitutions are word expansions, with a scope of a single
> >> command, and are not expected to survive their read/write file descriptors
> >> becoming invalid. You shouldn't need to `wait' for them; they're not
> >> true asynchronous processes.
> >
> > They clearly are. The fact that it results from a word expansion is 
> > irrelevant.
>
> They're similar, but they're not jobs. They run in the background, but you
> can't use the same set of job control primitives to manipulate them.
> Their scope is expected to be the lifetime of the command they're a part
> of, not run in the background until they're wanted.

Would there be a downside to making procsubs jobs? The only thing that
comes to mind would be seeing
[2]-  Done                    <( asynchronous command )
when I can often safely assume that the procsub has ended by the time
the command it's feeding input to has. Does the distinction between
job and non-job asynchronous process have implications when job
control is disabled?

> > Consider my original example:
> > command-1 | tee >( command-2 ) >( command-3 ) >( command-4 )
> >
> > Any nontrivial command is going to take more time to run than it took
> > to be fed its input.
>
> In some cases, yes.
>
> > The idea that no process in a process
> > substitution will outlive its input stream precludes a reading process
> > substitution from being useful.
>
> It depends on whether or not it can cope with its input (in this case)
> file descriptor being invalidated. In some cases, yes, in some cases, no.

When you say "invalidated," are you referring to something beyond the
process in a reading process substitution simply receiving EOF?
Everything should be able to handle that much.

> > And nevermind
> > exec {fd}< <( command )
> > I shouldn't do this?
>
> Sure, of course you can. You've committed to managing the file descriptor
> yourself at this point, like any other file descriptor you open with exec.

But then, if I 'exec {fd}<&-' before consuming all of command's
output, I would expect it to receive SIGPIPE and die, if it hasn't
already completed. And I might want to ensure that this child process
has terminated before the calling script exits.
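
A minimal sketch of that pattern, assuming a bash version in which a procsub pid saved from $! can be passed to 'wait' (what 'wait' returns for it may differ between versions, so the status is recorded but not relied on). The printf command is a stand-in for "command", and the variable names are illustrative.

```shell
#!/usr/bin/env bash
# Open a reading process substitution on a shell-allocated file descriptor
# and save its pid immediately, before anything else can overwrite $!.
exec {fd}< <( printf '%s\n' one two three )
pid=$!

# Consume only part of the output.
IFS= read -r -u "$fd" first

# Close our end early; a command still writing would see a closed pipe.
exec {fd}<&-

# Make sure the child is gone before the script moves on or exits.
status=0
wait "$pid" || status=$?
printf 'read %s; wait returned %s\n' "$first" "$status"
```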

> > Why should these be different in practice?
> >
> > (1)
> > mkfifo named-pipe
> > child process command < named-pipe &
> > {
> >    foreground shell commands
> > } > named-pipe
> >
> > (2)
> > {
> >    foreground shell commands
> > } > >( child process command )
>
> Because you create a job with one and not the other, explicitly allowing
> you to manipulate `child' directly?

Right, but does it have to be that way? What if the asynchronous
processes in process substitutions were jobs?

> > Hypothetically, it could work like this:
> > {
> >    commands
> > } {fd[0]}< <( command-1 )  {fd[1]}< <( command-2 ) {fd[2]}< <( command-3 )
> > But then again, *I can't get the pids for the processes if I do it this 
> > way*.
>
> If you have to get the pids for the individual processes, do it a different
> way. That's just not part of what process substitutions provide: they are
> word expansions that expand to a filename. If the semantics make it more
> convenient for you to use named pipes, then use named pipes.

Being able to switch from background processes with named pipes to
process substitutions once bash was able to wait for them seemed like
a step in the right direction, because it allowed me to lower the line
count of each script without making the scripts any harder to
understand. So generally trying to make process substitutions more
useful, or useful in more situations, seems like a net positive.

On Mon, Jul 8, 2024 at 4:57 PM Greg Wooledge <greg@wooledge.org> wrote:
>
> On Mon, Jul 08, 2024 at 22:45:35 +0200, alex xmb sw ratchev wrote:
> > On Mon, Jul 8, 2024, 22:15 Chet Ramey <chet.ramey@case.edu> wrote:
> >
> > > On 7/8/24 4:02 PM, alex xmb sw ratchev wrote:
> > >
> > > > hi , one question about ..
> > > > if a cmd contains more substitions like >( or <( , how to get all $!
> > > > maybe make ${![<cmdnum>]} , or is such already .. ?
> > >
> > > You can't. Process substitutions set $!, but you have to have a point
> > > where you can capture that if you want to wait for more than one. That's
> > > the whole purpose of this thread.
> > >
> >
> > so no ${![2]} or so ?
> > else i see only half complex start_first stuff
> >
> > anywa .. greets  = ))
>
> Bash has nothing like that, and as far as I know, nobody is planning to
> add it.
>
> If you need to capture all the PIDs of all your background processes,
> you'll have to launch them one at a time.  This may mean using FIFOs
> (named pipes) instead of anonymous process substitutions, in some cases.

Bash is already tracking the pids for all child processes not waited
on, internally. So I imagine it wouldn't be too much work to make that
information available to the script it's running. Obviously, this is
moving beyond "make the existing features make more sense," but an
array of pids of all child processes not waited on would at least
allow the user to derive pids of what just got forked from a
comparison of that array before and after a command including multiple
procsubs. An array variable like what Alex is suggesting, something
listing all pids resulting from the last pipeline to fork any child
process in the current shell environment, would be a solution to the
matter at hand here.

Maybe a single middle-ground array variable, listing the pids of all
child processes forked (and not waited on) since the last time the
array variable was referenced, would be more easily implemented. You
would just have to save the contents of the array variable in a
variable of your own each time you reference it, if you want to keep
track of that stuff. Not unreasonable, considering that you already
have to do that with $!, at least before each time you fork another
child process.


