Re: 'wait -n' with and without id arguments


From: Zachary Santer
Subject: Re: 'wait -n' with and without id arguments
Date: Sun, 29 Sep 2024 12:55:09 -0400

CWRU/CWRU.chlog:
>    9/25
>    ----
> jobs.c
> - wait_for_any_job: if the jobs table is empty and there are no
>   eligible procsubs, and the shell is in posix mode, take a random
>   pid from the bgpids table, delete it, and return its status
>   (since we would be deleting that pid from bgpids anyway)

This is a really strange thing to implement. As I read it: if a
background job has already terminated and been added to the bgpids
table (due to a notification), *don't report it through 'wait -n'
unless there's nothing else that could be reported.* So 'wait -n' is
potentially waiting for a background job that's still executing, even
though there's an already-terminated background job that 'wait -n'
would have reported right then, had the user not been notified of it.

Consider Greg Wooledge's example of trying to run five (or however
many) background processes concurrently, until a list of however many
necessary background processes has been exhausted. If 'wait -n'
doesn't prioritize returning the termination status of a background
job that has already terminated and been notified, then that slot -
one of the five - is still left vacant during the remainder of
execution. The termination status of that background job is only
reported through 'wait -n' after the rest of the list of necessary
background processes has been exhausted. You're still at risk of
losing slots until you're only running one background process at a
time.
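
For reference, the pattern in question can be sketched roughly as
follows. This is a minimal illustration, not Greg's actual code and not
the wait-n-failure script used below; the function name run_pool, its
parameters, and the 'sleep 0.1' stand-in workload are all made up here.
It assumes a bash with 'wait -n' (4.3 or later) and a sleep that
accepts fractional seconds.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: keep at most max_jobs background jobs running
# until `total` jobs have been forked, using 'wait -n' without id
# arguments to free up a slot.
run_pool() {
    local max_jobs=$1 total=$2
    local running=0 reaped=0 i

    for ((i = 0; i < total; i++)); do
        if ((running >= max_jobs)); then
            # Block until any one background job terminates.
            wait -n
            running=$((running - 1))
            reaped=$((reaped + 1))
        fi
        sleep 0.1 &          # stand-in for the real work
        running=$((running + 1))
    done

    # Drain whatever is still running.
    while ((running > 0)); do
        wait -n
        running=$((running - 1))
        reaped=$((reaped + 1))
    done

    printf '%d processes waited / %d processes forked\n' "$reaped" "$total"
}
```

If 'wait -n' skips a job that has already terminated and been
notified, the 'running' count in a loop like this stays one too high
for the rest of the run, which is the lost slot described above.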

devel branch commit 254081c097:

$ source ~/random/wait-n-failure

$ wait-n-failure::main
Reliable. Monotonically-increasing job ids.

$ wait-n-failure::main explicit_pids
Reliable. Monotonically-increasing job ids.

$ wait-n-failure::main monitor
Loses four or five processes each time. Assigned job ids are all 1
after a while, indicating that only one background job is running at a
time.

$ wait-n-failure::main monitor notify
Really bad.

$ wait-n-failure::main monitor posix
Reliable. Monotonically-increasing job ids.

$ wait-n-failure::main monitor notify posix
355 processes waited / 100 processes forked
What?
100 processes waited / 100 processes forked
But you can see that the only job id being assigned is 1 after a
little while. Only one background job is running at a time at that
point.
99 processes waited / 100 processes forked
Huh.
101 processes waited / 100 processes forked
A whole lot of 99, 100, or 101 out of 100. I don't know what's going
on there. Maybe some - potentially unavoidable? - other race
condition. In any case, the job ids always tell the same story: only
one background job is running at a time, after a little while.

$ wait-n-failure::main explicit_pids monitor notify posix
Got one run where it waited for one process and called it quits. No
error message from 'sleep', so who knows? Got a 104 / 100 at one
point. 100s are more prevalent than in the last case. Job ids are able
to creep up from 1 throughout execution, but they don't increase
monotonically.

On Wed, Sep 25, 2024 at 11:06 AM Chet Ramey <chet.ramey@case.edu> wrote:
>
> On 9/8/24 8:35 PM, Zachary Santer wrote:
>
> This is still a discussion about interactive shell behavior only.

I might argue that calling 'jobs' within a script being executed
normally shouldn't make background jobs that have already terminated
unavailable to 'wait -n' either. Calling 'jobs' within a script
executing normally might be done with the purpose of clearing the jobs
table, but even knowing that it does that would take a pretty deep
understanding of what bash does internally. So this is probably a
narrower corner case.

> >> What behavior do you want from the command lists that differs from what I
> >> described above? Since shell functions are essentially lists, you should
> >> get the same behavior from both.
> >
> > You'd have to restrict job status notification to only ever occur
> > immediately prior to a prompt, in both posix and default mode, and
> > then you'd still need a blurb in the BUGS section of the manual saying
> > that 'set -b' has a potentially surprising impact on 'wait -n' in the
> > interactive shell.
>
> This is pretty much posix mode,

I'm considering using posix mode all the time, just to see if it makes
my life easier. Not that I know what it does, outside of this.

> and there's no reason to list this as a bug when it's not.

I'd call any behavior a bug if it's unexpected, given a full
understanding of the manual. If it's not a bug, okay, but that's not a
very meaningful distinction at that point.

> > For this to work, you'd have to choose one of the following new behaviors:
> >
> > 1) Background jobs that are both forked and cleared from the jobs
> > table by a call to 'wait' in the time between an accept-line and the
> > following prompt would never receive a job id or be notified to the
> > user in any way.
>
> That's not how job control works. Jobs are created and job numbers assigned
> when the background process is created.

I was thinking the job id doesn't do the user any good if it's for a
job that they don't have the opportunity to act upon. It's come and
gone by the time they've got a command line again.

But then, job ids can be referenced within the running code. Just not
something I've had a reason to do.

> > 2) Job ids assigned to background jobs continue to increase
> > monotonically, between accept-line and prompt, even as some of those
> > jobs are removed from the jobs table by calls to 'wait'.
>
> No shell works this way, and there's not a good reason to adopt it.

I might be missing something, but bash sure seems to be doing this in
a number of different calls to wait-n-failure::main, on the current
devel branch commit. Are the jobs not being removed from the jobs
table until some later point?

It feels like I've confused myself by this point. I was considering
what bash would have to do, to not display job notifications at any
point except immediately prior to displaying the following prompt.
Bash is, of course, displaying job status notifications as jobs are
forked and as they terminate. That *is* prior to the following prompt,
but not *immediately* so. So the behavior I was trying to describe
would be a departure from the current posix mode behavior, but it
clearly isn't necessary.

> > If the 'jobs' builtin is called in the midst of a command list being
> > run with either behavior, this would cause the same updates to the
> > jobs table and list of saved pids and statuses as would occur
> > immediately prior to a prompt.
>
> So you are saying that prompt notifications and `jobs' have the same
> effect. POSIX implies but does not require this, and there is differing
> behavior among current implementations.

I've got no opinion on this point, actually.

> > The user would have to know that
> > calling the 'jobs' builtin would have an impact on what processes
> > 'wait -n' without id arguments will return the termination status of.
> > That would have to be documented in the man page.
>
> This is posix mode.

How does the user know that?

> >>> However, I think the benefit to
> >>> consistent behavior far outweighs the hardship caused to whoever would
> >>> write a script intended for use within the interactive shell that depends
> >>> on 'wait -n' without id arguments ignoring background processes that the
> >>> user has already been notified of via the 'jobs' output.
> >>
> >> Consider programmable completion frameworks, commands executed via
> >> `bind -x', or traps (e.g., DEBUG) intended to provide enhancements to
> >> the standard behavior, all of which exist and have generated reports or
> >> requests for features.
> >>
> >> People put pretty complicated stuff into PROMPT_COMMAND and other prompts,
> >> too.
> >>
> >> We don't know how the existing uses would be affected by changes until I
> >> make them.
> >
> > Could changes to job status notifications cause issues for these
> > users as well?
>
> You tell me.

No clue.

> > Generally speaking, I would expect most functions defined in any of
> > the above that call 'wait' or 'wait -n' from an interactive
> > environment to track and use explicit pid arguments, to avoid waiting
> > on other background jobs the user forked themselves.
>
> It's an assumption. It might be valid.
>
> > In that case, the
> > behavior they would see, using 'wait -n', has already changed for the
> > better. The use of 'wait -n' without pid arguments in an interactive
> > environment is more likely to be something that a user just typed on
> > the command line themselves.
>
> Why would a user do this? What's the use case for doing that in an
> interactive shell? Not that it really matters.

Maybe testing out a bit of functionality they're trying to implement elsewhere.
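
For what it's worth, the explicit-pid approach mentioned above might
look something like the following. This is only a sketch under
assumptions: the function name reap_tracked and the subshell workload
are hypothetical, and it relies on 'wait -n' accepting id arguments
together with the -p option, which needs bash 5.1 or later.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fork a few children, then reap only those pids,
# so that other background jobs the user started are never waited on.
reap_tracked() {
    local -A pending=()      # pid -> label
    local n pid status

    for n in 1 2 3; do
        ( sleep "0.$n"; exit "$n" ) &
        pending[$!]="job$n"
    done

    while ((${#pending[@]} > 0)); do
        # Wait for any one of the tracked pids; -p stores the pid of
        # the job whose status is being returned.
        wait -n -p pid "${!pending[@]}"
        status=$?
        # If nothing was reaped, pid is left unset; stop rather than spin.
        [[ -n $pid ]] || break
        printf '%s exited with status %d\n' "${pending[$pid]}" "$status"
        unset "pending[$pid]"
    done
}
```

Removing each pid from the array once it has been reported keeps the
same pid and status from being handed back twice.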

> >>> If the behavior here isn't modified, the man page really should note that
> >>> 'wait -n' without id arguments won't return the termination status of a
> >>> child process that has already been notified through the 'jobs' output.
> >>
> >> That is exactly the behavior posix seems to require (`wait -n' aside, but
> >> see below): once you notify the user, you delete the job and it disappears
> >> forever.
> >
> > Should still be in the man page. Very few shell programmers are
> > reading the POSIX standard.
>
> There is a posix mode section in bash.info.

bash.info:
> This manual is meant as a brief introduction to features found in
> Bash.  The Bash manual page should be used as the definitive reference
> on shell behavior.

And then I never looked at it again. I guess I could go looking for
that section if I need to.

> > Does POSIX provide a rationale for this requirement?
>
> Look at the descriptions of `jobs' and `wait', which say this explicitly.
>
> > I'd be curious to know if the Austin Group
> > people have considered the implications of a feature like 'wait -n'.
>
> Doubtful they considered it explicitly, since the role of a standard is
> to standardize existing implementations.
>
>
> > If you go the route of changing job notification behavior, would that
> > be the end of the list of saved pids and statuses? Maintaining that
> > list is more useful than simply following POSIX to a tee. What would
> > be the benefit to the user of making the termination status of
> > notified jobs unavailable to the 'wait' builtin?
>
> I'm not talking about changing default mode, just making bash conform
> to the new requirements in the latest POSIX revision. These requirements
> happen to align well with your use case.

And 'wait -n' without pid arguments in the interactive shell in
default mode is still kind of broken.

> For instance, with the latest devel branch build:
>
> $ set -o posix
> $ sleep 2 &
> [1] 20565
> $
> [1]+  Done                       sleep 2
> $ wait 20565
> $ echo $?
> 0
> $ wait 20565
> bash: wait: pid 20565 is not a child of this shell
> $ echo $?
> 127
> $
>
> This is what you refer to below.

Yeah, I think that's an improvement. As long as posix mode never makes
that termination status unavailable to the first 'wait' call, because
it was already notified, then posix mode seems like the way to go.

> > 'wait -n' with pid arguments now has access to this list, which is
> > good. It wouldn't be going much further to allow 'wait -n' without pid
> > arguments to act on the list as well.
>
> Well, you'd either have to arrange things so the user doesn't get the same
> pid and status returned multiple times -- by removing it from this list or
> some other mechanism. Since that's what happens in posix mode, it looks
> like posix mode fits your use case here.

And now I know that, but I don't even use 'wait -n' for anything. If
and when I do, it's most likely going to be in a script intended to be
executed normally, which I know already works fine.

The point here was to try to get the behavior of 'wait -n' to be as
consistent as possible, between different execution environments: the
interactive shell, a script being sourced, and a script being executed
normally; along with different set and shopt options. If you won't
consider modifying the behavior of 'wait -n' without id arguments in
default mode, then that's frustrating.


