From: Andrew J. Schorr
Subject: Re: gawk 5.2.2 fatal crash when closing a two-way pipe for a process that does not have a pid anymore
Date: Thu, 24 Aug 2023 11:44:26 -0400
User-agent: Mutt/1.5.21 (2010-09-15)
Hi,
Hmmm. In my test reproducer, I'm writing to a dead pipe, and that probably
should be a fatal error. But I see your point -- when I try on 5.1.1, it's
not giving a fatal error. It looks like this logic was added in
patch 9eb357e00, which was committed after the release of 5.1.1:
2021-11-30  Andrew J. Schorr  <aschorr@telemetry-investments.com>

    Improve output redirection error handling for problems not detected
    until the final flush or close. Thanks to Miguel Pineiro Jr.
    <mpj@pineiro.cc> for the bug report and suggesting a fix.

    * awk.h (efflush): Add declaration.
    * builtin.c (efwrite): Break up into 3 functions by moving the
    flushing logic into efflush and the error handling logic into
    wrerror.
    (wrerror): New function containing the error-handling logic extracted
    from efwrite.
    (efflush): New function containing the fflush logic extracted from
    efwrite.
    * io.c (close_redir): Call efflush prior to closing the redirection
    to identify any problems with flushing output and to take advantage
    of the error-handling logic used for print and printf.
There's a discussion thread here:
https://lists.gnu.org/archive/html/bug-gawk/2021-11/msg00022.html
In your actual usage where the crash is occurring, are you writing to
the process after it has gone away? If so, I think NONFATAL should
reasonably be required.
So yes, it does seem to be an incompatible change, but it's fixing a bug
where I/O was silently failing.
Regards,
Andy
On Thu, Aug 24, 2023 at 03:28:36PM +0000, Finn Magnusson wrote:
> Hi
> Many thanks for the clarification.
> Well done on the reproducer :-)
> The reason why I thought it is a bug is due to the difference in behaviour
> compared to gawk 5.1.1, where this scenario does not trigger a crash even if
> "NONFATAL" is not set.
> If not a bug, then is this a non-backward compatible change?
> Many thanks.
> BR
> Finn
>
> On Thursday, August 24, 2023 at 04:55:24 PM GMT+2, Andrew J. Schorr
> <aschorr@telemetry-investments.com> wrote:
>
>
> Hi,
>
> I made a reproducer. It's not so hard. :-)
>
> Using the master branch:
>
> bash-4.2$ cat /tmp/bug.gawk
> BEGIN {
>     cmd = "ssh `hostname` uptime"
>     print "hello" |& cmd
>     system("ps -ef | grep ssh")
>     print "sleeping while waiting for ssh to exit"
>     sleep(1)
>     print "another write after process is gone" |& cmd
>     system("ps -ef | grep ssh")
>     print "closing now"
>     close(cmd)
> }
>
> bash-4.2$ ./gawk -l extension/.libs/time.so -f /tmp/bug.gawk
> schorr 3684 1 0 2021 ? 00:00:00 ssh-agent
> root 19396 26130 0 10:34 ? 00:00:00 sshd: schorr [priv]
> schorr 19399 19396 0 10:34 ? 00:00:01 sshd: schorr
> schorr 22087 22086 0 10:48 pts/9 00:00:00 ssh ti139 uptime
> schorr 22088 22086 0 10:48 pts/9 00:00:00 sh -c ps -ef | grep ssh
> schorr 22091 22088 0 10:48 pts/9 00:00:00 grep ssh
> root 24832 26130 0 Apr14 ? 00:00:00 sshd: schorr [priv]
> schorr 24834 24832 0 Apr14 ? 00:00:05 [sshd] <defunct>
> root 26130 1 0 2021 ? 00:00:37 /usr/sbin/sshd -D
> sleeping while waiting for ssh to exit
> schorr 3684 1 0 2021 ? 00:00:00 ssh-agent
> root 19396 26130 0 10:34 ? 00:00:00 sshd: schorr [priv]
> schorr 19399 19396 0 10:34 ? 00:00:01 sshd: schorr
> schorr 22087 22086 1 10:48 pts/9 00:00:00 [ssh] <defunct>
> schorr 22121 22086 0 10:48 pts/9 00:00:00 sh -c ps -ef | grep ssh
> schorr 22123 22121 0 10:48 pts/9 00:00:00 grep ssh
> root 24832 26130 0 Apr14 ? 00:00:00 sshd: schorr [priv]
> schorr 24834 24832 0 Apr14 ? 00:00:05 [sshd] <defunct>
> root 26130 1 0 2021 ? 00:00:37 /usr/sbin/sshd -D
> closing now
> gawk: /tmp/bug.gawk:10: fatal: flush to "ssh `hostname` uptime" failed: reason unknown
>
> The defunct process (pid 22087) doesn't seem to be relevant. As you noted,
> gawk gives a fatal error.
>
> I'm not sure that this is actually a bug. Have you considered using
> non-fatal I/O?
>
> If I add 'PROCINFO["NONFATAL"] = 1' to the script, it no longer gives
> a fatal error. Or limit it to the command in question:
>
> BEGIN {
>     cmd = "ssh `hostname` uptime"
>     PROCINFO[cmd, "NONFATAL"] = 1
>     print "hello" |& cmd
>     system("ps -ef | grep ssh")
>     print "sleeping while waiting for ssh to exit"
>     sleep(1)
>     print "another write after process is gone" |& cmd
>     system("ps -ef | grep ssh")
>     print "closing now"
>     close(cmd)
> }
>
> Why do you think it's a bug?
>
> Regards,
> Andy
>
> On Thu, Aug 24, 2023 at 01:31:24PM +0000, Finn Magnusson wrote:
> > Hi
> > I wish I could manage to reproduce the issue with a simple recipe.
> > But whichever way I try to close the process associated with the two-way
> > pipe, it stays as a defunct process and then the issue does not occur.
> > The only way I get the issue is in my program where I start a two-way pipe
> > toward a ssh client which opens to a netconf session on a remote machine. On
> > the remote machine, I issue a command to close the netconf session. This
> > causes the ssh client to close down completely on my machine and no defunct
> > process remains. Then when using the close() function in gawk to close the
> > two-way pipe it crashes because the ssh client process does not exist
> > anymore, not even as a defunct process.
> > So that is not so easy to reproduce outside of my environment since the
> > netconf server that I use is a proprietary system here at the company where
> > I work.
> > In case you make a fix I can always try it in my environment and let you
> > know whether it solved the issue.
> > If that is not satisfactory then feel free to discard this bug report until
> > I found a way to reproduce it that could be done in any environment.
> > Many thanks.
> > BR
> > Finn
> >
> > On Thursday, August 24, 2023 at 03:06:37 PM GMT+2, Andrew J. Schorr
> > <aschorr@telemetry-investments.com> wrote:
> >
> >
> > Hi,
> >
> > Thanks for the bug report. Can you please provide a simple recipe
> > for how to reproduce this problem?
> >
> > Thanks,
> > Andy
> >
> > On Thu, Aug 24, 2023 at 09:56:54AM +0000, Finn Magnusson via Bug reports
> > only for gawk. wrote:
> > > Dear gawk developers
> > > I noticed the below issue in gawk 5.2.2 which was not present in previous
> > > gawk version I was using (5.1.1): when using the close() function to close a
> > > two-way pipe to a process that does not have a PID anymore (e.g. due to the
> > > process got closed by an external command), then I got the below fatal
> > > crash:
> > > gawk.lin64: /app/moshell/23.2h/moshell/prog.awk:19919: fatal: flush to "/app/moshell/23.2h/moshell/commonjars/ssh.lin64 -p 2022 -z '/proj/wcdma-userarea/users/eanzmagn/moshell_logfiles/logs_moshell/tempfiles/20230824-114538_6552/sshz6592' -l expert -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o HostKeyAlgorithms="ssh-dss,ssh-rsa,rsa-sha2-512,rsa-sha2-256" -o NumberOfPasswordPrompts=1 -o ConnectTimeout=10 -o ServerAliveInterval=300 -o ConnectionAttempts=1 -o ServerAliveCountMax=0 -o TCPKeepAlive=no -o PreferredAuthentications=publickey,password 10.136.72.120 -s netconf 2>&1" failed: reason unknown
> > >
> > > I was able to solve it by commenting out the below efflush statement in
> > > gawk-5.2.2/io.c:
> > >     /* flush before closing to leverage special error handling */
> > >     efflush(rp->output.fp, "flush", rp);
> > > Is it possible to make a fix for this in a coming gawk release?
> > > Many thanks.
> > > BR
> > > Finn
--
Andrew Schorr e-mail: aschorr@telemetry-investments.com
Telemetry Investments, L.L.C. phone: 917-305-1748
152 W 36th St, #402 fax: 212-425-5550
New York, NY 10018-8765