From: Andrew J. Schorr
Subject: Re: gawk 5.2.2 fatal crash when closing a two-way pipe for a process that does not have a pid anymore
Date: Thu, 24 Aug 2023 11:44:26 -0400
User-agent: Mutt/1.5.21 (2010-09-15)

Hi,

Hmmm. In my test reproducer, I'm writing to a dead pipe, and that probably
should be a fatal error. But I see your point -- when I try on 5.1.1, it's
not giving a fatal error. It looks like this logic was added in
patch 9eb357e00, which was committed after the release of 5.1.1:

2021-11-30         Andrew J. Schorr      <aschorr@telemetry-investments.com>

        Improve output redirection error handling for problems not detected
        until the final flush or close. Thanks to Miguel Pineiro Jr.
        <mpj@pineiro.cc> for the bug report and suggesting a fix.

        * awk.h (efflush): Add declaration.
        * builtin.c (efwrite): Break up into 3 functions by moving the
        flushing logic into efflush and the error handling logic into
        wrerror.
        (wrerror): New function containing the error-handling logic extracted
        from efwrite.
        (efflush): New function containing the fflush logic extracted from
        efwrite.
        * io.c (close_redir): Call efflush prior to closing the redirection
        to identify any problems with flushing output and to take advantage
        of the error-handling logic used for print and printf.

There's a discussion thread here:
https://lists.gnu.org/archive/html/bug-gawk/2021-11/msg00022.html

In your actual usage where the crash is occurring, are you writing to
the process after it has gone away? If so, I think NONFATAL should
reasonably be required.
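
For illustration, here is a minimal sketch of what per-command nonfatal
handling could look like. It assumes gawk's documented behavior that, with
nonfatal output enabled, a failed print sets ERRNO instead of exiting. The
helper name send_line is hypothetical, and "cat" is only a stand-in for the
real ssh command:

```awk
# Hypothetical helper: write one line to a coprocess without letting an
# I/O failure terminate gawk. Returns 1 on success, 0 on failure.
function send_line(cmd, text)
{
    PROCINFO[cmd, "NONFATAL"] = 1   # nonfatal output for this command only
    ERRNO = ""                      # clear any stale error before writing
    print text |& cmd
    if (ERRNO != "") {
        print "write to coprocess failed: " ERRNO > "/dev/stderr"
        return 0
    }
    return 1
}

BEGIN {
    cmd = "cat"                     # assumption: stand-in for the ssh command
    if (send_line(cmd, "hello")) {
        close(cmd, "to")            # send EOF so the coprocess can finish
        if ((cmd |& getline reply) > 0)
            print "reply: " reply
    }
    # A nonzero return from close() is reported rather than treated as fatal.
    if (close(cmd) != 0)
        print "close reported a problem: " ERRNO > "/dev/stderr"
}
```

Because output to the coprocess is buffered, a write to a process that has
already exited may only be noticed at the final flush, so the return value
of close() is worth checking as well.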

So yes, it does seem to be an incompatible change, but it's fixing a bug
where I/O was silently failing.

Regards,
Andy

On Thu, Aug 24, 2023 at 03:28:36PM +0000, Finn Magnusson wrote:
> Hi
> Many thanks for the clarification.
> Well done on the reproducer :-)
> The reason why I thought it is a bug is due to the difference in behaviour
> compared to gawk 5.1.1, where this scenario does not trigger a crash even if
> "NONFATAL" is not set.
> If not a bug, then is this a non-backward compatible change?
> Many thanks.
> BR
> Finn
> 
> On Thursday, August 24, 2023 at 04:55:24 PM GMT+2, Andrew J. Schorr
> <aschorr@telemetry-investments.com> wrote:
> 
> 
> Hi,
> 
> I made a reproducer. It's not so hard. :-)
> 
> Using the master branch:
> 
> bash-4.2$ cat /tmp/bug.gawk
> BEGIN {
>   cmd = "ssh `hostname` uptime"
>   print "hello" |& cmd
>   system("ps -ef | grep ssh")
>   print "sleeping while waiting for ssh to exit"
>   sleep(1)
>   print "another write after process is gone" |& cmd
>   system("ps -ef | grep ssh")
>   print "closing now"
>   close(cmd)
> }
> 
> bash-4.2$ ./gawk -l extension/.libs/time.so -f /tmp/bug.gawk
> schorr    3684    1  0  2021 ?        00:00:00 ssh-agent
> root    19396 26130  0 10:34 ?        00:00:00 sshd: schorr [priv]
> schorr  19399 19396  0 10:34 ?        00:00:01 sshd: schorr
> schorr  22087 22086  0 10:48 pts/9    00:00:00 ssh ti139 uptime
> schorr  22088 22086  0 10:48 pts/9    00:00:00 sh -c ps -ef | grep ssh
> schorr  22091 22088  0 10:48 pts/9    00:00:00 grep ssh
> root    24832 26130  0 Apr14 ?        00:00:00 sshd: schorr [priv]
> schorr  24834 24832  0 Apr14 ?        00:00:05 [sshd] <defunct>
> root    26130    1  0  2021 ?        00:00:37 /usr/sbin/sshd -D
> sleeping while waiting for ssh to exit
> schorr    3684    1  0  2021 ?        00:00:00 ssh-agent
> root    19396 26130  0 10:34 ?        00:00:00 sshd: schorr [priv]
> schorr  19399 19396  0 10:34 ?        00:00:01 sshd: schorr
> schorr  22087 22086  1 10:48 pts/9    00:00:00 [ssh] <defunct>
> schorr  22121 22086  0 10:48 pts/9    00:00:00 sh -c ps -ef | grep ssh
> schorr  22123 22121  0 10:48 pts/9    00:00:00 grep ssh
> root    24832 26130  0 Apr14 ?        00:00:00 sshd: schorr [priv]
> schorr  24834 24832  0 Apr14 ?        00:00:05 [sshd] <defunct>
> root    26130    1  0  2021 ?        00:00:37 /usr/sbin/sshd -D
> closing now
> gawk: /tmp/bug.gawk:10: fatal: flush to "ssh `hostname` uptime" failed: reason unknown
> 
> The defunct process (pid 22087) doesn't seem to be relevant. As you noted,
> gawk gives a fatal error.
> 
> I'm not sure that this is actually a bug. Have you considered using
> non-fatal I/O?
> 
> If I add 'PROCINFO["NONFATAL"] = 1' to the script, it no longer gives
> a fatal error. Or limit it to the command in question:
> 
> BEGIN {
>   cmd = "ssh `hostname` uptime"
>   PROCINFO[cmd, "NONFATAL"] = 1
>   print "hello" |& cmd
>   system("ps -ef | grep ssh")
>   print "sleeping while waiting for ssh to exit"
>   sleep(1)
>   print "another write after process is gone" |& cmd
>   system("ps -ef | grep ssh")
>   print "closing now"
>   close(cmd)
> }
> 
> Why do you think it's a bug?
> 
> Regards,
> Andy
> 
> On Thu, Aug 24, 2023 at 01:31:24PM +0000, Finn Magnusson wrote:
> > Hi
> > I wish I could manage to reproduce the issue with a simple recipe.
> > But whichever way I try to close the process associated with the two-way
> > pipe, it stays as a defunct process and then the issue does not occur.
> > The only way I get the issue is in my program where I start a two-way pipe
> > toward a ssh client which opens to a netconf session on a remote machine. On
> > the remote machine, I issue a command to close the netconf session. This
> > causes the ssh client to close down completely on my machine and no defunct
> > process remains. Then when using the close() function in gawk to close the
> > two-way pipe it crashes because the ssh client process does not exist
> > anymore, not even as a defunct process.
> > So that is not so easy to reproduce outside of my environment since the
> > netconf server that I use is a proprietary system here at the company where
> > I work.
> > In case you make a fix I can always try it in my environment and let you
> > know whether it solved the issue.
> > If that is not satisfactory then feel free to discard this bug report until
> > I found a way to reproduce it that could be done in any environment.
> > Many thanks.
> > BR
> > Finn
> >
> > On Thursday, August 24, 2023 at 03:06:37 PM GMT+2, Andrew J. Schorr
> > <aschorr@telemetry-investments.com> wrote:
> >
> >
> > Hi,
> >
> > Thanks for the bug report. Can you please provide a simple recipe
> > for how to reproduce this problem?
> >
> > Thanks,
> > Andy
> >
> > On Thu, Aug 24, 2023 at 09:56:54AM +0000, Finn Magnusson via Bug reports
> > only for gawk. wrote:
> > > Dear gawk developers
> > > I noticed the below issue in gawk 5.2.2 which was not present in previous
> > > gawk version I was using (5.1.1): when using the close() function to close
> > > a two-way pipe to a process that does not have a PID anymore (e.g. due to
> > > the process got closed by an external command), then I got the below fatal
> > > crash:
> > > gawk.lin64: /app/moshell/23.2h/moshell/prog.awk:19919: fatal: flush to "/app/moshell/23.2h/moshell/commonjars/ssh.lin64 -p 2022 -z '/proj/wcdma-userarea/users/eanzmagn/moshell_logfiles/logs_moshell/tempfiles/20230824-114538_6552/sshz6592' -l expert -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o HostKeyAlgorithms="ssh-dss,ssh-rsa,rsa-sha2-512,rsa-sha2-256" -o NumberOfPasswordPrompts=1 -o ConnectTimeout=10 -o ServerAliveInterval=300 -o ConnectionAttempts=1 -o ServerAliveCountMax=0 -o TCPKeepAlive=no -o PreferredAuthentications=publickey,password 10.136.72.120 -s netconf 2>&1" failed: reason unknown
> > >
> > > I was able to solve it by commenting out the below efflush statement in
> > > gawk-5.2.2/io.c:
> > >   /* flush before closing to leverage special error handling */
> > >   efflush(rp->output.fp, "flush", rp);
> > > Is it possible to make a fix for this in a coming gawk release?
> > > Many thanks.
> > > BR
> > > Finn

-- 
Andrew Schorr                      e-mail: aschorr@telemetry-investments.com
Telemetry Investments, L.L.C.      phone:  917-305-1748
152 W 36th St, #402                fax:    212-425-5550
New York, NY 10018-8765
