bug-gawk

Re: Use of '()' in a regexp


From: arnold
Subject: Re: Use of '()' in a regexp
Date: Thu, 07 Jan 2021 07:07:46 -0700
User-agent: Heirloom mailx 12.5 7/5/10

The answer is "no".  The text matched by a record separator must be
non-null; the only case in which RT will be "" is at the end of a file.

This is also how Brian Kernighan's awk handles RS as a regexp.

Thanks,

Arnold

Ed Morton <mortoneccc@comcast.net> wrote:

> In case that's not an adequate example, what I mean is, will this:
>
> $ printf 'foo\nbar\n' | awk -v RS='()' -v ORS='X' '1' file
>
> then produce the same output as this:
>
> $ printf 'foo\nbar\n' | awk -v RS='^$' '{gsub(/()/,"X")}1'
> XfXoXoX
> XbXaXrX
> X
>
> or not and, if not, why is it different?
>
> I just noticed that this seems to handle `/()/` differently from either 
> of the current cases again:
>
> $ printf 'foo\nbar\n' | awk '{nf=split($0,flds,/()/,seps); print nf; for (i=0; i<=nf; i++) printf "%s%s", flds[i], "<"seps[i]">" ; print ""}'
> 1
> <>foo<>
> 1
> <>bar<>
>
> Regards,
>
>      Ed.
>
> On 1/6/2021 2:54 PM, Ed Morton wrote:
> > Great! Will that treat `()` when used in an RS:
> >
> >     awk -v RS='()' -v ORS='x' '1'
> >
> > the same as it's treated in a regexp in other contexts such as with 
> > gsub():
> >
> >     awk -v ORS= '{gsub(/()/,"x")} 1'
> >
> > or does it mean something different when used in an RS?
> >
> >     Ed.
> >
> > On 1/6/2021 1:33 PM, arnold@skeeve.com wrote:
> >> Hi. Re this:
> >>
> >> Ed Morton<mortoneccc@comcast.net>  wrote:
> >>
> >>> Someone just pointed this out to me (gawk 5.1.0):
> >>>
> >>> $ printf 'foo\n' | awk '{gsub(/()/,"x")} 1'
> >>> xfxoxox
> >>>
> >>> $ printf 'foo\n' | awk -v RS='()' -v ORS='x\n' '1'
> >>> foox
> >>>
> >>> Obviously that's a pretty ridiculous regexp but it still has me
> >>> wondering - why does `gsub()` treat the regexp `()` as matching a null
> >>> string around every character while `RS` treats it as if I'd asked it to
> >>> match the `\n` at the end of the input:
> >>>
> >>> $ printf 'foo\n' | awk -v RS='\n$' -v ORS='x\n' '1'
> >>> foox
> >>>
> >>> I could just file this under "don't write stupid regexps" but I was
> >>> wondering if there's a more concrete, satisfying explanation of the
> >>> behavior.
> >>>
> >>>       Ed.
> >> It's a bug. This appears to be the fix. It doesn't break the
> >> test suite, either.
> >>
> >> Thanks for the report!
> >>
> >> Arnold
> >> -----------------------------------------
> >> diff --git a/io.c b/io.c
> >> index 2714398e..0af8ab1e 100644
> >> --- a/io.c
> >> +++ b/io.c
> >> @@ -3702,7 +3702,7 @@ again:
> >>             * If still room in buffer, skip over null match
> >>             * and restart search. Otherwise, return.
> >>             */
> >> -          if (bp + iop->scanoff < iop->dataend) {
> >> +          if (bp + iop->scanoff <= iop->dataend) {
> >>                    bp += iop->scanoff;
> >>                    goto again;
> >>            }
> >


