Re: Use of '()' in a regexp
From: arnold
Subject: Re: Use of '()' in a regexp
Date: Thu, 07 Jan 2021 07:07:46 -0700
User-agent: Heirloom mailx 12.5 7/5/10

The answer is "no". Record separators must be non-null; the only exception
where RT will be "" is at the end of a file.
This is also how Brian Kernighan's awk handles RS as a regexp.
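
To make the distinction concrete, here is a minimal sketch in Python (not gawk's actual code) of the two behaviors discussed in this thread. The helper name split_records is hypothetical, and the loop only assumes what the comment in the quoted diff says: a null match of the separator is skipped and the search restarts. Python's re.sub() stands in for gsub(), which does substitute at null matches.

```python
import re


def split_records(data, rs):
    """Split data into (record, RT) pairs, never accepting a null match
    of the separator regexp rs. Hypothetical helper, for illustration."""
    pattern = re.compile(rs)
    records = []
    start = 0  # index where the current record begins
    pos = 0    # index where the next separator search begins
    while start < len(data):
        m = pattern.search(data, pos)
        if m is None:
            # No separator left: the rest is the final record, RT is "".
            records.append((data[start:], ""))
            break
        if m.start() == m.end():
            # Null match. At the end of the data there is nothing left
            # to skip, so the final record ends here with RT == "".
            if m.start() >= len(data):
                records.append((data[start:], ""))
                break
            # Otherwise skip over the null match and restart the search.
            pos = m.start() + 1
            continue
        # Non-null match: it terminates the record and becomes RT.
        records.append((data[start:m.start()], m.group(0)))
        start = pos = m.end()
    return records


# Substitution does act on null matches, like gsub(/()/, "X"):
print(repr(re.sub("()", "X", "foo")))

# Record splitting in this model never does:
print(split_records("foo\nbar\n", "()"))
print(split_records("foo\nbar\n", "\n"))
```

Under this model, RS='()' never finds a usable separator, so the whole input comes back as a single record with an empty RT, while re.sub() inserts "X" around every character.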
Thanks,
Arnold
Ed Morton <mortoneccc@comcast.net> wrote:
> In case that's not an adequate example, what I mean is, will this:
>
> $ printf 'foo\nbar\n' | awk -v RS='()' -v ORS='X' '1'
>
> then produce the same output as this:
>
> $ printf 'foo\nbar\n' | awk -v RS='^$' '{gsub(/()/,"X")}1'
> XfXoXoX
> XbXaXrX
> X
>
> or not and, if not, why is it different?
>
> I just noticed that this seems to handle `/()/` differently from either
> of the current cases again:
>
> $ printf 'foo\nbar\n' | awk '{nf=split($0,flds,/()/,seps); print nf; for (i=0; i<=nf; i++) printf "%s%s", flds[i], "<"seps[i]">" ; print ""}'
> 1
> <>foo<>
> 1
> <>bar<>
>
> Regards,
>
> Ed.
>
> On 1/6/2021 2:54 PM, Ed Morton wrote:
> > Great! Will that treat `()` when used in an RS:
> >
> > awk -v RS='()' -v ORS='x' '1'
> >
> > the same as it's treated in a regexp in other contexts such as with
> > gsub():
> >
> > awk -v ORS= '{gsub(/()/,"x")} 1'
> >
> > or does it mean something different when used in an RS?
> >
> > Ed.
> >
> > On 1/6/2021 1:33 PM, arnold@skeeve.com wrote:
> >> Hi. Re this:
> >>
> >> Ed Morton<mortoneccc@comcast.net> wrote:
> >>
> >>> Someone just pointed this out to me (gawk 5.1.0):
> >>>
> >>> $ printf 'foo\n' | awk '{gsub(/()/,"x")} 1'
> >>> xfxoxox
> >>>
> >>> $ printf 'foo\n' | awk -v RS='()' -v ORS='x\n' '1'
> >>> foox
> >>>
> >>> Obviously that's a pretty ridiculous regexp but it still has me
> >>> wondering - why does `gsub()` treat the regexp `()` as matching a null
> >>> string around every character while `RS` treats it as if I'd asked it to
> >>> match the `\n` at the end of the input:
> >>>
> >>> $ printf 'foo\n' | awk -v RS='\n$' -v ORS='x\n' '1'
> >>> foox
> >>>
> >>> I could just file this under "don't write stupid regexps" but I was
> >>> wondering if there's a more concrete, satisfying explanation of the
> >>> behavior.
> >>>
> >>> Ed.
> >> It's a bug. This appears to be the fix. It doesn't break the
> >> test suite, either.
> >>
> >> Thanks for the report!
> >>
> >> Arnold
> >> -----------------------------------------
> >> diff --git a/io.c b/io.c
> >> index 2714398e..0af8ab1e 100644
> >> --- a/io.c
> >> +++ b/io.c
> >> @@ -3702,7 +3702,7 @@ again:
> >> * If still room in buffer, skip over null match
> >> * and restart search. Otherwise, return.
> >> */
> >> - if (bp + iop->scanoff < iop->dataend) {
> >> + if (bp + iop->scanoff <= iop->dataend) {
> >> bp += iop->scanoff;
> >> goto again;
> >> }
> >