bug-gawk

Re: Use of '()' in a regexp


From: arnold
Subject: Re: Use of '()' in a regexp
Date: Wed, 06 Jan 2021 12:33:49 -0700
User-agent: Heirloom mailx 12.5 7/5/10

Hi. Re this:

Ed Morton <mortoneccc@comcast.net> wrote:

> Someone just pointed this out to me (gawk 5.1.0):
>
> $ printf 'foo\n' | awk '{gsub(/()/,"x")} 1'
> xfxoxox
>
> $ printf 'foo\n' | awk -v RS='()' -v ORS='x\n' '1'
> foox
>
> Obviously that's a pretty ridiculous regexp but it still has me 
> wondering - why does `gsub()` treat the regexp `()` as matching a null 
> string around every character while `RS` treats it as if I'd asked it to 
> match the `\n` at the end of the input:
>
> $ printf 'foo\n' | awk -v RS='\n$' -v ORS='x\n' '1'
> foox
>
> I could just file this under "don't write stupid regexps" but I was 
> wondering if there's a more concrete, satisfying explanation of the 
> behavior.
>
>      Ed.

It's a bug. The patch below appears to fix it, and it doesn't
break the test suite, either.

Thanks for the report!

Arnold
-----------------------------------------
diff --git a/io.c b/io.c
index 2714398e..0af8ab1e 100644
--- a/io.c
+++ b/io.c
@@ -3702,7 +3702,7 @@ again:
                 * If still room in buffer, skip over null match
                 * and restart search. Otherwise, return.
                 */
-               if (bp + iop->scanoff < iop->dataend) {
+               if (bp + iop->scanoff <= iop->dataend) {
                        bp += iop->scanoff;
                        goto again;
                }
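
For readers skimming the patch, the one-character change can be isolated in a toy model. This is only a sketch of the comparison itself, not gawk's code: `struct toy_scan`, `restart_old`, and `restart_new` are hypothetical names, and the fields merely mirror the identifiers that appear in the diff. It shows the single case where the two tests disagree, which is when skipping past the null match lands exactly on the end of the data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the three values compared in the patch.
 * The field names echo the diff, but this is NOT gawk's IOBUF and
 * their exact meaning inside gawk is not modeled here.
 */
struct toy_scan {
    size_t bp;       /* offset of the current scan position */
    size_t scanoff;  /* offset added to bp before retrying the search */
    size_t dataend;  /* offset one past the last valid byte */
};

/* Pre-patch test: retry only if the new position is strictly before dataend. */
static bool restart_old(const struct toy_scan *s)
{
    return s->bp + s->scanoff < s->dataend;
}

/* Post-patch test: also retry when the new position equals dataend. */
static bool restart_new(const struct toy_scan *s)
{
    return s->bp + s->scanoff <= s->dataend;
}

/* Self-check: the two tests differ only at the exact boundary. */
static void toy_self_check(void)
{
    struct toy_scan inside   = { 0, 3, 4 };  /* 3 < 4: both retry */
    struct toy_scan boundary = { 0, 4, 4 };  /* 4 == 4: only the new test retries */
    struct toy_scan past     = { 0, 5, 4 };  /* 5 > 4: neither retries */

    assert(restart_old(&inside)    && restart_new(&inside));
    assert(!restart_old(&boundary) && restart_new(&boundary));
    assert(!restart_old(&past)     && !restart_new(&past));
}
```

Calling `toy_self_check()` from a small `main` exercises all three cases. How that boundary case leads to the `RS='()'` output in the report depends on the rest of the scanning code in io.c, which this sketch does not attempt to reproduce.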


