bug-gawk
Re: [bug-gawk] _gsub issue?


From: Aharon Robbins
Subject: Re: [bug-gawk] _gsub issue?
Date: Fri, 17 Aug 2012 12:22:53 +0300
User-agent: Heirloom mailx 12.4 7/29/08

Greetings.

Regarding the report below: first, thank you for taking the time
to submit a bug report.

> Date: Fri, 17 Aug 2012 01:33:28 +0300
> From: Denis Shirokov <address@hidden>
> To: address@hidden
> Subject: [bug-gawk] _gsub issue?
>
> Hi GAWK.
>
> I write on the GAWK3.x-4.x last 3 years. Respect due to you guys. I'm
> intrested about writing some very large scripts. Especially about text
> parsing, running subscripts under control higher script, data
> migration mechanics between its, etc.
>
> Anyway,
> Your Documentation says that if i found some bugs then i can connect
> and report to you.
> I found a lot of serious troubles in MS GAWK4.0.0.
> Here is the latest:
>
> BEGIN{
>
>       a="ABC"
>       gsub(/\x1B\x2A/,"\x0A",a)
>
>       print a "def" }
>
> Expected:
>
> abcdef

Actually, you are expecting ABCdef.

> Actual:
>
> A
> B
> C
> def
>
> The gsub func inserts character \x0A between A,B and C like this: \x0A
> A \x0A B \x0A C \x0A.
> REGEXP \x1B\x2A not present in a.

This is not a bug.  It is a departure from historical behavior that is
mandated by POSIX, as explained in the gawk documentation. You can find the
relevant bits online at
http://www.gnu.org/software/gawk/manual/html_node/Escape-Sequences.html#Escape-Sequences:

        Advanced Notes: Escape Sequences for Metacharacters

        Suppose you use an octal or hexadecimal escape to represent a
        regexp metacharacter. (See Regexp Operators.) Does awk treat
        the character as a literal character or as a regexp operator?

        Historically, such characters were taken literally. (d.c.)
        However, the POSIX standard indicates that they should be treated
        as real metacharacters, which is what gawk does. In compatibility
        mode (see Options), gawk treats the characters represented by
        octal and hexadecimal escape sequences literally when used in
        regexp constants. Thus, /a\52b/ is equivalent to /a\*b/.

In this case \x2A is the '*' character, so gawk treats the regexp as
\x1B* (zero or more ESC characters). That pattern matches the null string,
so there is a successful match before each character of "ABC" and at the
end of the string, and gawk does the substitution at each of those places.
With the --traditional flag, gawk does indeed produce the output you expect.
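
To make the two interpretations concrete, here is a small sketch. It uses
Python's re module as a stand-in, not gawk itself, so treat it only as an
analogy: the names substitute, posix_like and traditional_like are made up
for this illustration. In the first pattern the '*' reaches the regexp
engine as a live operator; in the second, the engine sees the hex escape
and takes the '*' literally.

```python
import re

# Ordinary string: Python expands \x2a to '*' before the regex engine sees it,
# so the engine receives ESC followed by a live '*' operator (like POSIX-mode gawk).
ESC_STAR_AS_OPERATOR = "\x1b\x2a"

# Raw string: the regex engine itself sees the \x2a escape and treats the
# resulting '*' as a literal character (like gawk --traditional).
ESC_STAR_AS_LITERAL = r"\x1b\x2a"


def substitute(pattern, text="ABC"):
    """Replace every match of pattern with a newline, then append 'def'."""
    return re.sub(pattern, "\n", text) + "def"


# "Zero or more ESC" matches the empty string at every position in "ABC".
posix_like = substitute(ESC_STAR_AS_OPERATOR)

# "ESC followed by a literal '*'" does not occur in "ABC", so nothing changes.
traditional_like = substitute(ESC_STAR_AS_LITERAL)

print(repr(posix_like))        # '\nA\nB\nC\ndef'
print(repr(traditional_like))  # 'ABCdef'
```

The first result has a newline before each letter and one after the last,
which is the A / B / C / def output in the report.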

> If You want i can reproduce for you some other troubles like _PROCINFO
> in END rule, EXIT in END rule, crash if \x80-\xFF in REGEXP etc.

Please do submit bug reports.  Preferably one per bug.  Please indicate
which version of gawk you are using (please try 4.0.1, which is the current
released version) and also indicate which operating system you use.

If you are using Microsoft Windows, I strongly recommend using the binary
from http://sourceforge.net/projects/ezwinports/files/ (unless you compile
gawk yourself).

Thank you,

Arnold


