bug-gnu-utils

Re: gawk: {} repetition in patterns doesn't work?


From: Paul Eggert
Subject: Re: gawk: {} repetition in patterns doesn't work?
Date: Wed, 21 Mar 2001 10:59:55 -0800 (PST)

> From: address@hidden
> Date: 21 Mar 2001 08:44:13 +0200
> 
> echo 'aa' | awk '/a{2}/' 
> 
> It prints 'aa' with HP-UX awk and so it should according to
> my understanding of POSIX.2.

That's correct.

> It doesn't work with 'nawk' in Solaris either, though. A bug or a
> feature?

Both.  :-)

The POSIX requirement is widely ignored, because it causes problems
with patterns that contain stray '{' characters.  Historically, awk
did not support the a{2} notation, and many awk scripts contain code
that treats '{' as literal, e.g.:

        /{.*}/ { print "found matching braces"; }

POSIX says that the behavior of this code is undefined because of the
stray '{'.  However, scripts like this work as expected with gawk, as
well as with most other awks.
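
For scripts that need literal braces and must not depend on how a given
awk resolves this, one hedged workaround is to put each brace inside a
bracket expression, where it is never an interval operator. This is only
a sketch; the function name find_braces is made up for illustration.

```shell
# Match a literal '{' followed later by a literal '}'.
# Inside a bracket expression the brace is an ordinary character, so the
# pattern means the same thing whether or not awk supports intervals.
find_braces() {
    awk '/[{].*[}]/ { print "found matching braces" }'
}

# Only the first input line contains both braces.
printf '%s\n' 'main() { return 0; }' 'no braces here' | find_braces
# prints: found matching braces
```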

gawk should do what GNU grep does: namely, support the POSIX
requirement only when it is absolutely required, and otherwise treat
stray braces as literal braces.  POSIX allows this behavior.  Here is
a quote from the grep manual that should help explain things better:

      GNU `egrep' attempts to support traditional usage by assuming that
   `{' is not special if it would be the start of an invalid interval
   specification.  For example, the shell command `egrep '{1'' searches
   for the two-character string `{1' instead of reporting a syntax error
   in the regular expression.  POSIX.2 allows this behavior as an
   extension, but portable scripts should avoid it.
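
A portable way to write the manual's example, assuming the intent is to
find the two-character string `{1', is again a bracket expression, so
that no implementation has to guess whether the brace starts an
interval. The function name match_brace_one is hypothetical.

```shell
# Search for the literal two-character string "{1" without relying on
# the egrep extension described in the quoted manual text.
match_brace_one() {
    grep -E '[{]1'
}

# The first line contains "{1"; the second does not.
printf '%s\n' 'x{1' 'x1' | match_brace_one
# prints: x{1
```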

Adding support for this behavior to GNU regexp.c is on my list of
things to do.  That should make it easy to fix gawk to be
POSIX-compliant here.
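
Until then, a script can probe which behavior the local awk has. The
following is a rough sketch, not a definitive test: it feeds the
original report's input to an anchored version of the pattern and
reports whether a{2} was read as an interval. The function name
awk_has_intervals is made up for illustration.

```shell
# Prints "yes" if the awk on PATH reads a{2} as "exactly two a's",
# and "no" if it treats the braces literally or rejects the pattern.
awk_has_intervals() {
    if printf 'aa\n' |
        awk '/^a{2}$/ { found = 1 } END { exit !found }' 2>/dev/null
    then
        echo yes
    else
        echo no
    fi
}

awk_has_intervals
```

The anchors matter: without them, an awk that treats the braces as
literal would still fail to match 'aa', but an unanchored interval
pattern would also match longer runs of 'a', which makes the probe
harder to reason about.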


