bug-gawk

Re: documentation around RE repetition metachars may need clarification


From: Ed Morton
Subject: Re: documentation around RE repetition metachars may need clarification
Date: Mon, 22 May 2023 10:47:37 -0500
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.11.0

On 5/22/2023 10:06 AM, Andrew J. Schorr wrote:
Hi,

On Sun, May 21, 2023 at 09:46:50AM -0500, Ed Morton wrote:
In the gawk manual under
https://www.gnu.org/software/gawk/manual/html_node/Regexp-Operator-Details.html
we have this statement:

In POSIX |awk| and |gawk|, the ‘*’, ‘+’, and ‘?’ operators stand
for themselves when there is nothing in the regexp that precedes
them.

while in the POSIX spec under
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04_03
we have this statement:

*+?{
    The <asterisk>, <plus-sign>, <question-mark>, and <left-brace>
    shall be special except when used in a bracket expression (see RE
    Bracket Expression
    <https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05>).
    Any of the following uses produce undefined results:

      * If these characters appear first in an ERE

So the gawk manual statement says that /+foo/ in any POSIX awk will
match the literal string "+foo" while the POSIX spec statement says
it's undefined behavior.

Should the gawk manual be tweaked to clarify/explain what it
currently says about POSIX awk since it apparently contradicts the
POSIX spec?

Stupid question: when something says that the behavior is undefined, is
it not the case that a given implementation is entitled to make its
own choice about how to handle that situation? If so, why is gawk's
choosing to match "+foo" at odds with POSIX? If it's "undefined", do
you instead expect it to throw an error?

Regards,
Andy

Andy, the issue isn't what the gawk manual says about gawk's behavior; it's what the manual says about POSIX awk's behavior ("`+` at the start of a regexp is a literal char"), which doesn't match what the POSIX spec says about it ("`+` at the start of a regexp is undefined"). Yes, gawk or any other awk can do whatever it likes for behavior that POSIX leaves undefined, as in this case, and no, I don't expect gawk to throw an error. A lint warning would be useful, though, for people who want to write portable scripts and make sure they aren't relying on anything undefined.
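For anyone who wants a literal `+` at the start of a regexp without relying on undefined behavior, a minimal sketch follows. It assumes a POSIX shell with some `awk` on the PATH; the function name `match_literal_plus` is made up for illustration. It relies only on the rule quoted above that these characters are not special inside a bracket expression.

```shell
# Hypothetical helper: print input lines that contain the literal text "+foo".
# Inside a bracket expression '+' is an ordinary character, so this ERE
# is defined by POSIX and should behave the same in any conforming awk.
match_literal_plus() {
    awk '/[+]foo/'
}

# By contrast, awk '/+foo/' starts the ERE with '+', which POSIX leaves
# undefined, so its result depends on the awk implementation.

printf '%s\n' '+foo' 'foo' | match_literal_plus
# prints: +foo
```

The bracket form costs two extra characters and sidesteps the question of what a given awk does with a leading repetition operator.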

    Ed.

