bug#15483: POSIXLY_CORRECT documentation vis a vis some simple EREs


From: Glenn Golden
Subject: bug#15483: POSIXLY_CORRECT documentation vis a vis some simple EREs
Date: Sat, 28 Sep 2013 11:52:38 -0600
User-agent: Mutt/1.5.21 (2010-09-15)

--
Regarding EREs having leading repetition operators, e.g. '*xyz':

Section 9.5.3 of

    http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html 

supplies the grammar for POSIX-conforming EREs. From the notes at the very
bottom:

    -----------------------------------------------------------------------
    The ERE grammar does not permit several constructs that previous
    sections specify as having undefined results:

        [ ... ]

        * One or more ERE_dupl_symbols appearing first in an ERE, or [ ... ]

    Implementations are permitted to extend the language to allow these.
    Conforming applications cannot use such constructs. 
    -----------------------------------------------------------------------


To my eyes, the last sentence seems to say that a conforming implementation
must not accept EREs like '*xyz'. But egrep [grep 2.14] does accept them,
even with POSIXLY_CORRECT defined, e.g.

   $ export POSIXLY_CORRECT=1
   $ echo 'abcdefghi' | egrep --color=auto '*def'

matches 'def'.  In contrast, POSIX regex(3) rejects such EREs with "invalid
preceding regular expression".

Not sure whether this is a POSIX conformance issue or not; it depends on the
intended semantics of POSIXLY_CORRECT.

The man page also seems a bit ambiguous to me: it first says that with
POSIXLY_CORRECT set, grep "behaves as POSIX.2 requires", but then goes on to
list only some specific behaviors related to option processing.  It wasn't
clear to me whether that list was intended to limit the scope of the
POSIXLY_CORRECT-ness to only those aspects, or whether those behaviors were
listed just because they are (for example) often confusing to users, hence
worthwhile to call out explicitly.

In summary, there are a few questions/branches to this:

   1. If POSIXLY_CORRECT is intended to be conforming only in the specific
      respects listed, I'd suggest that the name of the associated envar be
      changed to reflect that (e.g., something like POSIXLY_CORRECT_OPTS),
      and that the man page text be changed to read something like:

       POSIXLY_CORRECT_OPTS
          If set, grep conforms with POSIX.2 with regard to the following
          option processing behaviors: [ description of option behaviors ]

   2. If POSIXLY_CORRECT is intended to mean 'fully conforming in all respects'
      then it seems like the present behavior is in technical violation.

   3. If (2) is the case, and the decision is made to change the behavior of
      grep accordingly, it might be worthwhile to also change the doc for
      POSIXLY_CORRECT to something like this:

       POSIXLY_CORRECT
          If set, grep conforms with POSIX.2 in all respects.  In particular,
          [ description of option-related behaviors and/or other behaviors
            that are deemed worthwhile to call out explicitly ]

   4. If (2) is the case, but the decision is made not to change the behavior
      of grep (i.e. accept the non-conformance), it might be worthwhile to
      change the doc for POSIXLY_CORRECT to something like this:

       POSIXLY_CORRECT
          If set, grep conforms with POSIX.2 in almost all respects.  In
          particular, [ description of option-related behaviors and/or other
          behaviors that are deemed worthwhile to call out explicitly ]. But
          it does not conform precisely regarding EREs like '*xyz' [ and
          whatever other ways are known to be non-conforming. ]

To pre-answer an expected question, asked of a submitter (Roman Donchenko) in
a similar POSIX violation bug report (#37737): 

    Are you encountering this problem in a real-world usecase, or are you
    simply reporting a violation of the standard?

My response is essentially the same as Roman gave: I am reporting it only as a
violation, but otoh, the POSIX-mandated behavior makes a lot more sense to me
than the current behavior, since expressions like '*xyz' are almost always user
error; the intent is usually '.*xyz'.  So if such expressions were rejected by
egrep, it would IMO be a behavioral improvement for users (like, ummm... me)
who chronically mis-remember how '*' is interpreted by bash vs. grep. 
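
As a rough illustration of the stricter behavior described above, here is a
minimal sketch of a shell wrapper that refuses an ERE whose first character is
a repetition operator and otherwise defers to grep -E.  The name strict_egrep
is hypothetical and not part of grep, and the check is deliberately simplified:
it only inspects the first character of the pattern, so it does not catch a
repetition operator that follows '(' or '|'.  It also assumes the pattern is
always the first argument.

```shell
# Hypothetical helper, not part of grep: reject an ERE that begins with a
# repetition operator (*, +, ? or {), then hand everything else to grep -E.
# Assumption: the pattern is always the first argument; files may follow.
strict_egrep() {
    case "$1" in
        [*+?{]*)
            # Inside a shell bracket expression these characters are literal.
            printf 'strict_egrep: ERE begins with a repetition operator: %s\n' "$1" >&2
            return 2
            ;;
    esac
    # "--" keeps a pattern that starts with "-" from being read as an option.
    grep -E -- "$@"
}
```

With this in place, something like

   $ echo 'abcdefghi' | strict_egrep '*def'

would print a diagnostic and return status 2 instead of matching 'def', while
'.*def' would be passed through to grep -E unchanged.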





