bug-grep

Re: have you ever mistyped [[:lower:]] as [:lower:] ?


From: Jim Meyering
Subject: Re: have you ever mistyped [[:lower:]] as [:lower:] ?
Date: Wed, 01 Sep 2010 14:51:31 +0200

Paolo Bonzini wrote:
> On 09/01/2010 10:11 AM, Jim Meyering wrote:
>> No, that's not the same at all.  grep must not do that.
>> It is one thing to fail for an obviously erroneous construct.
>> It would be worse to silently transform it into the "intended"
>> one, since then GNU grep would silently work as intended, but that
>> same erroneous command would produce different results with non-GNU grep.
>>
>> Here, we're making it clear that this is a serious error.
>
> It's not, as it is syntactically correct.

Hi Paolo,

Using an argument to grep like '[:space:]' is an obvious misunderstanding
(i.e., an error) on the user's part; they intend something like '[[:space:]]'.
I want grep to treat that as an error.  That POSIX would require grep
to interpret it as the wrong thing ("wrong" from the user's perspective)
is not a show stopper.  grep can still do what I want, by default.
That's precisely why we have POSIXLY_CORRECT: to toe the POSIX line,
in spite of what common sense guidelines or good judgment might suggest.
GNU has a long history of making improvements like this that don't
quite fit in the strict POSIX mold.  We are not slaves to POSIX, but
nor do we contravene it without careful thought.
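
To make the "wrong thing" concrete: read literally, '[:space:]' is an
ordinary bracket expression that matches any one of the characters ':',
's', 'p', 'a', 'c' or 'e'. The sketch below uses Python's re module as a
stand-in, on the assumption that it reads a single-bracket set the same
literal way (it has no POSIX character classes at all). The helper name
matched_chars is made up for the illustration; this is not grep itself.

```python
import re

# Python's re has no POSIX character classes, so it reads '[:space:]'
# literally: a set containing ':', 's', 'p', 'a', 'c' and 'e'.
MISTYPED = re.compile(r"[:space:]")


def matched_chars(text):
    """Return the characters of text that the mistyped pattern matches."""
    return MISTYPED.findall(text)


# The tab is not matched; the letters 'a' and 'e' are.
print(matched_chars("tab\there"))  # prints ['a', 'e', 'e']

# A string made only of whitespace does not match at all.
print(matched_chars(" \t\n"))      # prints []
```

So a user who writes '[:space:]' to find whitespace gets lines containing
certain letters or a colon instead, with no diagnostic.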

>>> Second, if this was done, it should operate in the same way in sed,
>>> expr, awk, and all other GNU programs that deal with regexes.  (And
>>> possibly in glibc too).
>>
>> It would be nice to make other GNU programs provide the same new
>> feature.  However, if they don't (or don't right away), it's not
>> a big deal.
>
> I know that sed won't as long as I'm maintainer.

Who knows... POSIX may allow this new behavior, someday.
Wouldn't you adjust sed if POSIX were to permit special
handling of obviously-erroneous regular expressions?

>>> If you want to add --warn=error (which is a "superset" of
>>> --warn=always behavior), that's fine and I actually like the idea.
>>> But I think making it the default non-POSIXLY_CORRECT behavior is
>>> wrong.  Honestly, if this happened I would regret having introduced
>>> the feature in the first place.
>>
>> I hope you don't regret it.
>> Sometimes you just have to admit that
>> POSIX-is-clear-and-POSIX-can-be-improved.
>
> POSIX can be improved in many ways, and GNU is a testimony to this.
> In fact, I should have participated in POSIX more and made sure some
> of my sed extensions went into POSIX.2-2008.
>
> But this is not a case in which POSIX can be improved.  POSIX provides
> a nice grammar for regex, and our warning is a hack on top of that
> grammar.  POSIX provides the clean thing as a standard should, and we
> build on top of it a useful hack.
>
> What could have a place in a standard, is a mechanism for grep to warn
> about doubtful regular expressions.  Mandating "what is" a doubtful
> regular expression (which is a prerequisite if you want grep to exit
> with status 2) does not have its place in a standard.  The C standard
> does not say what algorithms to use in order to find uninitialized
> variables or dead stores.
>
>> We should not let standards get in the way of improving our software.
>>
>> That we are making this the default behavior
>> is a tribute to the usefulness of your new feature.
>
> It's not.  It's a huge mistake, because making it an error means
> changing the regex grammar (and making it unnecessarily complicated
> and contrived).

It's already done in grep, and wasn't a very big change.
I'm not advocating such a change in any other package.
Let each maintainer decide.  It's alright with me if no one else opts
to make the change.  This "feature" is only to help detect/avoid a
small class of error earlier, and if it affects an existing script,
I suspect authors and users alike will welcome the alert about the
existing bug.
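
For readers who want a feel for how small such a check can be, here is a
rough sketch in Python. It is not the code in grep; the function names,
the restriction to the twelve standard class names, and the simplified
bracket scanning are all assumptions made for the illustration. It flags
a bracket expression whose entire content is ':name:' and leaves the
correct '[[:name:]]' form alone.

```python
import re

# Assumption: only the twelve standard POSIX class names are considered.
_CLASS = re.compile(
    r"\[:(?:alnum|alpha|blank|cntrl|digit|graph|lower|print|punct|space"
    r"|upper|xdigit):\]"
)


def _skip_bracket(pattern, start):
    """Return the index just past the bracket expression opening at start."""
    i = start + 1
    if i < len(pattern) and pattern[i] == "^":
        i += 1
    if i < len(pattern) and pattern[i] == "]":  # a leading ']' is literal
        i += 1
    while i < len(pattern) and pattern[i] != "]":
        nested = pattern[i] == "[" and pattern[i + 1:i + 2] in (":", "=", ".")
        if nested:
            # Skip a nested [:class:], [=equiv=] or [.coll.] element.
            close = pattern.find(pattern[i + 1] + "]", i + 2)
            if close == -1:
                return len(pattern)
            i = close + 2
        else:
            i += 1
    return i + 1


def find_suspect_class(pattern):
    """Return the first '[:name:]' that opens a bracket expression, or None."""
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":  # an escaped character outside brackets
            i += 2
        elif ch == "[":
            m = _CLASS.match(pattern, i)
            if m:
                return m.group(0)
            i = _skip_bracket(pattern, i)
        else:
            i += 1
    return None


print(find_suspect_class("[:space:]"))    # prints [:space:]
print(find_suspect_class("[[:space:]]"))  # prints None
```

A caller could turn a non-None result into a warning or into a hard error;
which of the two is the right default is exactly what this thread is about.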

FWIW, I would not even bother to propose a change to glibc/regex,
since that would probably require an interface change, and that would
not be justifiable.

>> The following change-set implements what Paul and I have been advocating.
>> I'll push it later today or tomorrow.
>
> I still think this is wrong, and doubly wrong because I cannot disable
> it on my system without breaking it with POSIXLY_CORRECT.  Please,

You want to disable it?
I doubt you intend to use grep '[:space:]'...,
so I still fail to understand why you would want that.

It sounds like you're upset.
Sorry it's come to that, but I feel strongly about this, too.


