bug-gnu-emacs

Re: 21.1 incorrect Regular expression parsing


From: Jari Aalto+mail.emacs
Subject: Re: 21.1 incorrect Regular expression parsing
Date: Tue, 11 Dec 2001 23:57:16 +0200
User-agent: Gnus/5.090004 (Oort Gnus v0.04) Emacs/20.7 (i386-*-nt5.0.2195) (i386-*-nt5.0.2195)

* 2001-12-10 David.Kastrup@t-online.de (David Kastrup) gnu.emacs.bug
* 
<http://search.dejanews.com/msgid.xp?MID=%3Cx5n10q4r7g.fsf@tupik.goethe.zz%3E&format=threaded>
>>>>>> "Jari" == Jari Aalto+mail emacs <jari.aalto@poboxes.com> writes:
|
|  Jari| I understand this to be a regular expression bug, and it would
|  Jari| be good if Emacs signalled an error.
|
| The Emacs manual states about regular expressions:
|
| `+'
|      is a postfix operator, similar to `*' except that it must match
|      the preceding expression at least once.  So, for example, `ca+r'
|      matches the strings `car' and `caaaar' but not the string `cr',
|      whereas `ca*r' matches all three strings.
|
| It does not say that the preceding expression is not allowed to be a
| more complicated expression, so x++ according to that interpretation
| would be equivalent to x+.
|
| We also have the passage
|    Note: for historical compatibility, special characters are treated as
| ordinary ones if they are in contexts where their special meanings make
| no sense.  For example, `*foo' treats `*' as ordinary since there is no
| preceding expression on which the `*' can act.
|
| So if you said in x++ the second + is in a context "where their
| special meanings make no sense", it would match a literal +.
|
| The Emacs manual does not say any use of those characters is illegal.
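David's two readings of the manual can be tried out with Python's `re` module. This is only a stand-in and an assumption on my part: `re` is not the Emacs engine, and it differs from Emacs on exactly these edge cases. Depending on the Python version, bare `x++` is either rejected or given a different meaning, so the sketch spells out the group `(?:x+)+` and checks, on a few sample strings, that it matches the same strings as `x+`. It also shows that Emacs's "ordinary character" reading of a leading `*` has to be written `\*` in Python.

```python
import re

# Reading 1: the second '+' in x++ acts on the whole x+, i.e. (x+)+.
# Python needs the group written out explicitly.
grouped = re.compile(r"(?:x+)+")
single = re.compile(r"x+")

# Compare full matches on a handful of sample strings (not a proof).
samples = ["", "x", "xx", "xxxx", "xy", "+"]
same = all(
    bool(grouped.fullmatch(s)) == bool(single.fullmatch(s)) for s in samples
)
print(same)  # True

# Reading 2: Emacs treats a leading '*' as a literal; Python's re
# refuses it ("nothing to repeat"), so the literal must be escaped.
try:
    re.compile("*foo")
except re.error as exc:
    print("rejected:", exc)
print(bool(re.fullmatch(r"\*foo", "*foo")))  # True
```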

To me, the regular expression language consists of two kinds of items:

    literals
    tokens

In the token group are the basic quantifiers * + ?. I can't see how
they could also be literals. The Emacs manual may allow this and the
Emacs regexp engine may accept it, but it goes against everything
users have learned about writing regular expressions: doubled,
tripled or quadrupled quantifiers are incorrect and should signal a
parse error.
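The literal/token split described above can be sketched as a toy tokenizer that rejects a quantifier with nothing to act on and a quantifier stacked on another quantifier. This is a hypothetical illustration of the proposed rule, not Emacs's regex.c parser. `tokenize` and `RegexSyntaxError` are names invented here. Backslash is assumed to simply quote the next character, whereas real Emacs regexps use `\(`, `\|` and friends as operators. A single `?` right after a quantifier is accepted as a Perl-style non-greedy modifier.

```python
from __future__ import annotations

QUANTIFIERS = set("*+?")


class RegexSyntaxError(ValueError):
    """Raised when a quantifier has nothing valid to act on."""


def tokenize(pattern: str) -> list[tuple[str, str]]:
    """Split a pattern into ("literal", c) and ("quantifier", q) tokens."""
    tokens: list[tuple[str, str]] = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            # Assumption: a backslash just quotes the following character.
            if i + 1 >= len(pattern):
                raise RegexSyntaxError("trailing backslash")
            tokens.append(("literal", pattern[i + 1]))
            i += 2
            continue
        if ch in QUANTIFIERS:
            if not tokens:
                # A quantifier is never a literal: nothing to repeat is an error.
                raise RegexSyntaxError(f"nothing to repeat at position {i}")
            kind, value = tokens[-1]
            if kind == "quantifier":
                # Allow exactly one '?' after a plain quantifier (non-greedy).
                if ch == "?" and value in QUANTIFIERS:
                    tokens[-1] = ("quantifier", value + "?")
                    i += 1
                    continue
                raise RegexSyntaxError(f"nested quantifier at position {i}")
            tokens.append(("quantifier", ch))
        else:
            tokens.append(("literal", ch))
        i += 1
    return tokens
```

For example, `tokenize("ca+r")` yields four tokens, `tokenize("a*?")` yields a literal and the non-greedy quantifier `*?`, and `tokenize("x++")` raises `RegexSyntaxError` at position 2.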

It's different when the behavior is defined: in Perl, for example,
some doubled quantifiers such as *? (non-greedy) have a special
meaning; they were introduced to extend the regular expression
grammar.
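Python's `re` module uses the same non-greedy `*?` syntax as Perl, so it can serve as a stand-in here. That choice of language is my assumption; the point is only to show that `*?` is a defined operator and not a stacked quantifier. The greedy `.*` runs to the last `>`, while the non-greedy `.*?` stops at the first one:

```python
import re

text = "<b>bold</b>"

# Greedy: .* consumes as much as possible, then backtracks to the last '>'.
greedy = re.match(r"<.*>", text).group()

# Non-greedy: .*? consumes as little as possible, stopping at the first '>'.
lazy = re.match(r"<.*?>", text).group()

print(greedy)  # <b>bold</b>
print(lazy)    # <b>
```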

Another reference point is Perl (v5.6.1 built for cygwin-multi):

    //root@W2KPICASSO $ perl -e '$_=11; print if /\d++/;'
    Nested quantifiers before HERE mark in regex m/\d++ << HERE / at -e line 1.

    //root@W2KPICASSO $ perl -e '$_=11; print if /\d+/;'
    11
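For comparison, Python's `re` module also refuses a quantifier stacked directly on another one. The sketch uses `\d**` rather than `\d++` because Python 3.11 added possessive quantifiers, which makes `++` legal there. The `rejected` flag is just an illustrative name:

```python
import re

# Like the Perl 5.6.1 run above, a doubled quantifier is a compile error.
try:
    re.compile(r"\d**")
    rejected = False
except re.error as exc:
    rejected = True
    print("re.error:", exc)

# The single quantifier works as expected, mirroring the second Perl run.
print(re.search(r"\d+", "11").group())  # 11
```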

I still consider it bad practice to leave the door open to a
construct like "++" in Emacs. Please consider plugging that hole.

Jari




