emacs-devel

Re: Unquoted special characters in regexps


From: martin rudalics
Subject: Re: Unquoted special characters in regexps
Date: Wed, 01 Mar 2006 14:00:05 +0100
User-agent: Mozilla Thunderbird 1.0 (Windows/20041206)

> Martin Rudalics wrote:
>
>    It would be strange to say, for example, that the double-quote
>    opening an Elisp string is outside the context of the string and
>    the double-quote that closes it inside.
>
> I do not see why you consider this strange.  Quite to the contrary,
> this is exactly what allows one to determine whether a `"' opens or
> closes a string.  `"' is special both inside and outside the context
> of a string.  But its special meaning depends on that context.
> Outside the context of a string `"' starts a string, inside the
> context of a string, `"' ends a string.  So an opening `"' is opening
> _because_ it occurs outside of a string context and the closing `"' is
> the closing one _because_ it occurs inside a string context.
>
> Note that the GNU regexp manual, node `(regex)List Operators' agrees
> with Andreas and me that `[' is special _outside_ a character alternative
> (by stating that it is ordinary inside one) and explicitly states that
> `]' has the special meaning of closing a character alternative
> _inside_ a character alternative.  (Note that it refers to character
> alternatives as "lists".)
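The context argument in the quoted text can be made concrete with a minimal Python sketch of a scanner. It labels every `[' and `]' in a pattern as "open", "close" or "literal" according to whether it occurs inside a list. The function name classify_brackets is hypothetical. The rules are a simplified assumption modeled on the POSIX list syntax: an optional leading `^'; a `]' in first position is an ordinary item; a backslash outside a list escapes the next character; character classes such as [:alpha:] are not handled.

```python
def classify_brackets(pattern: str) -> list[tuple[int, str, str]]:
    """Label each '[' or ']' in PATTERN as 'open', 'close' or 'literal'.

    Simplified POSIX-style rules (an assumption for illustration):
    outside a list, '[' opens a list and '\\' escapes the next char;
    inside a list, ']' closes it unless it is the first item
    (after an optional '^'), and '[' is an ordinary item.
    Character classes such as [:alpha:] are not handled.
    """
    result = []
    inside = False
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if not inside:
            if c == "\\":
                # Escaped character outside a list: record it as literal.
                if i + 1 < len(pattern) and pattern[i + 1] in "[]":
                    result.append((i + 1, pattern[i + 1], "literal"))
                i += 2
                continue
            if c == "[":
                result.append((i, c, "open"))
                inside = True
                i += 1
                # An optional '^' negates the list.
                if i < len(pattern) and pattern[i] == "^":
                    i += 1
                # A ']' in first position is an ordinary list item.
                if i < len(pattern) and pattern[i] == "]":
                    result.append((i, "]", "literal"))
                    i += 1
                continue
            if c == "]":
                # Dangling ']' outside any list: tolerated as a literal.
                result.append((i, c, "literal"))
        else:
            if c == "]":
                result.append((i, c, "close"))
                inside = False
            elif c == "[":
                # Inside a list, '[' is just another item.
                result.append((i, c, "literal"))
        i += 1
    return result
```

For example, in "[]a]" the first `]' is classified as literal and the second as closing, while the `]' in "abc]" is a dangling literal outside any list.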

If you refer to section "3.6 List Operators ([ ... ] and [^ ... ])" of
the GNU regex manual, I can extract three relevant sentences:

"A matching list matches a single character represented by one of the
list items. You form a matching list by enclosing one or more items
within an open-matching-list operator (represented by `[') and a
close-list operator (represented by `]')."

If you deduce here that the "close-list operator" is part of the "items
within", you can deduce that the "open-matching-list" operator is part of
the "items within" as well.

"`]' ends the list if it's not the first list item. So, if you want to
make the `]' character a list item, you must put it first."

`]' is special inside a character list (the "items within" mentioned
above) because, to be a list item, it has to appear as the first element
of that list.
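Python's re module happens to follow the same first-item rule, so it can serve as a quick check. This shows Python's behavior only, not that of Emacs or the GNU regex library:

```python
import re

# ']' as the first item of a list is an ordinary character ...
first = re.compile(r"[]a]")
assert first.fullmatch("]") and first.fullmatch("a")

# ... and the same holds right after the negation operator '^'.
negated = re.compile(r"[^]a]")
assert negated.fullmatch("b") and not negated.fullmatch("]")

# Anywhere else, ']' closes the list: "[a]]" is the list [a]
# followed by a literal ']', so it matches the string "a]".
later = re.compile(r"[a]]")
assert later.fullmatch("a]") and not later.fullmatch("]")
```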

"`-' represents the range operator (see section 3.6.2 The Range Operator
(-)) if it's not first or last in a list or the ending point of a range."

If `-' can be "last in a list", the close-list operator `]' cannot be
"last in that list".  Ex falso sequitur quodlibet.


If anyone is interested in how other languages handle regexp brackets,
see the list below:

Perl's metacharacters are:
    { } [ ] ( ) ^ $ . | * + ? \

Python metacharacters are:
    . ^ $ * + ? { [ ] \ | ( )

PHP:
    Outside square brackets, the meta-characters are as follows:
    ...
    [ start character class definition
    ] end character class definition
    ...

XML:
    A metacharacter is either ., \, ?, *, +, {, } (, ), [ or ].

Tcl:
    A regular expression uses metacharacters (characters that assume special
    meaning for matching other characters) such as *, [], $ and ..
    ...
    A backslash (\) disables the special meaning of the following character,
    so you could match the string [Hello] with the RE \[Hello\].
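The Tcl example carries over directly to Python's re module, used here as an assumed stand-in for Tcl. In Python, re.escape inserts the backslashes automatically:

```python
import re

# Backslashes remove the special meaning of '[' and ']'.
escaped = re.compile(r"\[Hello\]")
assert escaped.fullmatch("[Hello]")

# Without them, "[Hello]" is a one-character list {H, e, l, o}.
unescaped = re.compile(r"[Hello]")
assert unescaped.fullmatch("l") and not unescaped.fullmatch("[Hello]")

# re.escape adds the backslashes for us.
assert re.escape("[Hello]") == r"\[Hello\]"
```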

Java (http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html):
    Perl is forgiving about malformed matching constructs, as in the
    expression *a, as well as dangling brackets, as in the expression
    abc], and treats them as literals.

    Java also accepts dangling brackets but is strict about dangling
    metacharacters like +, ? and *, and will throw a
    PatternSyntaxException if it encounters them.
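Python's re module behaves similarly to the Java description above. In the sketch below (rejected is a hypothetical helper name), a dangling `]' is accepted as a literal, whereas a leading `*' or an unterminated `[' is a compile error:

```python
import re


def rejected(pattern: str) -> bool:
    """Return True if Python's re refuses to compile PATTERN."""
    try:
        re.compile(pattern)
    except re.error:
        return True
    return False


# A dangling ']' compiles and matches itself literally.
assert re.compile(r"abc]").fullmatch("abc]")
assert not rejected(r"abc]")

# A dangling quantifier or an unterminated list is an error.
assert rejected(r"*a")
assert rejected(r"[abc")
```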

Hence all the classic regexp languages listed above do consider `]'
special and do not consider `-' special.  The Java doc calls the `]' in
`abc]' a dangling bracket.  The fact that languages "forgive" or "accept"
such constructs shouldn't cause anyone to promote such a style.





