emacs-devel

Re: Unquoted special characters in regexps


From: martin rudalics
Subject: Re: Unquoted special characters in regexps
Date: Sun, 26 Feb 2006 12:32:18 +0100
User-agent: Mozilla Thunderbird 1.0 (Windows/20041206)

>>>This is incorrect.  ']' is only special in a bracket expression (like
>>>'-').
>>
>>`]' is _also_ special in a character alternative,
>
>
> A bracket expression has a completely different set of special characters.
> For example, '\' and '$' are not special there.
>
>
>>`-' is special _only_ in a character alternative.
>
>
> Just like ']'.
>
> Andreas.
>

We would have to agree on the semantics of the term "special" first. In
Elisp descriptions this term is overloaded.  Take, for example, the
following excerpt from the Elisp tutorial:

   Indeed more than one such mark or brace may precede the space.  These
   require an expression that looks like this:

             []\"')}]*

      In this expression, the first `]' is the first character in the
   expression; the second character is `"', which is preceded by a `\'
   to tell Emacs the `"' is _not_ special.  The last three characters
   are `'', `)', and `}'.

It's confusing because we know from the Elisp manual that

   The special characters are `.', `*', `+', `?', `[', `]', `^', `$',
   and `\'; no new special characters will be defined in the future.

hence a double-quote is never "special" in terms of regexp semantics.
Why should we have to tell Emacs that it is "_not_ special" then?  The
answer is, obviously, that the Elisp read syntax for regexps is the
string data type and the tutorial's "special" indeed refers to string
semantics.  Hence when you say that "'\' and '$' are not special there"
you probably don't mean the special semantics of the backslash within
strings.
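
To make the two layers concrete, here is a minimal sketch that can be evaluated in *scratch*.  The constant name `my-closing-marks' is made up for illustration; the regexp itself is the one from the tutorial excerpt.  It shows that the backslash before the double quote is consumed by the Lisp reader and never reaches the regexp engine, and that `\' and `$' match themselves inside a character alternative.

```elisp
;; The tutorial's regexp written as Lisp source.  The reader, not the
;; regexp engine, consumes the backslash before the double quote.
;; `my-closing-marks' is a hypothetical name used only in this sketch.
(defconst my-closing-marks "[]\"')}]*"
  "The regexp quoted from the tutorial excerpt.")

;; What the regexp engine receives: 8 characters, none a backslash.
(length my-closing-marks)                 ; 8
(string-match-p "\\\\" my-closing-marks)  ; nil

;; Inside a character alternative `\' and `$' match themselves.
;; The string "[\\$]" consists of the 4 characters [ \ $ ].
(string-match-p "[\\$]" "a$b")            ; 1
(string-match-p "[\\$]" "a\\b")           ; 1
```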

Now let's agree on the term "there".  Reasonably, "there" is the
sequence of characters obtained after stripping both the opening bracket
_and_ the closing bracket of a character alternative.  Otherwise, the
sentence from the Elisp manual "To include a `]' in a character
alternative, you must make it the first character." wouldn't make sense.
`]' is special inside a character alternative because it may appear in
one and only one position - namely the first.  And the semantics of the
`]' in the first position is "match one `]'".  The semantics of a `]'
closing a character alternative is completely different from that.
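
The positional rule can be checked directly.  The following is only a sketch using `string-match-p' on throwaway strings of my own choosing; it contrasts a `]' in first position with one that closes the alternative, and shows the analogous behavior of `-' for comparison.

```elisp
;; A `]' in first position is a member of the set, not its terminator.
(string-match-p "[]a]" "]")                      ; 0
(string-match-p "[]a]" "a")                      ; 0

;; In any other position the first `]' closes the alternative, so
;; "[a]]" is the set {a} followed by a literal `]'.
(string-match-p "[a]]" "]")                      ; nil
(and (string-match "[a]]" "a]") (match-end 0))   ; 2

;; `-' is a range operator between two characters and an ordinary
;; member when placed last.
(string-match-p "[a-c]" "b")                     ; 0
(string-match-p "[a-c]" "-")                     ; nil
(string-match-p "[ac-]" "-")                     ; 0
```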

From an operational point of view - that of the Elisp interpreter - you
_can_ say that `]' is not special in regexps.  If that's the preferred
point of view it's sufficient to remove `]' from the list of special
characters in the respective manuals and treat it like `-' as you
propose.  And you wouldn't have to mention "poor practice" in the Elisp
manual at all - anything the Elisp interpreter interprets as intended by
the programmer would be valid.
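
For what it is worth, the operational view is easy to observe.  The sketch below relies only on standard functions (`string-match-p', `regexp-quote', `condition-case'); the sample strings are arbitrary.  It suggests that the matcher, as I understand its current behavior, already treats the two brackets asymmetrically outside a character alternative.

```elisp
;; Outside a character alternative an unmatched `]' matches itself.
(string-match-p "]" "a]b")          ; 1

;; An unmatched `[', by contrast, is rejected as an invalid regexp.
(condition-case err
    (string-match-p "[" "a[b")
  (invalid-regexp (car err)))       ; invalid-regexp

;; `regexp-quote' escapes `[' but leaves `]' alone.
(regexp-quote "[a]")                ; "\\[a]"
```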

There exists, however, a functional subset of Elisp (Elisp without
setqs, iterators, ...) amenable to mathematical reasoning like proving
correctness or validity of your code.  And mathematicians don't like to
reason about malformed constructs like "(a + 3" or "a + 3)".  They
prefer something like "a + 3" or "(a + 3)" instead.  They do rely on the
special semantics of "(" _and_ ")" within an expression.  Hence, saying
that `[' is special and `]' is not would be tantamount to removing regexps
from the subset of Elisp amenable to mathematical reasoning.





