emacs-devel

Re: Unquoted special characters in regexps


From: Luc Teirlinck
Subject: Re: Unquoted special characters in regexps
Date: Sun, 5 Mar 2006 09:32:27 -0600 (CST)

Martin Rudalics wrote:

   Luc Teirlinck wrote:
    > If you consider in "[a]b]" the first and the second `]' to be _both_
    > inside or _both_ outside the context of a character alternative, then
    > it would be impossible to determine solely from that notion of context
    > which of the two `]' has to be taken literally.

   That's what I don't get tired of saying for one week already.  You
   always denied it by saying things like

       The special meaning of `]' inside a character alternative is
       obviously to close that alternative.

   and

       `]' has the special meaning of closing a character alternative
       _inside_ a character alternative

Look, I am getting tired of this endless yes-no discussion.  But you
have completely misunderstood everything I have been saying.  Let me
try once more to explain.

Figuring out whether a `]' has to be taken literally or not is a
completely trivial problem, but you are making it hard for yourself
for counterproductive philosophical reasons.

Start at the beginning of the regexp.  `[' is special, `]' not,
because we are outside a character alternative.  After the first
unquoted `[' is read, which is special because it was typed outside a
character alternative, we are inside a character alternative. `[' is
no longer special, but `]' is (except immediately after the `[' or
"[^"), because we now are inside a character alternative.  After the
next `]' is read, which is special because it was typed inside a
character alternative, we are back outside a character alternative,
where `[' is special and `]' is not.  To summarize, `]' is only special
in a character alternative, `[' is only special outside one.

Note how easy this is.  Unlike for, say, \\(, you do not even have to
keep track of which `[' matches which `]', because there is no
nesting.  All you need to keep track of is whether you are inside or
outside a character alternative.
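
The scan described above can be sketched in a few lines of Python.
This is only an illustration of the rule, not the code Emacs uses to
compile regexps.  The function name `classify_brackets' is made up for
this example, and the sketch makes two simplifying assumptions: a
backslash outside a character alternative quotes the next character,
and character classes such as [:alpha:] are ignored entirely.

```python
def classify_brackets(regexp):
    """Label every `[` and `]` in REGEXP as 'special' or 'literal'.

    Returns a list of (index, char, label) tuples.  Illustrative sketch
    only: character classes like [:alpha:] are not handled.
    """
    result = []
    inside = False  # the only state needed: inside an alternative or not
    i, n = 0, len(regexp)
    while i < n:
        ch = regexp[i]
        if not inside:
            if ch == "\\" and i + 1 < n:
                # Assumption: outside an alternative, backslash quotes
                # the next character, so a quoted bracket is literal.
                if regexp[i + 1] in "[]":
                    result.append((i + 1, regexp[i + 1], "literal"))
                i += 2
                continue
            if ch == "[":
                result.append((i, ch, "special"))
                inside = True
                i += 1
                if i < n and regexp[i] == "^":
                    i += 1  # skip the complement marker
                if i < n and regexp[i] == "]":
                    # `]` right after `[` or `[^` is taken literally
                    result.append((i, "]", "literal"))
                    i += 1
                continue
            if ch == "]":
                result.append((i, ch, "literal"))
        else:
            if ch == "[":
                result.append((i, ch, "literal"))
            elif ch == "]":
                result.append((i, ch, "special"))
                inside = False
        i += 1
    return result


print(classify_brackets("[a]b]"))
# [(0, '[', 'special'), (2, ']', 'special'), (4, ']', 'literal')]
```

Note that the only state is the single boolean `inside'; no stack or
nesting depth is needed.  For "[a]b]" the first `]' comes out special
and the second literal, purely from that one bit of context.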

You are making things difficult by treating `[' and `]' in regexps as
if they had the usual open-close parentheses syntax, like \\( and \\).
They do *not* and that is the cause of all your misunderstandings.  In
"[1[2]3]" the first `]' closes the first `[' and "balance" makes no
sense for the other `[' and `]'.  If `[' and `]' had the usual open-close
parentheses syntax, the 2 would be inside a nested character
alternative, two levels deep.  But there is no such thing as nested
character alternatives, because, in regexps, `[' and `]' do not have
the usual open-close parentheses syntax (unlike, say, in Lisp vectors).

Sincerely,

Luc.



