help-gnu-emacs

Re: On refining regexp by adding exceptions systematically


From: Alan Mackenzie
Subject: Re: On refining regexp by adding exceptions systematically
Date: Sat, 5 Oct 2002 17:45:04 +0000
User-agent: tin/1.4.5-20010409 ("One More Nightmare") (UNIX) (Linux/2.0.35 (i686))

"Stefan Monnier <foo@acm.com>"
<monnier+gnu.emacs.help/news/@flint.cs.yale.edu> wrote on 05 Oct 2002
12:11:30 -0400:
>>> Here is regular expression in emacs lisp that initially seems to work
>>> for the job: [A-Z][A-Z][A-Z][0-9]+

>>> After running it on a number of uses, I find that there is an exception
>>> to it, namely PJP89898.   Rather than rehashing the code after having
>>> forgotten it and reworking my regexp expression (every time I find an
>>> exception) in some convoluted way, is there a systematic way to add an
>>> exception or a series of exceptions to the regexp? I am sure that there
>>> are a number of ways to do this and each has its merits.

>> Regular expressions are designed to find string expressions which are,
>> well, regular.  If you really want to add in an exception like you've
>> got, you're going to end up with something horrible.  It can be done,
>> but like rowing the Atlantic, why bother?

> Actually, the end would still be regular.  But it's not currently
> supported by Emacs' regexp engine.

Sorry, what's not supported?

> Oh and BTW, extended regular expressions (as seen in Emacs and egrep)
> are actually not regular because of back-references.  Perl regexps have
> support for things like that.

Hmm.  Thanks, Stefan!  ;-)  I was just trying to help the guy.

I'm not actually that well versed in the theory behind regexps.  I read
somewhere (that book with the young woman operating a Heath-Robinson
contraption on the front cover, and the same thing broken on the back
cover) that a regexp is equivalent to a finite-state machine, in that
any set of strings one of them can recognise, the other can recognise
too.
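
To make that equivalence concrete, here is a rough sketch of a
hand-written state machine that accepts the same whole strings as the
original poster's regexp, [A-Z][A-Z][A-Z][0-9]+.  The function name
`my-fsm-match-p' and the state numbering are made up for illustration;
nothing here is part of Emacs itself.

```elisp
;; Hypothetical illustration: a finite-state machine for the language of
;; "three capital letters followed by one or more digits".
;; States 0-2: capitals seen so far.  State 3: three capitals seen, no
;; digit yet.  State 4: at least one digit seen (the only accepting
;; state).  `fail' is a dead state that can never be left.
(defun my-fsm-match-p (string)
  "Return non-nil if all of STRING is three capitals then digits."
  (let ((state 0))
    (dolist (ch (string-to-list string))
      (setq state
            (cond ((eq state 'fail) 'fail)
                  ;; Still collecting the three capital letters.
                  ((and (< state 3) (<= ?A ch ?Z)) (1+ state))
                  ;; After the capitals, only digits are allowed.
                  ((and (>= state 3) (<= ?0 ch ?9)) 4)
                  (t 'fail))))
    (eql state 4)))
```

With `case-fold-search' bound to nil, this should agree with
(string-match-p "\\`[A-Z][A-Z][A-Z][0-9]+\\'" STRING) on any input.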

Nevertheless, I think I know how to use them, more or less, and what
they're good for, and what they're not good for.  On their own, they're
not good for the original poster's problem.
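
One possible way to handle it, sketched below, is to leave the regexp
alone and keep the exceptions in a separate list that is checked after
each match.  New exceptions then get added to the list, not to the
regexp.  The names `my-id-regexp', `my-id-exceptions' and `my-next-id'
are hypothetical, and binding `case-fold-search' to nil assumes the
match is meant to be case-sensitive.

```elisp
;; Sketch only; the names below are invented for illustration.
(defvar my-id-regexp "[A-Z][A-Z][A-Z][0-9]+"
  "Regexp for the general pattern: three capitals followed by digits.")

(defvar my-id-exceptions '("PJP89898")
  "Strings that match `my-id-regexp' but must be skipped.")

(defun my-next-id (&optional bound)
  "Move point past the next match that is not an exception.
Return the matched string, or nil if there is none before BOUND."
  (let ((case-fold-search nil)          ; assumption: capitals only
        found)
    (while (and (not found)
                (re-search-forward my-id-regexp bound t))
      ;; Accept the match only if it is not a listed exception.
      (unless (member (match-string 0) my-id-exceptions)
        (setq found (match-string 0))))
    found))
```

In a buffer containing "ABC123 PJP89898 XYZ7", successive calls starting
from the beginning of the buffer return "ABC123", then "XYZ7", then nil.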

>         Stefan

-- 
Alan Mackenzie (Munich, Germany)
Email: aacm@muuc.dee; to decode, wherever there is a repeated letter
(like "aa"), remove half of them (leaving, say, "a").


