Re: using non-Emacs regexp syntax
From: Paul Pogonyshev
Subject: Re: using non-Emacs regexp syntax
Date: Sat, 2 Dec 2006 00:54:12 +0200
User-agent: KMail/1.7.2
Stuart D. Herring wrote:
> > If you don't mind, I'll work on it now. Changes can be added to whatever
> > .el file in the distribution later.
> >
> > Also, is there sense in supporting conversion to and from several formats?
> > E.g. some require that plus operator is escaped, while everything else is
> > not. E.g. something like this:
> >
> > (convert-regexp :sed :emacs some-regexp)
> >                 FROM TO     PATTERN-STRING
> >
> > Of course, it will add more complexity, but it shouldn't be much of a
> > problem for users of this function and implementing it in Lisp should
> > still be not hard.
>
> I've already started on this sort of thing, writing a converter just
> between the two formats supported by GNU grep. (These are
> "GNU-extended-basic-RE" and "extended-RE with backreferences".) As it
> happens, that conversion can be done with one function because the formats
> are so similar. I had planned to go on to the more general case, but for
> now I'll just provide what I have for comment and/or use. (I have papers,
> so any use is fine.) If, Paul, you'd like, we can collaborate on this, or
> one of us of your choice can go on with it.
>
> [...]
I will happily pass this to you if you wish. I had planned a more generic
implementation, which can be briefly described as follows:
* Each implemented format provides a table of associations
  construct-name -> construct-generator. (Some constructs, like the []
  character class, will require a parameter.) In the simplest form, a
  construct-generator can be just a fixed string, which will suffice in
  most cases.
* Each format also provides a parser that splits a regexp into a list
  of construct names.
* The entry function (or a helper for it) combines the table for the
  output format with the parser for the input format. The result is a
  regexp in the output format.
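
The three pieces above could be sketched roughly as follows. This is only
an illustration under stated assumptions, not a proposed implementation:
all names (my-regexp-tables, my-regexp-parse, my-regexp-generate,
my-convert-regexp) are hypothetical, only three constructs are covered,
every generator is a fixed string, and a single parser is derived from the
same tables instead of being written per format. Literal characters are
kept in the parsed list as plain characters, which is also an assumption.

```elisp
(require 'cl-lib)

;; Hypothetical tables: construct-name -> fixed generator string.
;; Only "one or more" and grouping are listed; no token in a table is
;; a prefix of another, so the match order does not matter here.
(defvar my-regexp-tables
  '((:emacs (one-or-more . "+")   (group-open . "\\(") (group-close . "\\)"))
    (:sed   (one-or-more . "\\+") (group-open . "\\(") (group-close . "\\)"))
    (:ere   (one-or-more . "+")   (group-open . "(")   (group-close . ")")))
  "Alist mapping a format keyword to its construct table.")

(defun my-regexp-parse (format regexp)
  "Split REGEXP, written in FORMAT, into construct names and literal characters."
  (let ((table (cdr (assq format my-regexp-tables)))
        (pos 0)
        (result '()))
    (while (< pos (length regexp))
      (let* ((rest (substring regexp pos))
             (hit (cl-find-if (lambda (entry)
                                (string-prefix-p (cdr entry) rest))
                              table)))
        (cond
         (hit                           ; a construct known to FORMAT
          (push (car hit) result)
          (setq pos (+ pos (length (cdr hit)))))
         ((and (eq (aref rest 0) ?\\) (> (length rest) 1))
          (push (aref rest 1) result)   ; backslash-escaped literal
          (setq pos (+ pos 2)))
         (t
          (push (aref rest 0) result)   ; ordinary literal character
          (setq pos (1+ pos))))))
    (nreverse result)))

(defun my-regexp-generate (format constructs)
  "Build a regexp string in FORMAT from the list CONSTRUCTS."
  (let ((table (cdr (assq format my-regexp-tables))))
    (mapconcat
     (lambda (item)
       (cond
        ((symbolp item) (cdr (assq item table)))
        ;; A literal that would read as an operator (or a backslash)
        ;; in the output format has to be escaped.
        ((or (eq item ?\\) (rassoc (string item) table))
         (concat "\\" (string item)))
        (t (string item))))
     constructs "")))

(defun my-convert-regexp (from to regexp)
  "Convert REGEXP from format FROM to format TO."
  (my-regexp-generate to (my-regexp-parse from regexp)))

;; (my-convert-regexp :sed :emacs "\\(ab\\)\\+")  ; => "\\(ab\\)+"
;; (my-convert-regexp :emacs :ere "\\(a+\\)(")    ; => "(a+)\\("
```

A real version would need parameterized constructs such as [] and a way
to report constructs that the output format cannot express.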
This may be slow, though. Still, given that Emacs has lived happily
without this sort of function, it can hardly be too slow to be useful.
But maybe you can come up with a simpler solution.
(One more thing: it probably makes sense to add a conversion function
for replacement strings too. E.g. some formats require $N, while others
(like Emacs) use \N to reference a matched group.)
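
For replacement strings, a minimal sketch of such a conversion might look
like this. Both function names are hypothetical, only single-digit group
references are handled, and escaped characters (a literal $ or a doubled
backslash) are deliberately ignored, so an Emacs replacement containing a
literal backslash followed by a digit would be converted wrongly.

```elisp
(defun my-dollar-refs-to-backslash (replacement)
  "Rewrite $N group references in REPLACEMENT as backslash-N (Emacs style)."
  ;; In the replacement argument, a doubled backslash stands for one
  ;; literal backslash and backslash-1 stands for the first group.
  (replace-regexp-in-string "\\$\\([0-9]\\)" "\\\\\\1" replacement t))

(defun my-backslash-refs-to-dollar (replacement)
  "Rewrite backslash-N group references in REPLACEMENT as $N."
  (replace-regexp-in-string "\\\\\\([0-9]\\)" "$\\1" replacement t))

;; (my-dollar-refs-to-backslash "$1-$2")      ; => "\\1-\\2"
;; (my-backslash-refs-to-dollar "\\1-\\2")    ; => "$1-$2"
```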
Paul