help-gnu-emacs
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: will we ever have zero width assertions in regexps?


From: Ilya Zakharevich
Subject: Re: will we ever have zero width assertions in regexps?
Date: Sat, 29 Jan 2011 22:28:48 +0000 (UTC)
User-agent: slrn/0.9.8.1pl1 (Linux)

On 2011-01-29, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>> As I said, I put some optimizations which in most (AFAIK) practical
>> senses remove such pathologies.  (The underlying problems remain; the
>> optimizations are only "heuristic"; but one needs to be extra
>> inventive to circumvent the optimizations.)
>
> A typical case could look something like "foo *(.*?) *bar". when
> matching "foo ..<many space>.. baZ".

No, this is a polynomial-time problem.  My optimization does nothing
for such cases.  And I do not think such a REx would provide any
problem in real life - unless you have many hundreds of consecutive
spaces.  (And unless Emacs' REx engine is particularly slow per OPCODE.)

But I start to see the difference - it is in usage scenarios.  Many
Perl REx matches are done "per-line", not "per-file".  And in Emacs,
one would rarely narrow the buffer before a match.  This creates a
major skew in REx matches - in Emacs the string to match against would
be quite often a couple of orders of magnitude larger.

Essentially, in Perl, a sloppy REx which leads to a non-linear
polynomial time matching would be mostly unnoticed speed-wise.  Only
those which lead to exponential-time match bite hard.

So my patches for Perl improve only the exponential cases of
"sloppyness".  In Emacs, polynomial-time may give quite noticable
problems too.  Hmm, AFAICS, one can slighly modify my patches to
handle polynomial-time matches too.  I would think about it more (but
do not promise any action! ;-).

> Basically, provide a primitive like (match-string RE STRING LIMIT) that
> can not only say "matched between START and END", but also "reached
> LIMIT within yet finding a match, here's the suspended SEARCH-STATE at
> LIMIT", so you can later resume the search starting at LIMIT by passing
> that state.

match-with-continuation.  An interesting idea.  I already implemented
it for Perl (to support (??{}), but it is not exposed to the user.
Would one want this in non-interactive situations?

Thanks,
Ilya


reply via email to

[Prev in Thread] Current Thread [Next in Thread]