chicken-users
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Chicken-users] an oddly slow regex ...


From: Alex Shinn
Subject: Re: [Chicken-users] an oddly slow regex ...
Date: Mon, 28 Oct 2013 12:52:09 +0900

On Sun, Oct 27, 2013 at 3:38 AM, Peter Bex <address@hidden> wrote:
On Sat, Oct 26, 2013 at 10:37:36AM -0700, Matt Welland wrote:
> This regex is so slow that you don't need a timer to see the impact (at
> least not on my machine with chicken 4.8.0):
>
>  (string-match "[a-z][a-z0-9\\-_.]{0,20}" "a012345678901234567890123456789")
>
> Changing the {0,20} to + makes it run normally fast so I just replaced the
> regex with a string-length and modified the "{0,20}" to "+" . I don't
> necessarily need a fix for this but it seems like a possible symptom of a
> deeper problem so I thought I'd report it.

Hi Matt,

Thanks for your report.  I'm afraid this is a known problem with
Irregex - to avoid producing a state machine with too many states,
it will always use a backtracking implementation for all repetition
counts.

I think it's best to take a look at how to fix this upstream first.
Maybe Alex has an idea of how to do that.

It's possible to expand repetition counts in the DFA.  Your
example basically becomes:

"[a-z][a-z0-9\\-_.]?[a-z0-9\\-_.]?[a-z0-9\\-_.]?[a-z0-9\\-_.]?[a-z0-9\\-_.]?[a-z0-9\\-_.]?[a-z0-9\\-_.]?[a-z0-9\\-_.]?[a-z0-9\\-_.]?[a-z0-9\\-_.]?[a-z0-9\\-_.]?[a-z0-9\\-_.]?[a-z0-9\\-_.]?[a-z0-9\\-_.]?[a-z0-9\\-_.]?[a-z0-9\\-_.]?[a-z0-9\\-_.]?[a-z0-9\\-_.]?[a-z0-9\\-_.]?[a-z0-9\\-_.]?"

which in this case probably has a reasonably sized DFA.
In other cases, these patterns can easily blow up and
force the backtracking fallback.  I'm not sure if it's better
to try more aggressively or to keep the easier rule of thumb
that n..m repetitions always force backtracking.

> Pre-compiling the regex didn't seem to make any difference.

That's because the backtracker matches really slowly.

There is possibly some pathological backtracking happening
here, I'll take a look.

The long-term goal is to replace the backtracking with a
more scalable approach (e.g. the non-backtracking NFA
used in the SRFI 115 reference implementation) and only
use DFAs for simple patterns or when speed is explicitly
requested.  Doing this and preserving all PCRE features
will take time though (especially backrefs).

-- 
Alex


reply via email to

[Prev in Thread] Current Thread [Next in Thread]