chicken-users
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Chicken-users] an oddly slow regex ...


From: Peter Bex
Subject: Re: [Chicken-users] an oddly slow regex ...
Date: Sat, 26 Oct 2013 20:38:12 +0200
User-agent: Mutt/1.4.2.3i

On Sat, Oct 26, 2013 at 10:37:36AM -0700, Matt Welland wrote:
> This regex is so slow that you don't need a timer to see the impact (at
> least not on my machine with chicken 4.8.0):
> 
>  (string-match "[a-z][a-z0-9\\-_.]{0,20}" "a012345678901234567890123456789")
> 
> Changing the {0,20} to + makes it run normally fast so I just replaced the
> regex with a string-length and modified the "{0,20}" to "+" . I don't
> necessarily need a fix for this but it seems like a possible symptom of a
> deeper problem so I thought I'd report it.

Hi Matt,

Thanks for your report.  I'm afraid this is a known problem with
Irregex - to avoid producing a state machine with too many states,
it will always use a backtracking implementation for all repetition
counts.

I think it's best to take a look at how to fix this upstream first.
Maybe Alex has an idea of how to do that.

> Pre-compiling the regex didn't seem to make any difference.

That's because the backtracker matches really slowly.

> Just for completeness I compared with Ruby and the ruby equivalent
> is (in human terms) instant.
> 
> "a012345678901234567890123456789".match(/[a-z][a-z0-9]{0,20}/)

Thanks for that!  We need more of such real-world test cases.
Ideally I'd love to have a complete library of regex examples, which
we can use to optimize irregex further.

By the way, I note that your Ruby regex isn't exactly the same as the
one you used in CHICKEN.  Does it make a difference if you use the
same regex?

Cheers,
Peter
-- 
http://www.more-magic.net



reply via email to

[Prev in Thread] Current Thread [Next in Thread]