classpath
Re: Classpath regexp code


From: Ziga Mahkovec
Subject: Re: Classpath regexp code
Date: Sun, 05 Jun 2005 02:03:29 +0200

On Wed, 2005-06-01 at 15:11 +0200, Ziga Mahkovec wrote:
> On Tue, 2005-05-31 at 19:34 -0600, Tom Tromey wrote:
> > I think we should consider using jregex even without an assignment.
> > Unless somebody wants to drastically speed up gnu regex, that is.
> 
> Well, I started looking into these performance problems, but I'm not
> sure yet if there's any low hanging fruit here.
> 
> Anthony already showed that lots of time is spent running clone() and
> garbage collection.  The culprit here is
> gnu.regexp.RETokenRepeated.match(), which is used for matching a*, a?,
> a+ and a{n,m} tokens.  In the worst case, it clones two
> gnu.regexp.REMatch instances for *each* token of the input string.
> REMatch contains two integer arrays (also cloned) and other fields as
> well, so this is a killer for performance.
> 
> I'll be able to spend more time on this the coming weekend.

After spending some more time profiling gnu.regexp, I'd say that short
of a major rework, its performance can't be improved significantly.
Even with the problematic clone() calls removed (which is only
possible for simple expressions without backtracking), gnu.regexp still
didn't get near jregex.
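
To illustrate why the clone() calls can only be dropped when no
backtracking is needed, here is a simplified sketch.  It is not the
actual gnu.regexp code: RepeatSketch, State and both match methods are
hypothetical names, and it takes one snapshot per repetition rather than
the two REMatch clones mentioned above.  A repeated token followed by
another token has to remember every intermediate position so it can back
off; a repeated token with nothing after it can just advance an index.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class RepeatSketch {
    // Hypothetical match state (assumption): an index plus two int arrays,
    // loosely modelled on the description of REMatch, not copied from it.
    static final class State implements Cloneable {
        int index;
        int[] groupStart = new int[2];
        int[] groupEnd = new int[2];

        @Override
        public State clone() {
            try {
                State copy = (State) super.clone();
                copy.groupStart = groupStart.clone();
                copy.groupEnd = groupEnd.clone();
                return copy;
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e);
            }
        }
    }

    // Counts how many snapshots the backtracking matcher allocates.
    static int snapshots;

    // Matches c* followed by a literal suffix at offset 0.
    // Returns the end index of the match, or -1 if there is none.
    static int matchWithBacktracking(String input, char c, String suffix) {
        Deque<State> saved = new ArrayDeque<>();
        State s = new State();
        saved.push(s.clone());          // zero repetitions is a candidate too
        snapshots++;
        while (s.index < input.length() && input.charAt(s.index) == c) {
            s.index++;
            saved.push(s.clone());      // one snapshot per consumed character
            snapshots++;
        }
        // Greedy: try the longest repetition first, then back off.
        while (!saved.isEmpty()) {
            State candidate = saved.pop();
            if (input.startsWith(suffix, candidate.index)) {
                return candidate.index + suffix.length();
            }
        }
        return -1;
    }

    // Fast path for c* with nothing after it: no snapshots are needed
    // because nothing can force the repetition to give characters back.
    static int matchSimple(String input, char c) {
        int i = 0;
        while (i < input.length() && input.charAt(i) == c) {
            i++;
        }
        return i;
    }
}
```

The snapshot count grows with the length of the run being matched, which
is where the allocation and GC pressure comes from in this model.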

I found another analysis[1] of the library with similar conclusions.
The analysis also includes a patch, but that only seems to improve
things for Sun's JVM -- for jamvm, gij and gcj I found the
copy-constructor to be slower than cloning.
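
For anyone who wants to repeat the comparison on their own VM, a rough
sketch of the two copying strategies follows.  CopyVsClone and
MatchState are hypothetical names; MatchState only mimics the shape
described above (two int arrays plus other fields) and is neither the
real REMatch nor the code from the patch in [1].  The timing loop is
crude (no warm-up, single run), so treat the numbers as a hint only.

```java
class CopyVsClone {
    // Hypothetical stand-in for a match-state object (assumption).
    static final class MatchState implements Cloneable {
        int[] start;
        int[] end;
        int index;

        MatchState(int groups) {
            start = new int[groups];
            end = new int[groups];
        }

        // Copy constructor: allocate fresh arrays and copy the contents.
        MatchState(MatchState other) {
            start = new int[other.start.length];
            end = new int[other.end.length];
            System.arraycopy(other.start, 0, start, 0, start.length);
            System.arraycopy(other.end, 0, end, 0, end.length);
            index = other.index;
        }

        // clone(): shallow copy via Object.clone(), then clone the arrays
        // so the copy does not share them with the original.
        @Override
        public MatchState clone() {
            try {
                MatchState copy = (MatchState) super.clone();
                copy.start = start.clone();
                copy.end = end.clone();
                return copy;
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e);
            }
        }
    }

    public static void main(String[] args) {
        MatchState s = new MatchState(10);
        int rounds = 1_000_000;
        long sink = 0; // keeps the copies from being optimized away

        long t0 = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            sink += s.clone().start.length;
        }
        long t1 = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            sink += new MatchState(s).start.length;
        }
        long t2 = System.nanoTime();

        System.out.println("clone():          " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("copy constructor: " + (t2 - t1) / 1_000_000 + " ms");
        System.out.println("(ignore) " + sink);
    }
}
```

Both paths produce an independent deep copy, so any difference is down
to how a given VM implements Object.clone() versus plain allocation plus
System.arraycopy().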

So I think integrating jregex would make sense.  Apart from being much
faster, it also passes a lot of the Mauve tests I had previously
disabled for gnu.regexp.  With a couple of patches I got jregex to pass
463/485 tests (gnu.regexp passes 367/485).

[1] http://lists.gnu.org/archive/html/gnu-regexp-users/2003-05/msg00000.html

-- 
Ziga




