From: Ito Kazumitsu
Subject: Re: [cp-patches] RFC: gnu.regexp: fixed bugs in stingy match of RETokenRepeated
Date: Tue, 24 Jan 2006 07:25:29 +0900 (JST)

From: Ito Kazumitsu <address@hidden>
Date: Mon, 23 Jan 2006 23:06:42 +0900 (JST)
> --- classpath/gnu/regexp/REMatch.java 22 Jan 2006 02:22:21 -0000 1.3
> +++ classpath/gnu/regexp/REMatch.java 23 Jan 2006 13:43:12 -0000
> + Vector repeats; // number of repeats of each stingy repeated token
> + // Request For Comment: The Vector repeats contains number of repeats
> + // from left to right without regard to what the token is.
> + // I am not quite sure this is reasonable.
> --- classpath/gnu/regexp/RE.java 22 Jan 2006 02:22:21 -0000 1.12
> +++ classpath/gnu/regexp/RE.java 23 Jan 2006 13:43:11 -0000
> + REMatch best = mymatch;
> + if (! best.stingy) {
> }
> + else {
> + // Find best match of them all to observe
> + // leftmost least repeated
> + while ((mymatch = mymatch.next) != null) {
> + if (compareRepeats(mymatch, best) < 0) {
> + best = mymatch;
> + }
> + }
> + }
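
The quoted excerpt calls compareRepeats but does not show its body. As a rough sketch only, here is one plausible reading of "leftmost least repeated": compare the per-token repeat counts from left to right and prefer the match whose first differing count is smaller. The class name, the use of List<Integer> instead of Vector, and the tie rule for lists of different length are all assumptions made for illustration, not the code in the patch.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the compareRepeats from the patch.
class CompareRepeatsSketch {

    // Compares two lists of repeat counts position by position, left to right.
    // Returns a negative value if a has the smaller count at the first
    // position where the lists differ, a positive value if b does,
    // and 0 if they agree on every shared position.
    // Assumption: lists of different length that agree on the shared
    // prefix are treated as a tie.
    static int compareRepeats(List<Integer> a, List<Integer> b) {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a.get(i), b.get(i));
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        // The leftmost token repeats once in the first list and twice in the second,
        // so the first list wins even though its second count is larger.
        List<Integer> first = Arrays.asList(1, 3);
        List<Integer> second = Arrays.asList(2, 1);
        System.out.println(compareRepeats(first, second) < 0);
    }
}
```

Under this reading, a match that contains no stingy token has an empty list and ties with everything, which is one way to see why mixing stingy and non-stingy alternatives is awkward.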
This is an ad hoc fix and not carefully designed. I can think of problems
such as:

  How should we compare a stingy match and a non-stingy match?

For example, Sun's JDK shows the following result.
/(a+?|a+)/
aaaaaaaaaaaaa
0: a
1: a
/(a+|a+?)/
aaaaaaaaaaaaa
0: aaaaaaaaaaaaa
1: aaaaaaaaaaaaa
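
The JDK result above can be checked with a small program. This is a minimal sketch that assumes the result was obtained with java.util.regex and a plain find() on the input; the class and method names are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: shows that java.util.regex takes the first alternative
// that succeeds, regardless of which one is stingy.
class AlternationOrderDemo {

    // Returns {group 0, group 1} of the first match, or null if nothing matches.
    static String[] firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(0), m.group(1) };
    }

    public static void main(String[] args) {
        String input = "aaaaaaaaaaaaa";
        for (String regex : new String[] { "(a+?|a+)", "(a+|a+?)" }) {
            String[] groups = firstMatch(regex, input);
            System.out.println("/" + regex + "/");
            System.out.println(input);
            System.out.println(" 0: " + groups[0]);
            System.out.println(" 1: " + groups[1]);
        }
    }
}
```

Here the order of the alternatives decides the outcome: the stingy branch yields a single "a" only when it is tried first.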
But my patched gnu.regexp shows:
/(a+?|a+)/
aaaaaaaaaaaaa
0: aaaaaaaaaaaaa
1: aaaaaaaaaaaaa
/(a+|a+?)/
aaaaaaaaaaaaa
0: aaaaaaaaaaaaa
1: aaaaaaaaaaaaa