Re: [cp-patches] RFC: gnu.regexp: fixed bugs in stingy match of RETokenR


From: Ito Kazumitsu
Subject: Re: [cp-patches] RFC: gnu.regexp: fixed bugs in stingy match of RETokenRepeated
Date: Tue, 24 Jan 2006 07:25:29 +0900 (JST)

From: Ito Kazumitsu <address@hidden>
Date: Mon, 23 Jan 2006 23:06:42 +0900 (JST)

> --- classpath/gnu/regexp/REMatch.java 22 Jan 2006 02:22:21 -0000      1.3
> +++ classpath/gnu/regexp/REMatch.java 23 Jan 2006 13:43:12 -0000

> +    Vector repeats; // number of repeats of each stingy repeated token
> +    // Request For Comment: The Vector repeats contains number of repeats
> +    // from left to right without regard to what the token is.
> +    // I am not quite sure this is reasonable. 

> --- classpath/gnu/regexp/RE.java      22 Jan 2006 02:22:21 -0000      1.12
> +++ classpath/gnu/regexp/RE.java      23 Jan 2006 13:43:11 -0000

> +               REMatch best = mymatch;
> +               if (! best.stingy) {

>                 }
> +               else {
> +                   // Find best match of them all to observe
> +                   // leftmost least repeated
> +                   while ((mymatch = mymatch.next) != null) {
> +                       if (compareRepeats(mymatch, best) < 0) {
> +                           best = mymatch;
> +                       }
> +                   }
> +               }
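
The body of compareRepeats is not quoted above, so the following is only a minimal sketch of one possible reading of "leftmost least repeated": compare the per-token repeat counts from left to right and prefer the match whose first differing count is smaller. The class name, the method name compareRepeatCounts, the use of List<Integer> in place of the Vector, and the tie-break on list length are all assumptions for illustration, not the patch's code.

```java
import java.util.Arrays;
import java.util.List;

public class RepeatOrderSketch {
    // Hypothetical stand-in for the patch's compareRepeats: a negative
    // result means the first list is preferred (fewer repeats, leftmost first).
    static int compareRepeatCounts(List<Integer> a, List<Integer> b) {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a.get(i), b.get(i));
            if (c != 0) {
                return c; // the leftmost differing count decides
            }
        }
        // Assumption: when one list is a prefix of the other, the shorter wins.
        return Integer.compare(a.size(), b.size());
    }

    public static void main(String[] args) {
        // Counts (1, 3) against (2, 1): the first count differs, and 1 < 2.
        int c = compareRepeatCounts(Arrays.asList(1, 3), Arrays.asList(2, 1));
        System.out.println(c < 0 ? "first preferred" : "second preferred or tie");
    }
}
```

An ordering like this ignores which token each count belongs to, which is exactly the doubt raised in the comment on the repeats field.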

This is an ad hoc fix, not a carefully designed one. I can think of
open problems such as:

  How should we compare a stingy match and a non-stingy match?

For example, Sun's JDK shows the following result.

/(a+?|a+)/
    aaaaaaaaaaaaa
 0: a
 1: a

/(a+|a+?)/
    aaaaaaaaaaaaa
 0: aaaaaaaaaaaaa
 1: aaaaaaaaaaaaa
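
The JDK results above can be reproduced with java.util.regex. This is a small sketch, assuming the results were obtained with Matcher.find() on the 13-character input; the class name AlternationOrderDemo and the helper firstMatch are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternationOrderDemo {
    // Returns group 0 and group 1 of the first match, or null if none.
    static String[] firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(0), m.group(1) };
    }

    public static void main(String[] args) {
        String input = "aaaaaaaaaaaaa";
        for (String regex : new String[] { "(a+?|a+)", "(a+|a+?)" }) {
            String[] g = firstMatch(regex, input);
            System.out.println("/" + regex + "/");
            System.out.println("    " + input);
            System.out.println(" 0: " + g[0]);
            System.out.println(" 1: " + g[1]);
        }
    }
}
```

java.util.regex tries the alternatives in the order written and keeps the first one that leads to an overall match, so the stingy branch wins only when it comes first.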

But my patched gnu.regexp shows:

/(a+?|a+)/
    aaaaaaaaaaaaa
 0: aaaaaaaaaaaaa
 1: aaaaaaaaaaaaa

/(a+|a+?)/
    aaaaaaaaaaaaa
 0: aaaaaaaaaaaaa
 1: aaaaaaaaaaaaa



