classpath-patches
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [cp-patches] RFC: gnu.regexp: fixed bugs in stingy match of RETokenR


From: Ito Kazumitsu
Subject: Re: [cp-patches] RFC: gnu.regexp: fixed bugs in stingy match of RETokenRepeated
Date: Tue, 24 Jan 2006 23:37:52 +0900 (JST)

From: Ito Kazumitsu <address@hidden>
Date: Tue, 24 Jan 2006 07:25:29 +0900 (JST)


> This is an ad hoc fix and not carefully designed. I can think of problems
> such as
> 
>   How should we compare an stingy match and a non-stingy match?

I am beginning to think that gnu.regexp is doing more than is
expected:

  Find all possible matches and select the best match.

Jakarta Regexp seems much simpler (I have not seriously studied
the source code, this is only an impression):

  Go forward. If something fails, backtrack and try another option.
  If there is nothing to be done, then we have found a match.
  If we have more to do but we cannot go ahead, the the matching has
  failed.

Our java.util.regex.Matcher#find() and java.util.regex.Matcher#matches()
both uses the same gnu.regexp.RE#getMatchImpl(), which tries to find
all possible matches.

So with the regexp /^[ab]{1,3}(ab*|b)/, our find() can find
"aabbbbb" from "aabbbbb".  But other implementations
(Sun's JDK, Jakarta Regexp, Perl, Ruby) can find only "aabb".


 1 [ab] : a
 2 [ab] : a
 3 [ab] : b
 4 ab*  : FAIL
 5 b    : b
   Other implementations stop here, but gnu.regexp goes back to 2
   and tries another option.

 3 ab*  : FAIL
 4 b    : b
   Go back to 1 and try another.

 2 ab*  : abbbbb
   "aabbbbb" has been found.


If we can simplufy the work of getMatchImpl(), we will be
getting the same results as other implementations as well as
improved performance.

In that case, when matches() call getMatchImpl(), an implicit
RETokenEnd should be added. Otherwise, "aaa" cannot match /a+?/.




reply via email to

[Prev in Thread] Current Thread [Next in Thread]