RE: [Regexp] Finding failure point in RE
From: mitch-GNU RegExp List
Subject: RE: [Regexp] Finding failure point in RE
Date: Tue, 27 Jul 2004 12:05:45 -0500
As you suspected, this didn't work too well on non-trivial cases.
I've started attacking this in a different way that shows some promise. I
added 2 methods to the CharIndexed interface:
    /**
     * Updates the maximum index that was matched in the input.
     */
    public void updateMaxMatchedIndex(int a_new_index);

    /**
     * Retrieves the highest index in the input that was matched.
     * @return the highest matched index
     */
    public int getMaxMatchedIndex();
and implemented some trivial code to maintain a max matched index in each
class that implements it.
Each REToken calls updateMaxMatchedIndex in its match() method as
appropriate.
RE.getLengthMatched() calls getMaxMatchedIndex() after firstToken.match() to
retrieve the value.
I haven't coded all of the REToken subclasses yet, but just doing RETokenChar
and RETokenPOSIX got me about 90% of what I needed.
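The bookkeeping described above can be sketched in isolation. This is a minimal illustration, not the gnu.regexp code: TrackedInput, TrackedString, LiteralMatcher and MaxIndexDemo are hypothetical stand-ins for CharIndexed, its implementations and the REToken classes, and the sketch assumes the "index" is the number of characters consumed so far. The point it shows is that the mark only moves forward, so a later, shorter attempt cannot lower it.

```java
// Hypothetical, simplified stand-in for CharIndexed plus the two new methods.
interface TrackedInput {
    int length();
    char charAt(int index);
    void updateMaxMatchedIndex(int newIndex);
    int getMaxMatchedIndex();
}

// Trivial implementation backed by a String.
final class TrackedString implements TrackedInput {
    private final String text;
    private int maxMatchedIndex = 0;

    TrackedString(String text) { this.text = text; }

    public int length() { return text.length(); }
    public char charAt(int index) { return text.charAt(index); }

    // Only move the mark forward, so a failed or shorter attempt never lowers it.
    public void updateMaxMatchedIndex(int newIndex) {
        if (newIndex > maxMatchedIndex) { maxMatchedIndex = newIndex; }
    }

    public int getMaxMatchedIndex() { return maxMatchedIndex; }
}

// Toy matcher standing in for the tokens: tries each literal alternative
// against the whole input and reports progress after every matched character.
final class LiteralMatcher {
    static boolean matchAny(TrackedInput input, String... alternatives) {
        boolean matched = false;
        for (String alt : alternatives) {
            int i = 0;
            while (i < alt.length() && i < input.length()
                    && input.charAt(i) == alt.charAt(i)) {
                i++;
                input.updateMaxMatchedIndex(i); // i characters consumed so far
            }
            if (i == alt.length() && i == input.length()) { matched = true; }
        }
        return matched;
    }
}

final class MaxIndexDemo {
    public static void main(String[] args) {
        String text = "status: FAILURE";
        TrackedString input = new TrackedString(text);
        boolean ok = LiteralMatcher.matchAny(input, "status: FAILED", "status: OK");
        System.out.println(ok);                                            // false
        System.out.println(input.getMaxMatchedIndex());                    // 12
        System.out.println(text.substring(0, input.getMaxMatchedIndex())); // status: FAIL
    }
}
```

In the demo, the first alternative gets 12 characters in before failing and the second only 8, yet the reported value stays at 12.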
Do you think this is valuable enough to fold into the base code or should I
plan on keeping my own variation?
Mitch
-----Original Message-----
From: address@hidden On Behalf Of Wes Biggs
Sent: Monday, July 26, 2004 4:30 PM
To: 'address@hidden'
Subject: Re: [Regexp] Finding failure point in RE
Mitch, that's not going to work as is. Take out the "if" block around
firstToken.match(); it's OK if this returns false (no full match). Still
not guaranteeing it will work, though. :-)
public int getLengthMatched(Object o, int index, int eflags) {
    CharIndexed input = makeCharIndexed(o, index);
    if (firstToken == null) { return 0; } // Trivial case of empty regexp
    REMatch m = new REMatch(numSubs, index, eflags);
    // Ignore the return value: false only means there was no full match.
    firstToken.match(input, m);
    // Walk the chain of candidate matches and keep the furthest index reached.
    int max = 0;
    while (m != null) {
        if (m.index > max) { max = m.index; }
        m = m.next;
    }
    return max;
}
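Once a method like getLengthMatched() returns a value, the caller still has to turn it into something readable. The following is a rough sketch of that step only, assuming single-line input. FailurePoint, mark and isFullMatch are hypothetical names, not part of gnu.regexp, and the length is passed in as a plain int so the sketch does not depend on the proposed method.

```java
// Hypothetical helper: formats a "length matched" value as a caret marker.
final class FailurePoint {

    // Returns the input followed by a second line with a caret under the
    // first character that was not matched. Assumes single-line input.
    static String mark(String input, int lengthMatched) {
        // Clamp so an out-of-range value cannot throw.
        int pos = Math.max(0, Math.min(lengthMatched, input.length()));
        StringBuilder sb = new StringBuilder(input).append('\n');
        for (int i = 0; i < pos; i++) { sb.append(' '); }
        return sb.append('^').toString();
    }

    // Mirrors the relationship given in the thread:
    // isMatch(input) holds exactly when the matched length equals input.length().
    static boolean isFullMatch(String input, int lengthMatched) {
        return lengthMatched == input.length();
    }
}
```

For example, FailurePoint.mark("status: FAILURE", 12) puts the caret under the "U", the first character that could not be matched.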
Claborn, Mitch wrote:
>Thanks Wes.
>
>Yes, I am using isMatch().
>
>I'll give your code a try and report the results here.
>
>mitch
>
>
>-----Original Message-----
>From: address@hidden On Behalf Of Wes Biggs
>Sent: Monday, July 26, 2004 4:19 PM
>To: 'address@hidden'
>Subject: Re: [Regexp] Finding failure point in RE
>
>mitch-GNU RegExp List wrote:
>
>>I posted this question a while ago but got no response, so I'll try once
>>more...
>>
>>Is there a way (or plans to develop a way) to discover where in a regular
>>expression matching failed (i.e. didn't find a match)? Or alternately,
>>but not as useful, where in the regular expression is the last point that
>>successfully matched the input string?
>>
>>Background: I created a system that uses regular expressions to match
>>against the contents of incoming emails that contain output from various
>>status checks, operational tasks, etc. When a match fails, it is a
>>time-consuming process to discover where the failure point is. An index
>>into the regular expression (or input string I guess) that showed where
>>the match failed would be very useful and time saving.
>
>Hi Mitch -- the short answer is no, there is not currently a way or a
>plan to implement this.
>
>I'm assuming you're applying this to a situation where you're using
>isMatch() -- otherwise the logic gets a little ambiguous, because a
>failed RE will fail at every point along the input.
>
>You could add a method like
>int RE::getLengthMatched(input)
>which would execute similarly to isMatch() but keep the contextual
>information such that
>RE.isMatch(input) ==> (RE.getLengthMatched(input) == input.length())
>
>Here's some untested off-the-cuff code you can try adding to RE.java:
>
>public int getLengthMatched(Object o, int index, int eflags) {
>    CharIndexed input = makeCharIndexed(o, index);
>    if (firstToken == null) { return 0; } // Trivial case of empty regexp
>    REMatch m = new REMatch(numSubs, index, eflags);
>    int max = 0; // declared outside the "if" so it is in scope at the return
>    if (firstToken.match(input, m)) {
>        while (m != null) {
>            if (m.index > max) { max = m.index; }
>            m = m.next;
>        }
>    }
>    return max;
>}
>
>
>
>
_______________________________________________
Gnu-regexp-users mailing list
address@hidden
http://lists.gnu.org/mailman/listinfo/gnu-regexp-users