gnu-regexp-users
RE: [Regexp] Finding failure point in RE


From: mitch-GNU RegExp List
Subject: RE: [Regexp] Finding failure point in RE
Date: Tue, 27 Jul 2004 12:05:45 -0500

As you suspected, this didn't work too well on non-trivial cases.

I've started attacking this in a different way that shows some promise.  I
added 2 methods to the CharIndexed interface:
    /**
     * Updates the maximum index that was matched in the input.
     * @param a_new_index the index just matched
     */
    public void updateMaxMatchedIndex(int a_new_index);

    /**
     * Retrieves the highest index in the input that was matched.
     * @return the highest matched index
     */
    public int getMaxMatchedIndex();

and implemented some trivial code to maintain a max matched index in each
class that implements it.
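
For readers following along, here is a minimal sketch of what such "trivial code" might look like. It is not the actual patch: `MaxIndexTracking` and `TrackedString` are hypothetical stand-ins (the real CharIndexed interface has other methods that are omitted here), and the rule of only ever raising the stored value is an assumption based on the method name.

```java
// Hypothetical stand-in for the two methods added to CharIndexed.
// The real gnu.regexp interface declares other methods not shown here.
interface MaxIndexTracking {
    void updateMaxMatchedIndex(int newIndex);
    int getMaxMatchedIndex();
}

// Assumed implementation: remember the largest index ever reported.
class TrackedString implements MaxIndexTracking {
    private final String text;
    private int maxMatchedIndex = 0;

    TrackedString(String text) {
        this.text = text;
    }

    char charAt(int i) {
        return text.charAt(i);
    }

    @Override
    public void updateMaxMatchedIndex(int newIndex) {
        // Backtracking may report smaller indexes later; keep the maximum.
        if (newIndex > maxMatchedIndex) {
            maxMatchedIndex = newIndex;
        }
    }

    @Override
    public int getMaxMatchedIndex() {
        return maxMatchedIndex;
    }
}
```

Keeping only the maximum means a later, shorter attempt made while backtracking cannot overwrite the furthest point reached.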

Each REToken calls updateMaxMatchedIndex in its match() method as
appropriate.  

RE.getLengthMatched() calls getMaxMatchedIndex() after firstToken.match() to
retrieve the value.  
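
The flow described above can be illustrated with a toy matcher. This is only a sketch under stated assumptions: `TrackingInput`, `LiteralToken`, and `LiteralSequence` are hypothetical and far simpler than the real RE, REToken, and CharIndexed classes, and recording "one past the matched character" is an assumed convention chosen so that a full match reports the input length.

```java
// Hypothetical miniature of the scheme; none of these classes are gnu.regexp's.
class TrackingInput {
    private final String text;
    private int maxMatched = 0;

    TrackingInput(String text) {
        this.text = text;
    }

    boolean hasCharAt(int pos, char c) {
        return pos < text.length() && text.charAt(pos) == c;
    }

    void updateMaxMatchedIndex(int i) {
        if (i > maxMatched) {
            maxMatched = i;
        }
    }

    int getMaxMatchedIndex() {
        return maxMatched;
    }
}

// Stand-in for a single-character token such as RETokenChar.
class LiteralToken {
    private final char expected;

    LiteralToken(char expected) {
        this.expected = expected;
    }

    // On success, records progress on the input before returning true.
    boolean match(TrackingInput in, int pos) {
        if (in.hasCharAt(pos, expected)) {
            in.updateMaxMatchedIndex(pos + 1); // assumed: one past the matched char
            return true;
        }
        return false;
    }
}

// Stand-in for RE: a pattern made of literal characters only.
class LiteralSequence {
    private final LiteralToken[] tokens;

    LiteralSequence(String pattern) {
        tokens = new LiteralToken[pattern.length()];
        for (int i = 0; i < pattern.length(); i++) {
            tokens[i] = new LiteralToken(pattern.charAt(i));
        }
    }

    // Runs the tokens, ignores whether the full match succeeded,
    // then reads the furthest point reached from the input wrapper.
    int getLengthMatched(String s) {
        TrackingInput in = new TrackingInput(s);
        int pos = 0;
        for (LiteralToken t : tokens) {
            if (!t.match(in, pos)) {
                break;
            }
            pos++;
        }
        return in.getMaxMatchedIndex();
    }
}
```

With this toy, `new LiteralSequence("status: ok").getLengthMatched("status: FAILED")` returns 8, the offset in the input where matching stopped.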

I haven't coded all of the REToken subclasses yet, but just doing RETokenChar
and RETokenPOSIX got me about 90% of what I needed.

Do you think this is valuable enough to fold into the base code or should I
plan on keeping my own variation?


Mitch


-----Original Message-----
From: address@hidden, on behalf of Wes Biggs
Sent: Monday, July 26, 2004 4:30 PM
To: 'address@hidden'
Subject: Re: [Regexp] Finding failure point in RE


Mitch, that's not going to work as is.  Take out the "if" block around
firstToken.match(); it's OK if this returns false (no full match).  Still
not guaranteeing it will work, though. :-)

public int getLengthMatched(Object o, int index, int eflags) {
    CharIndexed input = makeCharIndexed(o, index);
    if (firstToken == null)  { return 0; } // Trivial case of empty regexp
    REMatch m = new REMatch(numSubs, index, eflags);
    firstToken.match(input, m); // result ignored: a partial match is fine
    int max = 0;
    while (m != null) {
        if (m.index > max) { max = m.index; }
        m = m.next;
    }
    return max;
}




Claborn, Mitch wrote:

>Thanks Wes. 
>
>Yes, I am using isMatch().
>
>I'll give your code a try and report the results here.
>
>mitch
>
>
>-----Original Message-----
>From: address@hidden, on behalf of Wes Biggs
>Sent: Monday, July 26, 2004 4:19 PM
>To: 'address@hidden'
>Subject: Re: [Regexp] Finding failure point in RE
>
>
>mitch-GNU RegExp List wrote:
>
>>I posted this question a while ago but got no response, so I'll try once
>>more...
>>
>>Is there a way (or plans to develop a way) to discover where in a regular
>>expression matching failed (i.e. didn't find a match)?  Or alternately,
>>but not as useful, where in the regular expression is the last point that
>>successfully matched the input string?
>>
>>Background:  I created a system that uses regular expressions to match
>>against the contents of incoming emails that contain output from various
>>status checks, operational tasks, etc.  When a match fails, it is a
>>time-consuming process to discover where the failure point is.  An index into
>>the regular expression (or input string I guess) that showed where the match
>>failed would be very useful and time saving.
>
>
>Hi Mitch -- the short answer is no, there is not currently a way or a 
>plan to implement this.
>
>I'm assuming you're applying this to a situation where you're using 
>isMatch() -- otherwise the logic gets a little ambiguous, because a 
>failed RE will fail at every point along the input.
>
>You could add a method like
>int RE::getLengthMatched(input)
>which would execute similarly to isMatch() but keep the contextual 
>information such that
>RE.isMatch(input) ==> (RE.getLengthMatched(input) == input.length())
>
>Here's some untested off-the-cuff code you can try adding to RE.java:
>
>public int getLengthMatched(Object o, int index, int eflags) {
>    CharIndexed input = makeCharIndexed(o, index);
>    if (firstToken == null)  { return 0; } // Trivial case of empty regexp
>    REMatch m = new REMatch(numSubs, index, eflags);
>    int max = 0; // declared before the if block so it is in scope at the return
>    if (firstToken.match(input, m)) {
>        while (m != null) {
>            if (m.index > max) { max = m.index; }
>            m = m.next;
>        }
>    }
>    return max;
>}
>



_______________________________________________
Gnu-regexp-users mailing list
address@hidden
http://lists.gnu.org/mailman/listinfo/gnu-regexp-users




