octave-maintainers
Re: behavior of regexp ( ) function


From: David Bateman
Subject: Re: behavior of regexp ( ) function
Date: Wed, 28 Jan 2009 23:48:41 +0100
User-agent: Mozilla-Thunderbird 2.0.0.17 (X11/20081018)

David Bateman wrote:
Thanks Soren and Benjamin. I believe the attached patch does the right thing, though I'm not sure why PCRE returns a zero-length match for a pattern like "[^\t]*". It seems best to ignore such matches rather than let them abort the search for further matches, as the code previously did. Patch pushed and attached.

D.


This time it really is attached :-)

D.
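
For anyone who wants to see the effect without building Octave, here is a minimal sketch of the same idea: on a zero-length match, step the index forward one character and search again instead of stopping. It uses std::regex rather than PCRE, so it is not the code in regexp.cc, and the helper name nonempty_matches is made up for this illustration.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Octave): return the [start, end) offsets
// of every non-empty match of PATTERN in SUBJECT.  std::regex stands in for
// PCRE here; only the handling of zero-length matches mirrors the patch.
std::vector<std::pair<std::size_t, std::size_t>>
nonempty_matches (const std::string& subject, const std::string& pattern)
{
  std::vector<std::pair<std::size_t, std::size_t>> result;
  const std::regex re (pattern);
  std::size_t idx = 0;

  while (idx < subject.size ())
    {
      std::smatch m;

      // No match at all in the remaining text: nothing more to find.
      if (! std::regex_search (subject.begin () + idx, subject.end (), m, re))
        break;

      // m.position (0) is relative to the start of the searched range.
      std::size_t start = idx + m.position (0);
      std::size_t end = start + m.length (0);

      if (end <= start)
        {
          // Zero-length match: a plain "break" here would lose every
          // later match.  Advance by one character and try again.
          idx = start + 1;
          continue;
        }

      result.emplace_back (start, end);
      idx = end;
    }

  return result;
}
```

Called as nonempty_matches ("a\tb", "[^\t]*"), the loop first finds "a", then hits an empty match at the tab, skips over it, and goes on to find "b". Breaking out at the empty match would have returned only "a".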


--
David Bateman                                address@hidden
35 rue Gambetta                              +33 1 46 04 02 18 (Home)
92100 Boulogne-Billancourt FRANCE            +33 6 72 01 06 33 (Mob)

# HG changeset patch
# User David Bateman <address@hidden>
# Date 1233182607 -3600
# Node ID c8c212126d6d8aa02f048ae26c5703ec14b8168b
# Parent  07af3245452d2b62deb68c9ecb73d5c3905ba51d
For zero length matches in regexp, advance index by one and try again

diff --git a/src/ChangeLog b/src/ChangeLog
--- a/src/ChangeLog
+++ b/src/ChangeLog
@@ -1,3 +1,9 @@
+2009-01-28  David Bateman  <address@hidden>
+
+       * DLD-FUNCTIONS/regexp.cc (octregexp_list): Don't break for zero
+       length match, but rather advance the index by one character and
+       try again.
+
 2009-01-28  Jaroslav Hajek  <address@hidden>
 
        * DLD-FUNCTIONS/lookup.cc (Flookup): Fix doc string.
diff --git a/src/DLD-FUNCTIONS/regexp.cc b/src/DLD-FUNCTIONS/regexp.cc
--- a/src/DLD-FUNCTIONS/regexp.cc
+++ b/src/DLD-FUNCTIONS/regexp.cc
@@ -314,7 +314,7 @@
 
                      for (; i < max_length + 1; i++)
                        {
-                         buf <<pattern.substr(new_pos, tmp_pos3 - new_pos)
+                         buf << pattern.substr(new_pos, tmp_pos3 - new_pos)
                              << "{" << i << "}";
                          buf << pattern.substr(tmp_pos3 + 1, 
                                                tmp_pos1 - tmp_pos3 - 1);
@@ -421,7 +421,11 @@
          else if (matches == PCRE_ERROR_NOMATCH)
            break;
          else if (ovector[1] <= ovector[0])
-           break;
+           {
+             // FIXME: Zero sized match!! Is this the right thing to do?
+             idx = ovector[0] + 1;
+             continue;
+           }
          else
            {
              int pos_match = 0;
@@ -515,6 +519,9 @@
              int matches = 0;
              while (matches < subexpr && match[matches].rm_so >= 0) 
                matches++;
+
+             if (matches == 0 || match[0].rm_eo == 0)
+               break;
 
              s = double (match[0].rm_so+1+idx);
              e = double (match[0].rm_eo+idx);
