bug-gnulib

Re: regexp regressions


From: Sam Steingold
Subject: Re: regexp regressions
Date: Fri, 26 Aug 2005 09:55:43 -0400
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.0.50 (windows-nt)

> * Paul Eggert <address@hidden> [2005-08-20 23:01:23 -0700]:
>
> Sam Steingold <address@hidden> writes:
>
>> the latest and greatest gnulib regexp has the following regressions vs
>> the previous (monolithic) version:
>
> Sorry, I didn't understand the notation that you used in
> <http://lists.gnu.org/archive/html/bug-gnulib/2005-08/msg00008.html>.

;; common lisp:
(defun re-test (pattern string)
  (mapcar (lambda (match)
            (and match (regexp:match-string string match)))
          (multiple-value-list
           (regexp:regexp-exec (regexp:regexp-compile pattern :extended t)
                               string))))

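This function takes an extended regular expression pattern and a string
and tries to match them.  It returns a list with one element per match
register: the substring matched by the whole pattern, followed by the
substring matched by each parenthesized subexpression, or NIL for a
subexpression that did not match.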

Form: (RE-TEST "(^)*" "-")
      ;; pattern = (^)*
      ;; string =  -

CORRECT: ("" "")
         ;; the previous (single-file) version returned two matches: one
         ;; for the whole expression and one for the first subexpression;
         ;; both had length 0

CLISP  : ("" NIL)
         ;; the current (multi-file) version returns just one match -
         ;; for the whole expression, no matches for the subexpression

;; this is the explanation of how ("" "") is different from ("" NIL)
Differ at position 1: "" vs NIL
CORRECT: ("")
CLISP  : (NIL)
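
For comparison, here is a minimal sketch of the same test in C against the
POSIX <regex.h> interface (regcomp / regexec).  The names re_test and
re_print are my own, made up for illustration; they are not part of gnulib
or CLISP.  A register with rm_so == -1 corresponds to NIL above.  I do not
state the output for subexpression 1 because that is exactly what differs
between the implementations.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Compile PATTERN as a POSIX extended regex and match it against STRING.
   On success (return value 0) PMATCH[0..NMATCH-1] hold the match offsets;
   otherwise the regcomp/regexec error code is returned and PMATCH must
   not be examined.  (re_test is a hypothetical helper name.)  */
static int
re_test (const char *pattern, const char *string,
         regmatch_t *pmatch, size_t nmatch)
{
  regex_t re;
  int rc;

  assert (pattern != NULL && string != NULL && pmatch != NULL);

  rc = regcomp (&re, pattern, REG_EXTENDED);
  if (rc != 0)
    return rc;

  rc = regexec (&re, string, nmatch, pmatch, 0);
  regfree (&re);
  return rc;
}

/* Print each register the way the Lisp test does: the matched substring
   in double quotes, or NIL when the register did not participate.  */
static void
re_print (const char *string, const regmatch_t *pmatch, size_t nmatch)
{
  size_t i;

  printf ("(");
  for (i = 0; i < nmatch; i++)
    {
      if (i > 0)
        printf (" ");
      if (pmatch[i].rm_so == -1)
        printf ("NIL");
      else
        printf ("\"%.*s\"", (int) (pmatch[i].rm_eo - pmatch[i].rm_so),
                string + pmatch[i].rm_so);
    }
  printf (")\n");
}
```

To reproduce the case above, declare regmatch_t m[2], call
re_test ("(^)*", "-", m, 2), and on a zero return call re_print ("-", m, 2).
The first element should be the empty match at offset 0; whether the second
prints as "" or NIL is the point under discussion.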




> I tried to reproduce the problems by writing a C program (enclosed
> below) and it seems to me that the gnulib regexp is correct in all
> these test cases.  Perhaps the old regexp was broken.

Frankly, I don't know and don't care whether the old version was broken
or the new one is.  All I care about is consistency.
May I suggest that you add regression testing to the parts of gnulib
that exhibit non-trivial functionality, like regex?
Does glibc come with regression tests?
Do those tests cover regex?
Consistency over time - or at least explicitly documented changes -
is quite important (IMNSHO).

Actually, a careful examination of the examples appears to indicate
that the previous behavior was "more" correct.
Specifically, the first 3 of the 6 regressions are clearly bugs in the
current regex implementation, while the last 3 are acceptable (but
undesirable) variations.

-- 
Sam Steingold (http://www.podval.org/~sds) running w2k
<http://www.iris.org.il> <http://www.palestinefacts.org/>
<http://www.jihadwatch.org/> <http://www.openvotingconsortium.org/>
Never succeed from the first try - if you do, nobody will think it was hard.




