bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] tests: exercise two recently-fixed bugs


From: Paul Eggert
Subject: Re: [PATCH] tests: exercise two recently-fixed bugs
Date: Fri, 16 Mar 2012 14:21:30 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.1) Gecko/20120209 Thunderbird/10.0.1

>> > I think we should give a "repeat count too large" error here.  The regex
>> > is being compiled to a different meaning than what the user intended.
> Yes, definitely.
> In testing, I noticed that we diagnose the range error
> in the regexp path, but not in the DFA one, but haven't
> gotten around to it.  The DFA diagnostic should be the same
> as the regexp one.

Why?  This isn't true of the other DFA diagnostics in 'grep'.

> A patch would be most welcome.

It turns out that the range error isn't diagnosed properly in
the regexp path either.  I pushed the following gnulib patch to
fix this.  I'll propose a grep patch shortly; it'll assume a
sync from gnulib.

---
 ChangeLog     |   15 +++++++++++++++
 lib/regcomp.c |   12 ++++++++++--
 lib/regex.h   |    4 ++--
 3 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 54e3b5d..128acda 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,20 @@
 2012-03-16  Paul Eggert  <address@hidden>
 
+       regex: diagnose too-large repeat counts in EREs
+       Previously, the code did not diagnose the too-large repeat count
+       in EREs like 'b{1000000000}'; instead, it silently treated the ERE
+       as if it were 'b\{1000000000}', which is unexpected.
+       * lib/regcomp.c (parse_dup_op): Fail with REG_ESIZE if a repeat count
+       is too large.  REG_ESIZE is used nowhere else, and the diagnostic
+       is a reasonable one for this problem.  Another option would be to
+       create a new REG_OVERFLOW error for repeat counts that are too large.
+       (fetch_number): Return RE_DUP_MAX + 1, not REG_ERROR, if the repeat
+       count is too large, so that the caller can distinguish the two cases.
+       * lib/regex.h (_REG_ESIZE): Document that this is now a generic
+       "Too large" return code, and that repeat counts are one example of this.
+
+2012-03-16  Paul Eggert  <address@hidden>
+
        doc: some glibc x32 integer width issues
        * doc/posix-headers/sys_types.texi (sys/types.h):
        * doc/posix-headers/time.texi (time.h):
diff --git a/lib/regcomp.c b/lib/regcomp.c
index e6d9c99..3841a0a 100644
--- a/lib/regcomp.c
+++ b/lib/regcomp.c
@@ -2571,6 +2571,12 @@ parse_dup_op (bin_tree_t *elem, re_string_t *regexp, 
re_dfa_t *dfa,
          *err = REG_BADBR;
          return NULL;
        }
+
+      if (BE (RE_DUP_MAX < (end == REG_MISSING ? start : end), 0))
+       {
+         *err = REG_ESIZE;
+         return NULL;
+       }
     }
   else
     {
@@ -3751,6 +3757,7 @@ build_charclass_op (re_dfa_t *dfa, RE_TRANSLATE_TYPE 
trans,
 /* This is intended for the expressions like "a{1,3}".
    Fetch a number from 'input', and return the number.
    Return REG_MISSING if the number field is empty like "{,1}".
+   Return RE_DUP_MAX + 1 if the number field is too large.
    Return REG_ERROR if an error occurred.  */
 
 static Idx
@@ -3769,8 +3776,9 @@ fetch_number (re_string_t *input, re_token_t *token, 
reg_syntax_t syntax)
       num = ((token->type != CHARACTER || c < '0' || '9' < c
              || num == REG_ERROR)
             ? REG_ERROR
-            : ((num == REG_MISSING) ? c - '0' : num * 10 + c - '0'));
-      num = (num > RE_DUP_MAX) ? REG_ERROR : num;
+            : num == REG_MISSING
+            ? c - '0'
+            : MIN (RE_DUP_MAX + 1, num * 10 + c - '0'));
     }
   return num;
 }
diff --git a/lib/regex.h b/lib/regex.h
index 0c3b420..c1cd613 100644
--- a/lib/regex.h
+++ b/lib/regex.h
@@ -304,7 +304,7 @@ extern reg_syntax_t re_syntax_options;
 /* RE_DUP_MAX is 2**15 - 1 because an earlier implementation stored
    the counter as a 2-byte signed integer.  This is no longer true, so
    RE_DUP_MAX could be increased to (INT_MAX / 10 - 1), or to
-   ((SIZE_MAX - 2) / 10 - 1) if _REGEX_LARGE_OFFSETS is defined.
+   ((SIZE_MAX - 9) / 10) if _REGEX_LARGE_OFFSETS is defined.
    However, there would be a huge performance problem if someone
    actually used a pattern like a\{214748363\}, so RE_DUP_MAX retains
    its historical value.  */
@@ -375,7 +375,7 @@ typedef enum
 
   /* Error codes we've added.  */
   _REG_EEND,           /* Premature end.  */
-  _REG_ESIZE,          /* Compiled pattern bigger than 2^16 bytes.  */
+  _REG_ESIZE,          /* Too large (e.g., repeat count too large).  */
   _REG_ERPAREN         /* Unmatched ) or \); not returned from regcomp.  */
 } reg_errcode_t;
 
-- 
1.7.6.5




reply via email to

[Prev in Thread] Current Thread [Next in Thread]