bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: gawk regex stuff you may want


From: Paul Eggert
Subject: Re: gawk regex stuff you may want
Date: Mon, 18 Jan 2016 11:07:15 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1

Aharon Robbins wrote:

Attached is a diff with bits and pieces (fixes / changes) in gawk's
regex routines that you may wish to apply to the GNULIB version.

Thanks for doing all that. I looked over that diff and installed the attached patches into gnulib; they are taken from your diff.

Here are comments on the parts of the diff not incorporated in this round:

-static const char __re_error_msgid[] =
+static const char __re_error_msgid[] attribute_hidden =

Since the constant is static, there should be no need for attribute_hidden, as the constant is used only in this compilation unit. Similarly for other introductions of attribute_hidden.

-  re_dfa_t *dfa = bufp->buffer;
+  re_dfa_t *dfa = (re_dfa_t *) bufp->buffer;

This cast is not needed, as bufp->buffer is already of the proper type. Similarly elsewhere.

-      preg->buffer = dfa;
+      preg->buffer = (unsigned char *) dfa;

This cast seems counterproductive, as dfa is already of the proper type and the unsigned char * is not the proper type.

-  dfa->subexp_map = re_malloc (Idx, preg->re_nsub);
+  /* some malloc()-checkers don't like zero allocations */
+  if (preg->re_nsub > 0)
+    dfa->subexp_map = re_malloc (int, preg->re_nsub);
+  else
+    dfa->subexp_map = NULL;
+

Since malloc (0) is well-defined to return either NULL or a valid pointer to a zero-byte object that can be freed, the code is working as-is. I'd rather look for a solution that involves silencing the checking without making the code bulkier and typically slower.

+    /*
+     * Fedora Core 2, maybe others, have broken `btowc' that returns -1
+     * for any value > 127. Sigh. Note that `start_ch' and `end_ch' are
+     * unsigned, so we don't have sign extension problems.
+     */
     start_wc = ((start_elem->type == SB_CHAR || start_elem->type == COLL_SYM)
-               ? __btowc (start_ch) : start_elem->opr.wch);
+               ? start_ch : start_elem->opr.wch);
     end_wc = ((end_elem->type == SB_CHAR || end_elem->type == COLL_SYM)
-             ? __btowc (end_ch) : end_elem->opr.wch);
+             ? end_ch : end_elem->opr.wch);
     if (start_wc == WEOF || end_wc == WEOF)
       return REG_ECOLLATE;

I don't see how this patch is helpful. If btowc returns -1 we are looking at an encoding error, and we can't treat that byte as if it were a character. In some single-byte locales, btowc returns 1 for values < 128, and they're encoding errors too. Perhaps you could mention a use case?

       /* Build single byte matching table for this equivalence class.  */
+      char_buf[1] = (unsigned char) '\0';

This should be unnecessary, as the rest of the code shouldn't be looking at that byte. Is this something to pacify a lint checker?

-       wc = wc2;
+       wc = (wint_t) wc2;

wc and wc2 are both integers so the cast is unnecessary. Similarly elsewhere.

-      pstr->mbs[char_idx] = toupper (ch);
+      if (islower (ch))
+       pstr->mbs[char_idx] = toupper (ch);
+      else
+       pstr->mbs[char_idx] = ch;

toupper is a no-op if ch is not a lower-case character, so the patch should be unnecessary.

+  /*
+   * ADR: valgrind says size can be 0, which then doesn't
+   * free the block of size 0.  Harumph. This seems
+   * to work ok, though.
+   */
+  if (size == 0)
+    {
+       memset(set, 0, sizeof(*set));
+       return REG_NOERROR;
+    }

I assume this is related to the malloc (0) stuff above. Or possibly MALLOC_0_IS_NONNULL is incorrect on your platform? If so, I'd rather fix that bug rather than attempt to work around it.

diff --git a/lib/regex_internal.h b/lib/regex_internal.h
index 0307a34..c634a00 100644
--- a/lib/regex_internal.h
+++ b/lib/regex_internal.h
@@ -117,6 +117,10 @@
 # define BE(expr, val) __builtin_expect (expr, val)
 #else
 # define BE(expr, val) (expr)
+# ifdef inline
+# undef inline
+# endif
+# define inline
 #endif

This should not be necessary, since m4/regex.m4's gl_PREREQ_REGEX requires AC_C_INLINE, which should define 'inline' on the ancient systems that lack it. If that's not working, I'd like to fix the problem at the source rather than work around it here.

Attachment: 0001-regex-fix-memory-leaks.patch
Description: Text Data

Attachment: 0002-regex-fix-diagnostic.patch
Description: Text Data

Attachment: 0003-regex-pacify-static-checkers.patch
Description: Text Data


reply via email to

[Prev in Thread] Current Thread [Next in Thread]