bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: please keep 'memmem' module simple


From: Eric Blake
Subject: Re: please keep 'memmem' module simple
Date: Wed, 9 Jan 2008 17:46:51 +0000 (UTC)
User-agent: Loom/3.14 (http://gmane.org/)

Simon Josefsson <simon <at> josefsson.org> writes:

> >     return !result || !memmem ("a", 1, 0, 0);]])],
> >
> Ah, thanks.  This was not part of the old memmem.m4 checks.  Still, the
> glibc documentation (which I regard as the memmem specification) doesn't
> say what should be returned for empty needles.  My conclusion is that
> empty needles should lead to undefined behaviour.

Fair enough for the memmem-simple module.  But since POSIX requires that 0-
length needles in strstr(ptr,"") return ptr, and I consider that the similarity 
between memmem and strstr is essential, (in part so someone could do a naive 
implementation of strstr as:
 return memmem (haystack, strlen (haystack), needle, strlen (needle));
and in part because if the haystack length is known in advance, memmem can be 
more efficient than strstr).  Therefore, the full memmem module should enforce 
an empty needle as well-defined behavior.  (Or, put another way, I 
want 'echo "index(abc,)"|m4' to work without me having to special case whether 
the second argument to index is empty).

> Unfortunately, cross-compilation is in practice the typical scenario
> where people care about size issues, so I would prefer to rely on
> AC_CHECK_FUNC instead.

OK then, based on this email, the gl_FUNC_MEMMEM_SIMPLE macro uses 
AC_CHECK_FUNC when cross-compiling, while gl_FUNC_MEMMEM is pessimistic and 
always uses the replacement (unless you did primed the cache with 
gl_cv_func_memmem_works=yes first).  Clients of memmem-simple must avoid empty 
needles, and should avoid arbitrary needles, and as a result get smaller code 
size on glibc and cygwin.

Here's what I'm committing (I reversed your proposed macro dependency - the 
full module should depend on and augment the simple check, not the other way 
around):

From: Eric Blake <address@hidden>
Date: Wed, 9 Jan 2008 10:19:13 -0700
Subject: [PATCH] Add memmem-simple module.

* m4/memmem.m4 (gl_FUNC_MEMMEM_SIMPLE): New macro.
(gl_FUNC_MEMMEM): Separate performance from presence checks.
* modules/memmem-simple: New file.
* modules/memmem (Description): Tweak.
* MODULES.html.sh (string handling): Mention it.
* doc/functions/memmem.texi (memmem): Distinguish which flaws are
addressed by memmem-simple.
* NEWS: Document the difference.

Signed-off-by: Eric Blake <address@hidden>
---
 ChangeLog                 |   15 ++++++++++++++-
 MODULES.html.sh           |    3 ++-
 NEWS                      |    6 ++++++
 doc/functions/memmem.texi |    6 +++---
 m4/memmem.m4              |   20 +++++++++++++++-----
 modules/memmem            |    2 +-
 modules/memmem-simple     |   28 ++++++++++++++++++++++++++++
 7 files changed, 69 insertions(+), 11 deletions(-)
 create mode 100644 modules/memmem-simple

diff --git a/ChangeLog b/ChangeLog
index 975665d..721b9e2 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,16 @@
+2008-01-09  Simon Josefsson  <address@hidden>
+       and Eric Blake  <address@hidden>
+
+       Add memmem-simple module.
+       * m4/memmem.m4 (gl_FUNC_MEMMEM_SIMPLE): New macro.
+       (gl_FUNC_MEMMEM): Separate performance from presence checks.
+       * modules/memmem-simple: New file.
+       * modules/memmem (Description): Tweak.
+       * MODULES.html.sh (string handling): Mention new module.
+       * doc/functions/memmem.texi (memmem): Distinguish which flaws are
+       addressed by memmem-simple.
+       * NEWS: Document the difference.
+
 2008-01-09  Eric Blake  <address@hidden>
 
        Give gcc some memmem optimization hints.
diff --git a/MODULES.html.sh b/MODULES.html.sh
index 324b3f3..66e0b01 100755
--- a/MODULES.html.sh
+++ b/MODULES.html.sh
@@ -1,6 +1,6 @@
 #!/bin/sh
 #
-# Copyright (C) 2002-2007 Free Software Foundation, Inc.
+# Copyright (C) 2002-2008 Free Software Foundation, Inc.
 #
 # This program is free software: you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
@@ -1654,6 +1654,7 @@ func_all_modules ()
   func_begin_table
   func_module bcopy
   func_module memmem
+  func_module memmem-simple
   func_module mempcpy
   func_module memrchr
   func_module stpcpy
diff --git a/NEWS b/NEWS
index c30b272..df2676f 100644
--- a/NEWS
+++ b/NEWS
@@ -6,6 +6,12 @@ User visible incompatible changes
 
 Date        Modules         Changes
 
+2008-01-08  memmem          This module now replaces worst-case inefficient
+                            implementations; clients that use controlled
+                            needles and thus do not care about worst-case
+                            efficiency should use the new memmem-simple
+                            module instead for smaller code size.
+
 2007-12-24  setenv          The include file is changed from "setenv.h" to
                             <stdlib.h>. Also, the unsetenv function is no
                             longer declared in this module; use the 'unsetenv'
diff --git a/doc/functions/memmem.texi b/doc/functions/memmem.texi
index fefbe3c..4925ac8 100644
--- a/doc/functions/memmem.texi
+++ b/doc/functions/memmem.texi
@@ -4,7 +4,7 @@
 
 Unspecified by POSIX, but comparable to @code{strstr}.
 
-Gnulib module: memmem
+Gnulib module: memmem, memmem-simple
 
 Portability problems fixed by Gnulib:
 @itemize
@@ -14,12 +14,12 @@ Linux libc 5.0.9
 
 @item
 This function returns incorrect values in some cases, such as when
-given an empty needle:
+given an empty needle (not fixed in memmem-simple):
 glibc <= 2.0, cygwin 1.5.x
 
 @item
 This function has quadratic instead of linear complexity on some
-platforms:
+platforms (not fixed in memmem-simple):
 glibc <= 2.6.1, cygwin 1.5.x
 
 @item
diff --git a/m4/memmem.m4 b/m4/memmem.m4
index 9767354..d862027 100644
--- a/m4/memmem.m4
+++ b/m4/memmem.m4
@@ -1,10 +1,11 @@
-# memmem.m4 serial 7
+# memmem.m4 serial 8
 dnl Copyright (C) 2002, 2003, 2004, 2007, 2008 Free Software Foundation, Inc.
 dnl This file is free software; the Free Software Foundation
 dnl gives unlimited permission to copy and/or distribute it,
 dnl with or without modifications, as long as this notice is preserved.
 
-AC_DEFUN([gl_FUNC_MEMMEM],
+dnl Check that memmem is present.
+AC_DEFUN([gl_FUNC_MEMMEM_SIMPLE],
 [
   dnl Persuade glibc <string.h> to declare memmem().
   AC_REQUIRE([AC_USE_SYSTEM_EXTENSIONS])
@@ -14,7 +15,15 @@ AC_DEFUN([gl_FUNC_MEMMEM],
   AC_CHECK_DECLS_ONCE(memmem)
   if test $ac_cv_have_decl_memmem = no; then
     HAVE_DECL_MEMMEM=0
-  else
+  fi
+  gl_PREREQ_MEMMEM
+]) # gl_FUNC_MEMMEM_SIMPLE
+
+dnl Additionally, check that memmem is efficient and handles empty needles.
+AC_DEFUN([gl_FUNC_MEMMEM],
+[
+  AC_REQUIRE([gl_FUNC_MEMMEM_SIMPLE])
+  if test $ac_cv_have_decl_memmem = yes; then
     AC_CACHE_CHECK([whether memmem works in linear time],
       [gl_cv_func_memmem_works],
       [AC_RUN_IFELSE([AC_LANG_PROGRAM([
@@ -28,6 +37,7 @@ AC_DEFUN([gl_FUNC_MEMMEM],
     /* Failure to compile this test due to missing alarm is okay,
        since all such platforms (mingw) also lack memmem.  */
     alarm (5);
+    /* Check for quadratic performance.  */
     if (haystack && needle)
       {
        memset (haystack, 'A', 2 * m);
@@ -36,6 +46,7 @@ AC_DEFUN([gl_FUNC_MEMMEM],
        needle[m] = 'B';
        result = memmem (haystack, 2 * m + 1, needle, m + 1);
       }
+    /* Check for empty needle behavior.  */
     return !result || !memmem ("a", 1, 0, 0);]])],
        [gl_cv_func_memmem_works=yes], [gl_cv_func_memmem_works=no],
        [dnl pessimistically assume the worst, since even glibc 2.6.1
@@ -46,8 +57,7 @@ AC_DEFUN([gl_FUNC_MEMMEM],
       AC_LIBOBJ([memmem])
     fi
   fi
-  gl_PREREQ_MEMMEM
-])
+]) # gl_FUNC_MEMMEM
 
 # Prerequisites of lib/memmem.c.
 AC_DEFUN([gl_PREREQ_MEMMEM], [:])
diff --git a/modules/memmem b/modules/memmem
index 71f770f..28e386d 100644
--- a/modules/memmem
+++ b/modules/memmem
@@ -1,5 +1,5 @@
 Description:
-memmem() function: locate first substring in a buffer.
+memmem() function: efficiently locate first substring in a buffer.
 
 Files:
 lib/memmem.c
diff --git a/modules/memmem-simple b/modules/memmem-simple
new file mode 100644
index 0000000..bbe89cd
--- /dev/null
+++ b/modules/memmem-simple
@@ -0,0 +1,28 @@
+Description:
+memmem() function: locate first substring in a buffer.
+
+Files:
+lib/memmem.c
+m4/memmem.m4
+
+Depends-on:
+extensions
+string
+stdint
+memchr
+memcmp
+
+configure.ac:
+gl_FUNC_MEMMEM_SIMPLE
+gl_STRING_MODULE_INDICATOR([memmem])
+
+Makefile.am:
+
+Include:
+<string.h>
+
+License:
+LGPLv2+
+
+Maintainer:
+libc, Eric Blake, Simon Josefsson
-- 
1.5.3.5








reply via email to

[Prev in Thread] Current Thread [Next in Thread]