bug-grep
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [patch #7134] Patch for is_mb_middle in searchutil.c


From: Jim Meyering
Subject: Re: [patch #7134] Patch for is_mb_middle in searchutil.c
Date: Fri, 26 Mar 2010 21:43:50 +0100

Norihirio Tanaka wrote:
> URL:
>   <http://savannah.gnu.org/patch/?7134>
>
>                  Summary: Patch for is_mb_middle in searchutil.c
>                  Project: grep
>             Submitted by: noritnk
>             Submitted on: 2010年03月26日 13時00分00秒
>                 Category: None
>                 Priority: 5 - Normal
>                   Status: None
>                  Privacy: Public
>              Assigned to: None
>         Originator Email:
>              Open/Closed: Open
>          Discussion Lock: Any
>
> Details:
>
> I seem is_mb_middle has two bugs.
> This patch (merged) will be corrected both their bugs.
>
> 1. fgrep (2.6.1) hangs on a pattern with invalid sequence.
>
> ==> See attachment `fgrep-hang'.
>
> 2. A complete multibyte string matches with incomplate
> multibyte pattern with truncated octet character in fgrep (2.6.1).
>
> ==> See attachment `truncated-character.

Thank you for the patch and test cases!

I've adjusted your test scripts slightly, hooked them
up via tests/Makefile.am and added most of the commit logs.
I confirm that each test exercises a bug that is fixed by the patch.
I'll study your patch over the weekend.
Hmm... or maybe not.  Continue reading...

In future, please consider providing patches in "git format-patch" form,
so it's less work for us.  Here are some guidelines that should help:
(they're technically for coreutils, but apply nearly as well to grep)
  http://git.sv.gnu.org/cgit/coreutils.git/tree/HACKING

I see that there is no copyright assignment on file for you,
and these changes would definitely put you over the limit.
Can you fill out the paperwork mentioned in HACKING, in the
"Copyright assignment" section?

I'm not sure I can wait for that (the process can take weeks),
so may have to work out fixes independently.


>From 08d8a51f54243eb2f43d95d769663757e708f27f Mon Sep 17 00:00:00 2001
From: Norihiro Tanaka <address@hidden>
Date: Fri, 26 Mar 2010 16:36:09 +0100
Subject: [PATCH 1/2] add tests for the above

* tests/init.cfg (require_timeout_): New function.
* tests/fgrep-hang: New file.  Test for the above fix.
* tests/truncated-octet: New file.  Likewise.
* tests/Makefile.am (TESTS): Add them.
---
 tests/Makefile.am     |    2 ++
 tests/fgrep-hang      |   23 +++++++++++++++++++++++
 tests/init.cfg        |    6 ++++++
 tests/truncated-octet |   23 +++++++++++++++++++++++
 4 files changed, 54 insertions(+), 0 deletions(-)
 create mode 100644 tests/fgrep-hang
 create mode 100644 tests/truncated-octet

diff --git a/tests/Makefile.am b/tests/Makefile.am
index 8884daf..feeb7a0 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -29,6 +29,7 @@ TESTS =                                               \
   ere.sh                                       \
   euc-mb                                       \
   fedora                                       \
+  fgrep-hang                                   \
   file.sh                                      \
   fmbtest.sh                                   \
   foad1                                                \
@@ -42,6 +43,7 @@ TESTS =                                               \
   spencer1.sh                                  \
   spencer1-locale                              \
   status.sh                                    \
+  truncated-octet                              \
   warning.sh                                   \
   word-multi-file                              \
   yesno.sh
diff --git a/tests/fgrep-hang b/tests/fgrep-hang
new file mode 100644
index 0000000..a81b839
--- /dev/null
+++ b/tests/fgrep-hang
@@ -0,0 +1,23 @@
+#!/bin/sh
+# This would infloop for grep-2.6.1
+: ${srcdir=.}
+. "$srcdir/init.sh"; path_prepend_ ../src
+
+require_timeout_
+
+make_input () {
+  echo "$1" | tr ABC '\357\274\241'
+}
+
+pat=$(make_input BC)
+make_input ABC > exp1 || framework_failure_
+
+fail=0
+
+for LOC in en_US.UTF-8 $LOCALE_FR_UTF8; do
+  out=out1-$LOC
+  LC_ALL=$LOC timeout 10s grep -F "$pat" exp1 > $out || fail=1
+  compare $out exp1 || fail=1
+done
+
+Exit $fail
diff --git a/tests/init.cfg b/tests/init.cfg
index 0ec60f1..6f957b3 100644
--- a/tests/init.cfg
+++ b/tests/init.cfg
@@ -32,3 +32,9 @@ do
 done

 test "$envvar_check_fail" = 1 && fail_ "failed to unset the above envvars"
+
+require_timeout_()
+{
+  ( timeout --version ) > /dev/null 2>&1 \
+    || skip_ your system lacks the timeout program
+}
diff --git a/tests/truncated-octet b/tests/truncated-octet
new file mode 100644
index 0000000..0a32236
--- /dev/null
+++ b/tests/truncated-octet
@@ -0,0 +1,23 @@
+#!/bin/sh
+# This would fail for grep-2.6.1
+: ${srcdir=.}
+. "$srcdir/init.sh"; path_prepend_ ../src
+
+make_input () {
+  echo "$1" | tr ABC '\357\274\241'
+}
+
+pat=`make_input AB`
+make_input ABC > exp1 || framework_failure_
+
+fail=0
+
+for LOC in en_US.UTF-8 $LOCALE_FR_UTF8; do
+  for opt in '' '-F'; do
+    out=out-$opt-$LOC
+    LC_ALL=$LOC grep $opt -v "$pat" exp1 > $out || fail=1
+    compare $out exp1 || fail=1
+  done
+done
+
+Exit $fail
--
1.7.0.3.448.g82eeb


>From 2c0fd7b7607231f20493a8157bdfbb173091d87e Mon Sep 17 00:00:00 2001
From: Norihiro Tanaka <address@hidden>
Date: Fri, 26 Mar 2010 16:36:16 +0100
Subject: [PATCH 2/2] fix fgrep infloop and a multi-byte 
erroneous-match-in-middle bug

* src/searchutils.c (is_mb_middle): FIXME
* src/dfasearch.c (EGexecute): Update caller.
* src/kwsearch.c (Fexecute): Likewise.
* src/search.h: Update prototype.
---
 src/dfasearch.c   |    2 +-
 src/kwsearch.c    |    2 +-
 src/search.h      |    2 +-
 src/searchutils.c |   44 +++++++++++++++++++++++++++++++++++---------
 4 files changed, 38 insertions(+), 12 deletions(-)

diff --git a/src/dfasearch.c b/src/dfasearch.c
index 5e7234a..12cc746 100644
--- a/src/dfasearch.c
+++ b/src/dfasearch.c
@@ -251,7 +251,7 @@ EGexecute (char const *buf, size_t size, size_t *match_size,
 #ifdef MBS_SUPPORT
                   if (mb_start < beg)
                     mb_start = beg;
-                  if (MB_CUR_MAX == 1 || !is_mb_middle (&mb_start, match, 
buflim))
+                  if (MB_CUR_MAX == 1 || !is_mb_middle (&mb_start, match, 
match + kwsm.size[0], buflim))
 #endif
                     goto success;
                 }
diff --git a/src/kwsearch.c b/src/kwsearch.c
index fa801e6..11788ca 100644
--- a/src/kwsearch.c
+++ b/src/kwsearch.c
@@ -104,7 +104,7 @@ Fexecute (char const *buf, size_t size, size_t *match_size,
       if (offset == (size_t) -1)
        goto failure;
 #ifdef MBS_SUPPORT
-      if (MB_CUR_MAX > 1 && is_mb_middle (&mb_start, beg + offset, buf + size))
+      if (MB_CUR_MAX > 1 && is_mb_middle (&mb_start, beg + offset, beg + 
offset + kwsmatch.size[0], buf + size))
         {
           beg = mb_start - 1;
           continue; /* It is a part of multibyte character.  */
diff --git a/src/search.h b/src/search.h
index 10e4d5c..a287c55 100644
--- a/src/search.h
+++ b/src/search.h
@@ -42,7 +42,7 @@ void kwsinit (kwset_t *);

 #ifdef MBS_SUPPORT
 char * mbtolower (const char *, size_t *);
-bool is_mb_middle(const char **, const char *, const char *);
+bool is_mb_middle(const char **, const char *, const char *, const char *);
 #endif

 /* dfasearch.c */
diff --git a/src/searchutils.c b/src/searchutils.c
index e30355d..6a49db8 100644
--- a/src/searchutils.c
+++ b/src/searchutils.c
@@ -114,21 +114,38 @@ mbtolower (const char *beg, size_t *n)


 bool
-is_mb_middle(const char **good, const char *buf, const char *end)
+is_mb_middle(const char **good, const char *beg, const char *end, const char 
*lim)
 {
   const char *p = *good;
-  const char *prev = p;
   mbstate_t cur_state;
+  const char *next;

   /* TODO: can be optimized for UTF-8.  */
   memset(&cur_state, 0, sizeof(mbstate_t));
-  while (p < buf)
+  while (p < beg)
     {
-      size_t mbclen = mbrlen(p, end - p, &cur_state);
+      size_t mbclen = mbrlen(p, lim - p, &cur_state);

-      /* Store the beginning of the previous complete multibyte character.  */
-      if (mbclen != (size_t) -2)
-        prev = p;
+      if (mbclen == (size_t) -1 || mbclen == (size_t) -2 || mbclen == 0)
+       {
+         /* An invalid sequence, or a truncated multibyte character.
+            We treat it as a single byte character.  */
+         mbclen = 1;
+         memset(&cur_state, 0, sizeof cur_state);
+       }
+      p += mbclen;
+    }
+
+  *good = p;
+
+  if (p > beg)
+    return true;
+
+  next = NULL;
+
+  while (p < end)
+    {
+      size_t mbclen = mbrlen(p, lim - p, &cur_state);

       if (mbclen == (size_t) -1 || mbclen == (size_t) -2 || mbclen == 0)
        {
@@ -138,9 +155,18 @@ is_mb_middle(const char **good, const char *buf, const 
char *end)
          memset(&cur_state, 0, sizeof cur_state);
        }
       p += mbclen;
+
+     if (next == NULL)
+       next = p;
+    }
+
+  if (p > end)
+    {
+      if (next != NULL)
+        *good = next;
+      return true;
     }

-  *good = prev;
-  return p > buf;
+  return false;
 }
 #endif /* MBS_SUPPORT */
--
1.7.0.3.448.g82eeb




reply via email to

[Prev in Thread] Current Thread [Next in Thread]