grep-commit
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

grep branch, master, updated. v2.18-141-ga2fc69b


From: Paul Eggert
Subject: grep branch, master, updated. v2.18-141-ga2fc69b
Date: Sat, 10 May 2014 23:27:15 +0000

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "grep".

The branch, master has been updated
       via  a2fc69bc0e5f12ddee151f1f695d9c1a393b8afd (commit)
      from  6f079006b832ae2be56c915f8ca9b5ea5ede6bf9 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
http://git.savannah.gnu.org/cgit/grep.git/commit/?id=a2fc69bc0e5f12ddee151f1f695d9c1a393b8afd


commit a2fc69bc0e5f12ddee151f1f695d9c1a393b8afd
Author: Paul Eggert <address@hidden>
Date:   Sat May 10 16:26:21 2014 -0700

    dfa: fix bug with \< etc in multibyte locales
    
    Problem reported by Stephane Chazelas in: http://bugs.gnu.org/16867
    * NEWS: Document the fix.
    * src/dfa.c (dfaoptimize): Remove any superset if changing from
    UTF-8 to unibyte, and if the pattern has no backreferences.
    (dfassbuild): In multibyte locales, treat \< \> \b \B as
    backreferences in the DFA, since the DFA relies on unibyte
    tests to check them.
    (dfacomp): Optimize after building the superset, so that
    dfassbuild can depend on d->multibyte.  A downside is that
    dfaoptimize must remove supersets that are likely slower than the
    DFA after optimization, but that's been done in the
    above-described change.
    * tests/Makefile.am (XFAIL_TESTS): Remove word-delim-multibyte,
    since the test works now.

diff --git a/NEWS b/NEWS
index 685ce9b..64539c0 100644
--- a/NEWS
+++ b/NEWS
@@ -25,6 +25,8 @@ GNU grep NEWS                                    -*- outline 
-*-
 
   grep -w no longer mishandles a potential match adjacent to a letter that
   takes up two or more bytes in a multibyte encoding.
+  Similarly, the patterns '\<', '\>', '\b', and '\B' no longer
+  mishandle word-boundary matches in multibyte locales.
   [bug present since "the beginning"]
 
   grep -P now reports an error and exits when given invalid UTF-8 data.
diff --git a/src/dfa.c b/src/dfa.c
index 0a221f7..ba19a72 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -3484,6 +3484,7 @@ static void
 dfaoptimize (struct dfa *d)
 {
   size_t i;
+  bool have_backref = false;
 
   if (!using_utf8 ())
     return;
@@ -3495,6 +3496,9 @@ dfaoptimize (struct dfa *d)
         case ANYCHAR:
           /* Lowered.  */
           abort ();
+        case BACKREF:
+          have_backref = true;
+          break;
         case MBCSET:
           /* Requires multi-byte algorithm.  */
           return;
@@ -3503,6 +3507,14 @@ dfaoptimize (struct dfa *d)
         }
     }
 
+  if (!have_backref && d->superset)
+    {
+      /* The superset DFA is not likely to be much faster, so remove it.  */
+      dfafree (d->superset);
+      free (d->superset);
+      d->superset = NULL;
+    }
+
   free_mbdata (d);
   d->multibyte = false;
 }
@@ -3560,8 +3572,11 @@ dfassbuild (struct dfa *d)
         case NOTLIMWORD:
           if (d->multibyte)
             {
-              /* Ignore these constraints.  */
+              /* These constraints aren't supported in a multibyte locale.
+                 Ignore them in the superset DFA, and treat them as
+                 backreferences in the main DFA.  */
               sup->tokens[j++] = EMPTY;
+              d->tokens[i] = BACKREF;
               break;
             }
         default:
@@ -3591,8 +3606,8 @@ dfacomp (char const *s, size_t len, struct dfa *d, int 
searchflag)
   dfambcache (d);
   dfaparse (s, len, d);
   dfamust (d);
-  dfaoptimize (d);
   dfassbuild (d);
+  dfaoptimize (d);
   dfaanalyze (d, searchflag);
   if (d->superset)
     {
diff --git a/tests/Makefile.am b/tests/Makefile.am
index f3450f3..626b25a 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -22,9 +22,7 @@ AM_CFLAGS = $(WARN_CFLAGS) $(WERROR_CFLAGS)
 AM_LDFLAGS = $(IGNORE_UNUSED_LIBRARIES_CFLAGS)
 LDADD = ../lib/libgreputils.a $(LIBINTL) ../lib/libgreputils.a
 
-# Remove this definition once the failing test passes.
-XFAIL_TESTS = \
-  word-delim-multibyte
+XFAIL_TESTS =
 
 # Equivalence classes are only supported when using the system
 # matcher (which means only with glibc).

-----------------------------------------------------------------------

Summary of changes:
 NEWS              |    2 ++
 src/dfa.c         |   19 +++++++++++++++++--
 tests/Makefile.am |    4 +---
 3 files changed, 20 insertions(+), 5 deletions(-)


hooks/post-receive
-- 
grep



reply via email to

[Prev in Thread] Current Thread [Next in Thread]