bug-grep

[PATCH 1/2] pcresearch: set UTF-8 flag correctly for UTF-8 locales


From: Paolo Bonzini
Subject: [PATCH 1/2] pcresearch: set UTF-8 flag correctly for UTF-8 locales
Date: Wed, 3 Oct 2012 11:20:29 +0200

From: Petr Pisar <address@hidden>

Otherwise, Unicode properties (\p{XXX}) do not work with characters
outside the 7-bit ASCII character set.

* src/pcresearch.c (Pcompile): Look for UTF-8 locales and set PCRE_UTF8
if one is found.
---
 NEWS             | 6 ++++++
 src/pcresearch.c | 8 ++++++++
 2 files changed, 14 insertions(+)

diff --git a/NEWS b/NEWS
index 9309f62..bc669b9 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,12 @@ GNU grep NEWS                                    -*- outline -*-
 
 * Noteworthy changes in release ?.? (????-??-??) [?]
 
+** Bug fixes
+
+  grep -P did not enable PCRE's multibyte (UTF-8) mode in UTF-8 locales,
+  the only locales in which PCRE supports multibyte characters.  This
+  could cause failures to match multibyte characters against some regular
+  expressions, especially those including the '.' or '\p' metacharacters.
 
 * Noteworthy changes in release 2.14 (2012-08-20) [stable]
 
diff --git a/src/pcresearch.c b/src/pcresearch.c
index 2994e65..3539b58 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -25,6 +25,9 @@
 #elif HAVE_PCRE_PCRE_H
 # include <pcre/pcre.h>
 #endif
+#if HAVE_LANGINFO_CODESET
+# include <langinfo.h>
+#endif
 
 #if HAVE_LIBPCRE
 /* Compiled internal form of a Perl regular expression.  */
@@ -51,6 +54,11 @@ Pcompile (char const *pattern, size_t size)
   char const *p;
   char const *pnul;
 
+#if HAVE_LANGINFO_CODESET
+  if (!strcmp (nl_langinfo (CODESET), "UTF-8"))
+    flags |= PCRE_UTF8;
+#endif
+
   /* FIXME: Remove these restrictions.  */
   if (memchr(pattern, '\n', size))
     error (EXIT_TROUBLE, 0, _("the -P option only supports a single pattern"));
-- 
1.7.12.1
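
For readers who want to try the locale check outside of grep, here is a minimal sketch of the same test, assuming a POSIX system that provides <langinfo.h>. DEMO_UTF8_FLAG and both function names are hypothetical: the constant only stands in for PCRE_UTF8 so that the example builds without libpcre, and none of this is code from the patch itself.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical stand-in for PCRE_UTF8, so the sketch builds without
   libpcre.  The real constant comes from <pcre.h>.  */
enum { DEMO_UTF8_FLAG = 1 };

/* Return DEMO_UTF8_FLAG if CODESET is exactly "UTF-8", else 0.
   The exact string comparison mirrors the one in the patch.  */
static int
utf8_flag_for_codeset (char const *codeset)
{
  assert (codeset != NULL);
  return strcmp (codeset, "UTF-8") == 0 ? DEMO_UTF8_FLAG : 0;
}

/* Adopt the locale named by the environment, then derive the flag
   from the codeset that locale reports.  */
static int
utf8_flag_for_current_locale (void)
{
  setlocale (LC_ALL, "");
  return utf8_flag_for_codeset (nl_langinfo (CODESET));
}
```

A caller would OR the result of utf8_flag_for_current_locale () into the options it passes to the regex compiler, as Pcompile does with flags. Because the comparison is exact, a codeset reported under any other spelling leaves the flag unset.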




