From: Jim Meyering
Subject: Re: [PATCH 0/2] Set PCRE_UTF8 flag correctly for UTF-8 locales
Date: Wed, 03 Oct 2012 12:10:31 +0200
Paolo Bonzini wrote:
> This is the patch attached to https://bugzilla.redhat.com/683753
> and http://savannah.gnu.org/patch/?3934, with testcases.
>
> Paolo
>
> Paolo Bonzini (1):
> tests: include UTF-8 testcases for grep -P
>
> Petr Pisar (1):
> pcresearch: set UTF-8 flag correctly for UTF-8 locales
>
> NEWS | 6 ++++++
> src/pcresearch.c | 8 ++++++++
> tests/Makefile.am | 1 +
> tests/pcre-utf8 | 33 +++++++++++++++++++++++++++++++++
> 4 files changed, 48 insertions(+)
> create mode 100755 tests/pcre-utf8
Thanks for the quick work, Paolo.
I will push this follow-on patch shortly, along with one more
to factor out the now-duplicate STREQ definition.
From 9df414a75f101a1f7f25c5850d5cfc2e242f6ff8 Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Wed, 3 Oct 2012 12:08:31 +0200
Subject: [PATCH] maint: correct syntax-check failures; adjust NEWS
* tests/pcre-utf8: Reverse order of compare arguments.
Remove all copyright year numbers except 2012.
Use skip_ "diagnostic...", rather than a bare "exit 77".
* NEWS: Start with a concise description of the bug.
* src/pcresearch.c (STREQ): Define, so that we can...
(Pcompile): use STREQ, not strcmp.
---
NEWS | 9 +++++----
src/pcresearch.c | 4 +++-
tests/pcre-utf8 | 13 +++++++------
3 files changed, 15 insertions(+), 11 deletions(-)
diff --git a/NEWS b/NEWS
index bc669b9..052cd81 100644
--- a/NEWS
+++ b/NEWS
@@ -4,10 +4,11 @@ GNU grep NEWS -*- outline -*-
** Bug fixes
- While multi-byte mode is only supported by PCRE with UTF-8 locales,
- grep did not activate it. This can cause failures to match multibyte
- characters against some regular expressions, especially those including
- the '.' or '\p' metacharacters.
+ grep -P could misbehave. While multi-byte mode is only supported by PCRE
+ with UTF-8 locales, grep did not activate it. This would cause failures
+ to match multibyte characters against some regular expressions, especially
+ those including the '.' or '\p' metacharacters.
+
* Noteworthy changes in release 2.14 (2012-08-20) [stable]
diff --git a/src/pcresearch.c b/src/pcresearch.c
index 3539b58..a15f598 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -29,6 +29,8 @@
# include <langinfo.h>
#endif
+#define STREQ(a, b) (strcmp (a, b) == 0)
+
#if HAVE_LIBPCRE
/* Compiled internal form of a Perl regular expression. */
static pcre *cre;
@@ -55,7 +57,7 @@ Pcompile (char const *pattern, size_t size)
char const *pnul;
#if defined HAVE_LANGINFO_CODESET
- if (!strcmp(nl_langinfo(CODESET), "UTF-8"))
+ if (STREQ (nl_langinfo (CODESET), "UTF-8"))
flags |= PCRE_UTF8;
#endif
diff --git a/tests/pcre-utf8 b/tests/pcre-utf8
index b86b114..04146ec 100755
--- a/tests/pcre-utf8
+++ b/tests/pcre-utf8
@@ -1,7 +1,7 @@
#! /bin/sh
# Ensure that, with -P, Unicode \p{} symbols are correctly matched.
#
-# Copyright (C) 2001, 2006, 2009-2012 Free Software Foundation, Inc.
+# Copyright (C) 2012 Free Software Foundation, Inc.
#
# Copying and distribution of this file, with or without modification,
# are permitted in any medium without royalty provided the copyright
@@ -13,21 +13,22 @@ require_en_utf8_locale_
fail=0
-echo '$' | LC_ALL=en_US.UTF-8 grep -qP '\p{S}' || exit 77
+echo '$' | LC_ALL=en_US.UTF-8 grep -qP '\p{S}' \
+ || skip_ 'PCRE support is compiled out'
euro='\xe2\x82\xac euro'
printf "$euro\\n" > in || framework_failure_
LC_ALL=en_US.UTF-8 grep -P '^\p{S}' in > out || fail=1
-compare out in || fail=1
+compare in out || fail=1
LC_ALL=en_US.UTF-8 grep -P '^. euro$' in > out2 || fail=1
-compare out2 in || fail=1
+compare in out2 || fail=1
LC_ALL=en_US.UTF-8 grep -oP '. euro' in > out3 || fail=1
-compare out3 in || fail=1
+compare in out3 || fail=1
LC_ALL=en_US.UTF-8 grep -P '^\P{S}' in > out4
-compare out4 /dev/null || fail=1
+compare /dev/null out4 || fail=1
Exit $fail
--
1.7.12.1.382.gb0576a6
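
For readers following the thread, the locale check that the patch touches can be sketched in isolation. This is only an illustration, not the code in src/pcresearch.c: SKETCH_UTF8 is a hypothetical stand-in for PCRE_UTF8 so that the snippet compiles without libpcre, and both function names are invented for this example. Only STREQ, nl_langinfo and CODESET come from the patch itself.

```c
#include <langinfo.h>
#include <string.h>

/* Same helper the patch defines in src/pcresearch.c.  */
#define STREQ(a, b) (strcmp (a, b) == 0)

/* Hypothetical placeholder for PCRE_UTF8; the value is arbitrary
   and is NOT taken from pcre.h.  */
enum { SKETCH_UTF8 = 1 << 0 };

/* Return the flag set to use for a given codeset name.
   The comparison is exact and case-sensitive, as in the patch.  */
static int
sketch_flags_for_codeset (char const *codeset)
{
  int flags = 0;
  if (STREQ (codeset, "UTF-8"))
    flags |= SKETCH_UTF8;
  return flags;
}

/* Query the current locale's codeset.  The caller is assumed to
   have called setlocale (LC_ALL, "") beforehand; otherwise the
   "C" locale's codeset is reported.  */
static int
sketch_flags_for_current_locale (void)
{
  return sketch_flags_for_codeset (nl_langinfo (CODESET));
}
```

Splitting the string comparison from the nl_langinfo call is a choice made for this sketch only; it lets the comparison be exercised without depending on which locales are installed.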