bug-grep
Re: [PATCH v2] dfa: optimize UTF-8 period


From: Jim Meyering
Subject: Re: [PATCH v2] dfa: optimize UTF-8 period
Date: Tue, 20 Apr 2010 15:49:35 +0200

Paolo Bonzini wrote:
> On 04/20/2010 12:06 PM, Jim Meyering wrote:
>>    printf '\n'|LC_ALL=en_US.utf8 src/grep -zl .
>>    printf '\0'|LC_ALL=en_US.utf8 src/grep -l .
>>
>> They should fail.
>
> By the way, I disagree that the first should fail.  With -z the record
> separator ("newline") character is \0, so \n is just like any other
> character.  The second should fail with
>
>   printf '\0'|LC_ALL=en_US.utf8 POSIXLY_CORRECT=1 src/grep -l .

Good points.  Thanks.
Here's a revised test (still failing, of course):

From 5490e0283796cd4604a86a781644ef87de95526f Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Tue, 20 Apr 2010 11:34:57 +0200
Subject: [PATCH] tests: ensure that "." does not match NUL

* tests/dot-vs-NUL-and-NL: New file.
* tests/Makefile.am (TESTS): Add it.
---
 tests/Makefile.am       |    1 +
 tests/dot-vs-NUL-and-NL |   31 +++++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+), 0 deletions(-)
 create mode 100644 tests/dot-vs-NUL-and-NL

diff --git a/tests/Makefile.am b/tests/Makefile.am
index c2cc82c..b81e9ee 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -37,6 +37,7 @@ TESTS =                                               \
   case-fold-char-type                          \
   char-class-multibyte                         \
   dfaexec-multibyte                            \
+  dot-vs-NUL-and-NL                            \
   empty                                                \
   ere.sh                                       \
   euc-mb                                       \
diff --git a/tests/dot-vs-NUL-and-NL b/tests/dot-vs-NUL-and-NL
new file mode 100644
index 0000000..d7927a8
--- /dev/null
+++ b/tests/dot-vs-NUL-and-NL
@@ -0,0 +1,31 @@
+#!/bin/sh
+# Ensure that the match-any "." pattern does not match "\0", and
+# does match "\n" with -z.
+: ${srcdir=.}
+. "$srcdir/init.sh"; path_prepend_ ../src
+
+require_en_utf8_locale_
+
+printf '\n' > nl || framework_failure_
+printf '\0' > nul || framework_failure_
+fail=0
+
+for loc in en_US.UTF-8 C; do
+
+  # "." must not match "\0"
+  LC_ALL=$loc POSIXLY_CORRECT=1 grep -l . nul > out 2>&1
+  # Expect no match and no output.
+  test $? = 1 || fail=1
+  compare out /dev/null || fail=1
+
+  # In general, "." must not match "\n".
+  LC_ALL=$loc grep -l . nl > out
+  test $? = 1 || fail=1
+  compare out /dev/null || fail=1
+
+  # However, "." *does* match "\n" when "\0" is the input record delimiter.
+  LC_ALL=$loc grep -zl . nl > out || fail=1
+
+done
+
+Exit $fail
--
1.7.1.rc2.265.g8743f
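
For readers who want to see the record-separator point from the thread outside the test framework, here is a minimal sketch. It is not part of the patch. It assumes the "grep" found on PATH is GNU grep (the -z option is a GNU extension), and the variable names default_count and z_count are made up for this illustration. It only covers the "\n" cases; the "\0" case is left out because it is the behavior under discussion in this thread.

```shell
#!/bin/sh
# Sketch only: assumes "grep" on PATH is GNU grep, since -z is a GNU extension.
# default_count and z_count are illustrative names, not taken from the patch.

# Default mode: newline terminates each record, so an input of just "\n"
# is a single empty line and "." has no character to match.
# grep -c still prints a count when it exits 1, hence the "|| true".
default_count=$(printf '\n' | LC_ALL=C grep -c .) || true

# With -z, NUL terminates each record, so "\n" is ordinary record content
# that "." is allowed to match.
z_count=$(printf '\n' | LC_ALL=C grep -zc .) || true

echo "records matched by default: $default_count"
echo "records matched with -z:    $z_count"
```

If the argument above holds for the grep being run, the first count should be 0 and the second 1, which corresponds to the second and third checks inside the test's loop.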



