From: Paolo Bonzini
Subject: [PATCH 1/2] dfa: process range expressions consistently with system regex
Date: Tue, 21 Sep 2010 17:58:57 +0200
The actual meaning of range expressions in glibc does not follow strcoll
exactly, which makes the behavior of grep hard to predict when it is
compiled with the system regex. Let the system regex matcher decide which
single-byte characters a range expression matches.
This partially reverts a change made in commit 0d38a8bb (which made
sense at the time, but not now that src/dfa.c no longer does multibyte
character set matching).
* src/dfa.c (in_coll_range): Use system regex to find which single-byte
characters match a range expression.
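The approach described above can be sketched outside of grep roughly as follows. This is a minimal illustration, assuming a POSIX <regex.h>; the helper name range_bytes_by_regex and its matched[] output array are hypothetical and do not appear in src/dfa.c. Unlike the patch, the sketch checks the return value of regcomp and skips byte 0. It does not handle endpoints that are special inside a bracket expression, such as ']' or '^'.

```c
#include <assert.h>
#include <limits.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of grep: set matched[b] to 1 for every
   nonzero single byte b that the system regex matcher accepts for the
   bracket expression [from-to], and to 0 otherwise.  Return the number
   of matching bytes, or -1 if regcomp rejects the pattern.  */
static int
range_bytes_by_regex (char from, char to, unsigned char *matched)
{
  regex_t re;
  char pattern[6] = { '[', from, '-', to, ']', 0 };
  char subject[2] = { 0, 0 };
  int count = 0;
  int c;

  assert (matched != NULL);
  memset (matched, 0, UCHAR_MAX + 1);

  if (regcomp (&re, pattern, REG_NOSUB) != 0)
    return -1;

  /* Byte 0 is skipped: as a C string it would be the empty subject.  */
  for (c = 1; c <= UCHAR_MAX; c++)
    {
      subject[0] = (char) c;
      if (regexec (&re, subject, 0, NULL, 0) == 0)
        {
          matched[c] = 1;
          count++;
        }
    }

  regfree (&re);
  return count;
}
```

A caller passes an array of UCHAR_MAX + 1 bytes. Without a prior setlocale call the program runs in the "C" locale, where [a-z] is expected to cover exactly the 26 lowercase ASCII letters; after setlocale (LC_ALL, "") the result depends on the system regex implementation, which is the behavior the patch defers to.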
---
NEWS | 6 ++++++
src/dfa.c | 27 ++++++++++++++++-----------
2 files changed, 22 insertions(+), 11 deletions(-)
diff --git a/NEWS b/NEWS
index 01bbd21..539e978 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,12 @@ GNU grep NEWS -*- outline -*-
* Noteworthy changes in release ?.? (????-??-??) [?]
+** Bug fixes
+
+ grep's interpretation of range expressions is now more consistent with
+ that of other tools. [bug present since multi-byte character set
+ support was introduced in 2.5.2, though the steps needed to reproduce
+ it changed in grep-2.6]
* Noteworthy changes in release 2.7 (2010-09-16) [stable]
diff --git a/src/dfa.c b/src/dfa.c
index a2f4174..f3e066f 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -697,13 +697,6 @@ static unsigned char const *buf_end; /* reference to end in dfaexec(). */
#endif /* MBS_SUPPORT */
-static int
-in_coll_range (char ch, char from, char to)
-{
- char c[6] = { from, 0, ch, 0, to, 0 };
- return strcoll (&c[0], &c[2]) <= 0 && strcoll (&c[2], &c[4]) <= 0;
-}
-
typedef int predicate (int);
/* The following list maps the names of the Posix named character classes
@@ -979,10 +972,22 @@ parse_bracket_exp (void)
for (c = c1; c <= c2; c++)
setbit_case_fold (c, ccl);
else
- for (c = 0; c < NOTCHAR; ++c)
- if (!(case_fold && isupper (c))
- && in_coll_range (c, c1, c2))
- setbit_case_fold (c, ccl);
+ {
+ /* Defer to the system regex library about the meaning
+ of range expressions. */
+ regex_t re;
+ char pattern[6] = { '[', c1, '-', c2, ']', 0 };
+ char subject[2] = { 0, 0 };
+ regcomp (&re, pattern, REG_NOSUB);
+ for (c = 0; c < NOTCHAR; ++c)
+ {
+ subject[0] = c;
+ if (!(case_fold && isupper (c))
+ && regexec (&re, subject, 0, NULL, 0) != REG_NOMATCH)
+ setbit_case_fold (c, ccl);
+ }
+ regfree (&re);
+ }
}
colon_warning_state |= 8;
--
1.7.2.3