Re: [PATCH 16/17] grep: remove check_multibyte_string, fix non-UTF8 miss
From: Norihiro Tanaka
Subject: Re: [PATCH 16/17] grep: remove check_multibyte_string, fix non-UTF8 missed match
Date: Fri, 19 Mar 2010 23:03:49 +0900
Hi,
I think it would be better to correct this as follows. Please point
out if the idea is wrong.
diff -ru grep-2.5.4.183-9159-dirty.orig/src/search.c grep-2.5.4.183-9159-dirty/src/search.c
--- grep-2.5.4.183-9159-dirty.orig/src/search.c 1970-01-01 00:00:01.000000000 +0000
+++ grep-2.5.4.183-9159-dirty/src/search.c 2010-03-19 09:39:04.000000000 +0000
@@ -418,15 +418,17 @@
match = beg;
while (beg > buf && beg[-1] != eol)
--beg;
- if (kwsm.index < kwset_exact_matches)
- {
#ifdef MBS_SUPPORT
- if (mb_start < beg)
- mb_start = beg;
- if (MB_CUR_MAX == 1 || !is_mb_middle (&mb_start, match, buflim))
+ if (mb_start < beg)
+ mb_start = beg;
+ if (MB_CUR_MAX > 1 && is_mb_middle (&mb_start, match, buflim))
+ {
+ end = mb_start;
+ continue;
+ }
#endif
- goto success;
- }
+ if (kwsm.index < kwset_exact_matches)
+ goto success;
if (dfaexec (&dfa, beg, (char *) end, 0, NULL, &backref) == NULL)
continue;
}
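
To illustrate why the boundary check matters: in some non-UTF-8 encodings the second byte of a character can have the same value as an ASCII byte, so a byte-level keyword match may begin inside a character and has to be skipped. The following is only a minimal sketch of that idea, not the code in search.c. The names sjis_is_lead, in_mb_middle and find_char_boundary_match are hypothetical, and the lead-byte rule is a simplified Shift_JIS assumption (real code would use the locale's multibyte functions).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption for illustration: a simplified Shift_JIS rule where bytes
   0x81-0x9F and 0xE0-0xFC start a two-byte character and every other
   byte is a complete single-byte character.  */
bool
sjis_is_lead (unsigned char c)
{
  return (0x81 <= c && c <= 0x9F) || (0xE0 <= c && c <= 0xFC);
}

/* Return true if POS falls inside a character rather than on a
   character boundary.  LINE must be a known boundary (for example the
   start of the line); LIMIT is one past the last valid byte.  */
bool
in_mb_middle (const char *line, const char *pos, const char *limit)
{
  const char *p = line;

  assert (line <= pos && pos <= limit);
  while (p < pos)
    {
      /* Step over one whole character; a lead byte at the very end of
         the buffer is treated as a single byte.  */
      if (sjis_is_lead ((unsigned char) *p) && p + 1 < limit)
        p += 2;
      else
        p += 1;
    }
  /* If the scan stepped past POS, POS is the trail byte of a character.  */
  return p != pos;
}

/* Find the first byte equal to C in [LINE, LIMIT) that starts on a
   character boundary, or return NULL if there is none.  */
const char *
find_char_boundary_match (const char *line, const char *limit, char c)
{
  for (const char *p = line; p < limit; p++)
    if (*p == c && !in_mb_middle (line, p, limit))
      return p;
  return NULL;
}
```

With the three-byte buffer 0x83 0x41 0x41, the first 0x41 ('A') is the trail byte of a two-byte character under this rule, so a search for 'A' should skip it and report the byte at offset 2. Rescanning from the line start for every candidate is quadratic in the worst case; the patch avoids that by remembering mb_start between candidates.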