Subject: [PATCH v2 0/9] UTF-8 speedups
From: Paolo Bonzini
Date: Sun, 14 Mar 2010 16:35:05 +0100
Here is v2 of the patch. It no longer removes more code than
it adds :-), but it should now also work for gawk.
Because support for case-insensitive multibyte matching carries a
performance penalty (mostly because dfamust rarely finds a good
string), I made it conditional on the GREP symbol. A scheme for more
feature bits can be added in the future, but for now this is enough.
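For readers unfamiliar with this kind of build-time gating, here is a minimal sketch of what making a feature conditional on a preprocessor symbol can look like. It assumes grep's build defines GREP (e.g. via -DGREP) and that other users of dfa.c, such as gawk, do not. The names MB_CASE_FOLD_ENABLED and use_mb_case_fold are hypothetical and are not taken from the patch series:

```c
#include <stdbool.h>

/* Hypothetical feature bit: the slower case-insensitive multibyte path
   is compiled in only when the GREP symbol is defined at build time.
   A build without -DGREP gets the constant 0 instead.  */
#ifdef GREP
enum { MB_CASE_FOLD_ENABLED = 1 };
#else
enum { MB_CASE_FOLD_ENABLED = 0 };
#endif

/* Hypothetical helper: decide whether to take the multibyte case-folding
   path.  MB_CUR_MAXIMUM is the value of MB_CUR_MAX for the current locale;
   a value of 1 means a single-byte locale, where the path is unnecessary.  */
static bool
use_mb_case_fold (bool case_fold, int mb_cur_maximum)
{
  return MB_CASE_FOLD_ENABLED && case_fold && mb_cur_maximum > 1;
}
```

Because the gate is a compile-time constant, a build without GREP lets the compiler drop the multibyte case-folding branch entirely, so that build never pays the dfamust penalty described above.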
Compared to v1, I added Debian's character-set range patch (patch 2)
and fixed the warnings that Jim pointed out.
Paolo Bonzini (9):
tests: add more UTF-8 test cases
dfa: fix handling of ranges in multibyte character sets
dfa: rewrite handling of multibyte case_fold lexing
dfa: speed up handling of brackets
dfa: optimize simple character sets under UTF-8 charsets
dfa: cache MB_CUR_MAX for dfaexec
dfa: run simple UTF-8 regexps as a single-byte character set
grep: remove check_multibyte_string, fix non-UTF8 missed match
grep: match multibyte charsets line-by-line when using -i
.x-sc_cast_of_argument_to_free | 1 -
.x-sc_space_tab | 1 -
NEWS | 15 +-
src/dfa.c | 957 +++++++++++++++++++++-------------------
src/dfa.h | 6 +
src/grep.c | 108 ++---
src/search.c | 244 ++++++-----
tests/Makefile.am | 7 +-
tests/case-fold-backslash-w | 14 +
tests/case-fold-char-range | 21 +
tests/euc-mb | 23 +
tests/foad1.sh | 10 +-
tests/spencer1-locale | 24 +
tests/spencer1-locale.awk | 30 ++
14 files changed, 827 insertions(+), 634 deletions(-)
delete mode 100644 .x-sc_cast_of_argument_to_free
create mode 100755 tests/case-fold-backslash-w
create mode 100644 tests/case-fold-char-range
create mode 100644 tests/euc-mb
create mode 100755 tests/spencer1-locale
create mode 100644 tests/spencer1-locale.awk