bug-grep

bug#23752: [PATCH] grep: try fgrep matcher for case insensitive matching


From: Norihiro Tanaka
Subject: bug#23752: [PATCH] grep: try fgrep matcher for case insensitive matching by grep -F in multibyte locale
Date: Sun, 12 Jun 2016 18:47:58 +0900

In grep 2.19 and later, grep -F uses the grep matcher for case-insensitive
matching in multibyte locales.  However, this performs poorly for long
patterns because of the cost of building the DFA.

With this patch, in a multibyte locale, if a pattern consists only of
single-byte characters, all of their case counterparts are also
single-byte characters, and the pattern contains no invalid sequences,
grep -F uses the fgrep matcher, just as in a single-byte locale.
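
The condition above can be sketched roughly as follows.  This is only an
illustration, not the code in the attached patch: the function name
fgrep_icase_candidate is hypothetical, and it assumes that checking the
towlower and towupper results is enough to cover a character's case
counterparts, which may be narrower than what grep actually checks.

```c
#include <limits.h>
#include <stdbool.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper, not taken from the patch: return true if every
   character in PAT[0..LEN-1] is encoded as one byte in the current
   locale and its lowercase and uppercase counterparts are also
   encoded as one byte.  Invalid or incomplete sequences yield false.  */
static bool
fgrep_icase_candidate (char const *pat, size_t len)
{
  mbstate_t st;
  memset (&st, 0, sizeof st);

  for (size_t i = 0; i < len; )
    {
      wchar_t wc;
      size_t n = mbrtowc (&wc, pat + i, len - i, &st);

      if (n == (size_t) -1 || n == (size_t) -2)
        return false;           /* invalid or truncated sequence */
      if (n == 0)
        n = 1;                  /* a NUL byte is one single-byte character */
      if (n != 1)
        return false;           /* multibyte character in the pattern */

      /* Assumption: towlower/towupper stand in for the full set of
         case counterparts.  Both must also fit in one byte.  */
      wint_t folded[2] = { towlower (wc), towupper (wc) };
      for (int k = 0; k < 2; k++)
        {
          char buf[MB_LEN_MAX];
          mbstate_t out;
          memset (&out, 0, sizeof out);
          if (wcrtomb (buf, (wchar_t) folded[k], &out) != 1)
            return false;
        }
      i += n;
    }
  return true;
}
```

The caller is expected to have called setlocale (LC_ALL, "") beforehand, so
that mbrtowc and wcrtomb follow the locale's encoding.  A pattern for which
this returns false would stay on the slower grep matcher.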

This partially fixes bug#21763 and bug#22239.

$ seq -f '%g bottles of beer on the wall' 1 600 >pat
$ tr a-z A-Z <pat >in

(before)
$ time -p env LC_ALL=C src/grep -Fivf pat in
real 0.08
user 0.03
sys 0.03
$ time -p env LC_ALL=ja_JP.eucjp src/grep -Fivf pat in
real 104.84
user 93.32
sys 3.28

(after)
$ time -p env LC_ALL=C src/grep -Fivf pat in
real 0.09
user 0.03
sys 0.04
$ time -p env LC_ALL=ja_JP.eucjp src/grep -Fivf pat in
real 0.08
user 0.03
sys 0.03

If a pattern contains any multibyte character, grep -F is still slow.

$ printf '\xb3\xa4\n' >>pat
$ time -p env LC_ALL=ja_JP.eucjp src/grep -Fivf pat in
real 103.38
user 93.81
sys 2.46

Attachment: 0001-grep-try-fgrep-matcher-for-case-insensitive-matching.patch
Description: Text document
