[debbugs-tracker] bug#23752: closed ([PATCH] grep: try fgrep matcher for

emacs-bug-tracker

From:	GNU bug Tracking System
Subject:	[debbugs-tracker] bug#23752: closed ([PATCH] grep: try fgrep matcher for case insensitive matching by grep -F in multibyte locale)
Date:	Thu, 01 Sep 2016 16:51:02 +0000

Your message dated Thu, 1 Sep 2016 09:50:11 -0700
with message-id <address@hidden>
and subject line Re: bug#23752: [PATCH] grep: try fgrep matcher for case 
insensitive matching by grep -F in multibyte locale
has caused the debbugs.gnu.org bug report #23752,
regarding [PATCH] grep: try fgrep matcher for case insensitive matching by grep 
-F in multibyte locale
to be marked as done.

(If you believe you have received this mail in error, please contact
address@hidden)


-- 
23752: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=23752
GNU Bug Tracking System
Contact address@hidden with problems

--- Begin Message ---
Subject:	[PATCH] grep: try fgrep matcher for case insensitive matching by grep -F in multibyte locale
Date:	Sun, 12 Jun 2016 18:47:58 +0900

In grep 2.19 and later, grep -F uses the grep matcher for case-insensitive
matching in multibyte locales.  However, this performs poorly for a
long pattern because it has to build a DFA.

With this patch, in a multibyte locale, if a pattern consists only of
single-byte characters, all of their case counterparts are also
single-byte characters, and the pattern contains no invalid sequences,
grep -F uses the fgrep matcher, just as in a single-byte locale.
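
The condition above can be sketched in a few lines.  This is only an
illustration in Python, not the C code from the attached patch: the
function name can_use_fgrep_matcher is hypothetical, and Python's euc_jp
codec and Unicode case mapping stand in for the locale's multibyte and
case-conversion functions.

```python
def can_use_fgrep_matcher(pattern: bytes, encoding: str = "euc_jp") -> bool:
    """Hypothetical check mirroring the condition described in the text.

    Returns True only if the pattern decodes cleanly and every character,
    together with its lowercase and uppercase counterparts, occupies
    exactly one byte in the given encoding.
    """
    try:
        text = pattern.decode(encoding)
    except UnicodeDecodeError:
        return False  # the pattern contains an invalid sequence

    for ch in text:
        for variant in (ch, ch.lower(), ch.upper()):
            try:
                if len(variant.encode(encoding)) != 1:
                    return False  # a multibyte character or counterpart
            except UnicodeEncodeError:
                return False  # counterpart not representable at all
    return True


if __name__ == "__main__":
    print(can_use_fgrep_matcher(b"99 bottles of beer on the wall"))
    print(can_use_fgrep_matcher(b"\xb3\xa4"))
```

Under these assumptions, the ASCII-only pattern qualifies for the fast
path, while the two-byte EUC-JP sequence \xb3\xa4 used later in this
message does not.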

It partially fixes bug#21763 and bug#22239.

$ seq -f '%g bottles of beer on the wall' 1 600 >pat
$ tr a-z A-Z <pat >in

(before)
$ time -p env LC_ALL=C src/grep -Fivf pat in
real 0.08
user 0.03
sys 0.03
$ time -p env LC_ALL=ja_JP.eucjp src/grep -Fivf pat in
real 104.84
user 93.32
sys 3.28

(after)
$ time -p env LC_ALL=C src/grep -Fivf pat in
real 0.09
user 0.03
sys 0.04
$ time -p env LC_ALL=ja_JP.eucjp src/grep -Fivf pat in
real 0.08
user 0.03
sys 0.03

If a pattern has any multibyte character, grep -F is still slow.

$ printf '\xb3\xa4\n' >>pat
$ time -p env LC_ALL=ja_JP.eucjp src/grep -Fivf pat in
real 103.38
user 93.81
sys 2.46

0001-grep-try-fgrep-matcher-for-case-insensitive-matching.patch
Description: Text document

--- End Message ---

--- Begin Message ---
Subject:	Re: bug#23752: [PATCH] grep: try fgrep matcher for case insensitive matching by grep -F in multibyte locale
Date:	Thu, 1 Sep 2016 09:50:11 -0700
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0

Thanks for that performance improvement. I rebased the patch (1st attachment) and wrote some followup changes (2nd attachment) and installed them into the Savannah master.

> If a pattern has any multibyte character, grep -F is still slow.

Suppose all the multibyte characters in the pattern are non-letters, so that case-folding does not affect them. Could grep -iF be fast in that case?

Is the problem that some encodings allow two different representations for the same character, and we want the pattern to match both representations?
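
To make the first question concrete, here is a rough Python sketch of how one might test whether case-folding leaves every multibyte character of a pattern unchanged. It is an assumption-laden illustration, not code from grep: the name multibyte_chars_are_caseless is hypothetical, Python's euc_jp codec and Unicode case mapping stand in for the locale, and the sketch says nothing about the separate question of alternative representations of the same character.

```python
def multibyte_chars_are_caseless(pattern: bytes, encoding: str = "euc_jp") -> bool:
    """Hypothetical check: True if no multibyte character in the pattern
    changes under lowercase or uppercase conversion.

    Raises UnicodeDecodeError if the pattern is not valid in `encoding`.
    """
    text = pattern.decode(encoding)
    for ch in text:
        is_multibyte = len(ch.encode(encoding)) > 1
        if is_multibyte and (ch.lower() != ch or ch.upper() != ch):
            return False  # a multibyte letter with a distinct case form
    return True


if __name__ == "__main__":
    # A kanji (no case forms) and a fullwidth capital A (has a lowercase form).
    print(multibyte_chars_are_caseless(b"\xb3\xa4"))
    print(multibyte_chars_are_caseless(b"\xa3\xc1"))
```
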
0001-grep-speed-up-iF-in-multibyte-locales.txt
Description: Text document

0002-grep-avoid-code-duplication-with-iF.txt
Description: Text document

--- End Message ---
