emacs-bug-tracker

[debbugs-tracker] bug#23752: closed ([PATCH] grep: try fgrep matcher for


From: GNU bug Tracking System
Subject: [debbugs-tracker] bug#23752: closed ([PATCH] grep: try fgrep matcher for case insensitive matching by grep -F in multibyte locale)
Date: Thu, 01 Sep 2016 16:51:02 +0000

Your message dated Thu, 1 Sep 2016 09:50:11 -0700
with message-id <address@hidden>
and subject line Re: bug#23752: [PATCH] grep: try fgrep matcher for case 
insensitive matching by grep -F in multibyte locale
has caused the debbugs.gnu.org bug report #23752,
regarding [PATCH] grep: try fgrep matcher for case insensitive matching by grep 
-F in multibyte locale
to be marked as done.

(If you believe you have received this mail in error, please contact
address@hidden)


-- 
23752: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=23752
GNU Bug Tracking System
Contact address@hidden with problems
--- Begin Message ---
Subject: [PATCH] grep: try fgrep matcher for case insensitive matching by grep -F in multibyte locale
Date: Sun, 12 Jun 2016 18:47:58 +0900

In grep 2.19 and later, grep -F uses the grep matcher for case-insensitive
matching in multibyte locales.  However, this performs poorly for a long
pattern because of the cost of building the DFA.

With this patch, in a multibyte locale, grep -F uses the fgrep matcher,
the same as in a single-byte locale, if the pattern consists only of
single-byte characters, all of their case counterparts are also
single-byte characters, and the pattern contains no invalid sequences.
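
As a rough way to guess whether a pattern file might qualify, one can
check that it contains nothing but ASCII bytes.  This is only a sketch:
the helper name is_ascii_only is hypothetical and is not part of grep or
of the patch, and "ASCII only" is just an approximation of the condition
above, since the actual test depends on the locale's case counterparts.

```shell
# Hypothetical helper (not part of grep): succeed if FILE holds only
# ASCII bytes.  Assumption: an ASCII-only pattern approximates the
# "single-byte characters only" condition; the real check is locale-aware.
is_ascii_only() {
  # Delete every ASCII byte (octal 000-177) and count what remains.
  # A count of zero means no non-ASCII byte was present.
  [ $(LC_ALL=C tr -d '\000-\177' <"$1" | wc -c) -eq 0 ]
}
```

Calling is_ascii_only on a file of plain ASCII lines succeeds; it fails
once a line containing bytes such as \xb3\xa4 is present in the file.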

This partially fixes bug#21763 and bug#22239.

$ seq -f '%g bottles of beer on the wall' 1 600 >pat
$ tr a-z A-Z <pat >in

(before)
$ time -p env LC_ALL=C src/grep -Fivf pat in
real 0.08
user 0.03
sys 0.03
$ time -p env LC_ALL=ja_JP.eucjp src/grep -Fivf pat in
real 104.84
user 93.32
sys 3.28

(after)
$ time -p env LC_ALL=C src/grep -Fivf pat in
real 0.09
user 0.03
sys 0.04
$ time -p env LC_ALL=ja_JP.eucjp src/grep -Fivf pat in
real 0.08
user 0.03
sys 0.03

If a pattern has any multibyte character, grep -F is still slow.

$ printf '\xb3\xa4\n' >>pat
$ time -p env LC_ALL=ja_JP.eucjp src/grep -Fivf pat in
real 103.38
user 93.81
sys 2.46

Attachment: 0001-grep-try-fgrep-matcher-for-case-insensitive-matching.patch
Description: Text document


--- End Message ---
--- Begin Message ---
Subject: Re: bug#23752: [PATCH] grep: try fgrep matcher for case insensitive matching by grep -F in multibyte locale
Date: Thu, 1 Sep 2016 09:50:11 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0

Thanks for that performance improvement. I rebased the patch (1st attachment) and wrote some followup changes (2nd attachment) and installed them into the Savannah master.

> If a pattern has any multibyte character, grep -F is still slow.

Suppose all the multibyte characters in the pattern are non-letters, so that case-folding does not affect them. Could grep -iF be fast in that case?

Is the problem that some encodings allow two different representations for the same character, and we want the pattern to match both representations?

Attachment: 0001-grep-speed-up-iF-in-multibyte-locales.txt
Description: Text document

Attachment: 0002-grep-avoid-code-duplication-with-iF.txt
Description: Text document


--- End Message ---
