--- Begin Message ---
Subject: [PATCH] grep: improve performance for grep -P in UTF-8
Date: Mon, 07 Dec 2015 08:01:23 +0900
After grep -P finds its first match, the TEXTBIN_UNKNOWN optimizations
are no longer used. Therefore, if grep -P finds a match early, it is
very slow in UTF-8.
$ time -p grep -P ^1$ <(seq 999999)
1
real 14.55
user 13.77
sys 1.12
grep -Pa does not use the TEXTBIN_UNKNOWN optimizations either.
Therefore, it is also very slow in UTF-8.
$ time -p grep -Pa a <(seq 999999)
real 14.53
user 13.65
sys 1.35
This change defers leaving the TEXTBIN_UNKNOWN optimizations until
grep -P finds a binary character.
It brings a speedup of more than 10x.
$ time -p src/grep -P ^1$ <(seq 999999)
1
real 0.97
user 0.79
sys 0.24
$ time -p src/grep -Pa a <(seq 999999)
real 0.98
user 0.23
sys 0.99
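To illustrate the idea of staying in the "unknown" state until a binary
character shows up, here is a minimal sketch in C. It is not the code from
the attached patch: the enum, the function names, and the simplified notion
of a "binary" byte (NUL, or a byte that can never occur in UTF-8) are all
assumptions made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tri-state modelled on the TEXTBIN_UNKNOWN idea above.
   These names are illustrative only, not grep's own definitions. */
enum textbin_sketch
{
  SKETCH_BINARY = -1,
  SKETCH_UNKNOWN = 0,
  SKETCH_TEXT = 1
};

/* Simplified stand-in for "binary character" (an assumption):
   a NUL byte, or a byte value that never appears in valid UTF-8. */
static bool
looks_binary (unsigned char c)
{
  return c == 0 || c == 0xC0 || c == 0xC1 || c >= 0xF5;
}

/* Return the state to use after scanning BUF.  The state stays
   SKETCH_UNKNOWN, so the fast path remains available, until a binary
   byte is seen.  Finding a match is deliberately not an input here:
   in this sketch a match alone never changes the state.  */
static enum textbin_sketch
scan_buffer (enum textbin_sketch state, const unsigned char *buf, size_t len)
{
  assert (buf != NULL || len == 0);

  /* Once the state has been decided, it is not revisited.  */
  if (state != SKETCH_UNKNOWN)
    return state;

  for (size_t i = 0; i < len; i++)
    if (looks_binary (buf[i]))
      return SKETCH_BINARY;

  return SKETCH_UNKNOWN;
}
```

With input like the output of seq, every buffer leaves the state at
SKETCH_UNKNOWN, which is the case the timings above exercise.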
BTW, this change conflicts with the proposal in bug#22028.
0001-grep-improve-performance-for-grep-P-in-UTF-8.patch
Description: Text document
--- End Message ---
--- Begin Message ---
Subject: Re: bug#20526: grep BUG: text file is detected as binary
Date: Fri, 08 Jan 2016 22:46:33 +0900
On Wed, 6 Jan 2016 09:57:46 -0800
Paul Eggert <address@hidden> wrote:
> On 01/06/2016 12:32 AM, Paul Eggert wrote:
> > I installed the attached patch, which fixed this performance bug for me.
> Whoops! I forgot to 'git add src/search.h' before committing. We also need
> the attached followup patch, which I installed.
Great! Thanks. Many issues, including the output of invalid sequences,
are fixed by your patches. bug#22103 is also fixed by them, so I am
closing it.
--- End Message ---