bug#22239: fgrep -i slow in 2.21

bug-grep

From:	Paul Eggert
Subject:	bug#22239: fgrep -i slow in 2.21
Date:	Tue, 17 Jan 2017 08:18:30 -0800
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.6.0

On 04/11/2016 12:14 AM, Ondřej Cífka wrote:
> You're probably right about the locale. I'm using cs_CZ.UTF-8. With
> LC_ALL=C, both variants run faster and the difference is
> insignificant.
>
> With cs_CZ.UTF-8, on my machine, your test case takes 2.322s with -i
> and 0.464s without -i.
>
> I tested on my Aspell dictionary dump, where the difference is more 
> noticeable:
>
> aspell dump master | head -n 100000 >list.txt
>
> grep 2.21 with -i: 7.336s
> grep 2.21 without -i: 0.312s
> grep 2.16 with -i: 0.372s
> grep 2.16 without -i: 0.431s
>
> With LC_ALL=C, both versions are about as fast.

I got some free time to look into this, and installed the attached set
of patches; the 2nd one is the key one. In the en_US.utf8 locale on my
platform (Fedora 25 x86-64), I get the following user times (in seconds)
for 'grep -Ff list.txt list.txt', where list.txt was generated as you
describe:

   0.444 grep 2.16
   0.522 grep 2.16 -i
   0.443 grep 2.21
  13.048 grep 2.21 -i
   0.096 grep current
   0.101 grep current -i
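
For anyone who wants to repeat the comparison, here is a minimal sketch of the measurement, not the exact procedure used for the numbers above. It makes three assumptions that are not from the thread: a seq-based fallback word list when aspell or its dictionary is unavailable, the helper name bench, and GNU date for the %N format. It reports wall-clock milliseconds rather than user time, so the figures are only roughly comparable to the table. Run it in a UTF-8 locale, since the slowdown did not show up with LC_ALL=C.

```shell
#!/bin/sh
# Build the word list as described in the thread.
n=${N:-100000}
aspell dump master 2>/dev/null | head -n "$n" >list.txt

# Assumption (not from the thread): fall back to a synthetic list when
# aspell or its dictionary is missing, so the script still runs.
[ -s list.txt ] || seq 1 "$n" | sed 's/^/word/' >list.txt

# bench is a hypothetical helper: it searches the list against itself and
# prints elapsed wall-clock time. Assumes GNU date, which supports %N.
bench() {
  start=$(date +%s%N)
  grep "$@" -f list.txt list.txt >/dev/null
  end=$(date +%s%N)
  echo "grep $*: $(( (end - start) / 1000000 )) ms"
}

bench -F        # fixed strings, case-sensitive
bench -i -F     # fixed strings, ignoring case
```

To compare releases, put the grep binary under test first in PATH before running the script.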

Since this patch causes grep to use Aho-Corasick more often, I expect it
to hurt performance in some cases involving multiple patterns, but we
can look into those as they turn up. In the meantime, since the original
bug seems to be fixed, I am taking the liberty of closing the bug report.

0001-build-update-gnulib-submodule-to-latest.txt
Description: Text document

0002-Improve-i-performance-in-typical-UTF-8-searches.txt
Description: Text document

0003-src-kwset.c-Fix-comment-typo.txt
Description: Text document

0004-NEWS-Fix-typo.txt
Description: Text document
