bug#31074: Grep -i is slow
From: Geoff Kuenning
Subject: bug#31074: Grep -i is slow
Date: Thu, 05 Apr 2018 22:32:41 -0700
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)
The -i switch is slow when searching large files. I haven't dug into
the code in detail, although it seems that dfa.c is trying to build an
intelligent case-agnostic DFA when -i is specified. But that doesn't
seem to be working. Perhaps that's because I'm running in a UTF-8
locale? Although I don't see why that would affect the DFA.
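One way to check the locale theory would be to repeat the -i search with
the locale forced to C, so that grep treats the input as single-byte
characters. The following is only a sketch: the wrapper name
grep_c_locale is made up for illustration, and no timing for it is
reported here.

```shell
# Hypothetical wrapper (name is an assumption): run grep with the locale
# forced to C for this one invocation, leaving the shell's own locale alone.
grep_c_locale() {
    LC_ALL=C grep "$@"
}
```

It would be used as "grep_c_locale -i 'outgoing.*harris.*dcraw' rawindex"
and timed against the plain -i run. If the C-locale run is close to the
non -i time, the multibyte handling is the likely culprit.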
Here's an example of timing several greps of a 151M file named "rawindex",
which has already been read so that it is in the file system buffer cache.
In each case the grep finds a single match, since the matched line is
actually all lowercase; for privacy, I have omitted the match lines
themselves.
A straightforward match takes only 199 ms even with two .* patterns.
Adding -i blows that up to 6917 ms, roughly 35 times slower. Finally,
when I write an explicit case-agnostic pattern to force how the DFA is
built, it does run slower (532 ms), but it's nowhere near the -i time.
mallet:514> time grep outgoing.*harris.*dcraw rawindex
real 0m0.199s
user 0m0.170s
sys 0m0.029s
mallet:515> time grep -i outgoing.*harris.*dcraw rawindex
real 0m6.917s
user 0m6.879s
sys 0m0.036s
mallet:516> time grep '[Oo][Uu][Tt][Gg][Oo][Ii][Nn][Gg].*[Hh][Aa][Rr][Rr][Ii][Ss].*[Dd][Cc][Rr][Aa][Ww]' rawindex
real 0m0.532s
user 0m0.491s
sys 0m0.040s
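The explicit pattern in the last run can be generated mechanically
rather than typed by hand. The following is a minimal sketch, not
anything grep does internally: casefold_pattern is a hypothetical helper
that expands each ASCII letter into a two-character bracket expression.
It assumes the pattern contains only plain ASCII letters outside of
existing bracket expressions and backslash escapes.

```shell
# Hypothetical helper (name is an assumption): expand every ASCII letter
# of a pattern into a bracket expression, e.g. "ab.*c" -> "[Aa][Bb].*[Cc]".
# Non-letters are copied through unchanged. Patterns that already contain
# bracket expressions or backslash escapes are NOT handled correctly.
casefold_pattern() {
    printf '%s\n' "$1" | awk '{
        out = ""
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c ~ /[A-Za-z]/)
                out = out "[" toupper(c) tolower(c) "]"   # letter: both cases
            else
                out = out c                               # anything else: as-is
        }
        print out
    }'
}
```

In a Bourne-style shell it could be used as
grep "$(casefold_pattern 'outgoing.*harris.*dcraw')" rawindex
which should produce the same pattern as the hand-written one above.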
--
Geoff Kuenning address@hidden http://www.cs.hmc.edu/~geoff/
The DMCA criminalizes curiosity. It would put Susie in jail for
taking her stereo apart to see how it works.