[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
grep -f scales extremely poorly with number of lines in pattern file
From: |
Narfi Stefansson |
Subject: |
grep -f scales extremely poorly with number of lines in pattern file |
Date: |
Sun, 19 Mar 2006 23:28:48 -0500 |
User-agent: |
KMail/1.8.2 |
Hi all,
I was expecting it to take order n time to match against n patterns, where n
is the number of lines in the pattern file.
What I am experiencing: The execution time of grep -f behaves like n^a, with
a somewhere in the range [2.5, 3.0].
The execution time with a 100 line pattern file was 0.17 sec, but the
execution time with a 1000 line pattern file was 59.7 seconds, nowhere near
the 10 * 0.17 sec = 1.7 seconds that I expected.
Details and reproduction steps are given below.
Thanks,
Narfi
Here is what I did, I created 1000 strings and 1000 regexps:
cat /dev/urandom | od -x | head -1000 > regexps.txt
cat /dev/urandom | od -x | head -1000 > text.txt
Avoid any problems with the locale:
export LC_ALL=C
export LANG=C
export LANGUAGE=C
I then timed the execution of matching against the first 100, 200, etc
patterns in regexps.txt:
for n in 100 200 300 400 500 600 700 800 900 1000; do
head -$n regexps.txt > current_regexps.txt
/usr/bin/time grep -f current_regexps.txt text.txt
done
Following is the user time in seconds as reported by /usr/bin/time, for n in
100, 200, ..., 1000:
0.17 0.80 2.08 4.10 7.02 11.33 17.31 28.68 41.39 59.70
This is on Mandriva 2006.
% grep --version
grep (GNU grep) 2.5.1
- grep -f scales extremely poorly with number of lines in pattern file,
Narfi Stefansson <=