bug#29124: [PATCH] grep: simplify out_file handling
From: Zev Weiss
Subject: bug#29124: [PATCH] grep: simplify out_file handling
Date: Fri, 3 Nov 2017 10:44:31 -0500
User-agent: NeoMutt/20171013
On Fri, Nov 03, 2017 at 03:46:59AM CDT, Paul Eggert wrote:
> This adds a syscall, which sounds like both a performance loss
This I considered while writing the patch, but decided was unlikely to
be significant. The isdir() is only evaluated in the case of receiving
exactly one command-line argument and recursion being enabled (e.g.
'-r'). So I think the case where the overhead of the syscall would be
highest relative to the total execution time would be a recursive grep
of a single empty file.
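To make the condition concrete, here is a minimal sketch of when the extra stat would be issued. It is written in Python rather than taken from grep's C source, and the names `needs_dir_check` and `single_arg_is_dir` are hypothetical, chosen only for illustration. What the patch does with the result is not shown.

```python
import os
import stat


def needs_dir_check(paths, recursive):
    """Hypothetical helper: True only in the one case where the extra
    stat call would be made (exactly one path, recursion enabled)."""
    return bool(recursive) and len(paths) == 1


def single_arg_is_dir(paths, recursive):
    """Sketch of the guarded check, not grep's actual code."""
    # Short-circuit: no syscall at all unless the guard holds.
    if not needs_dir_check(paths, recursive):
        return False
    try:
        # The one added syscall: stat the sole argument.
        return stat.S_ISDIR(os.stat(paths[0]).st_mode)
    except OSError:
        # Assumption for this sketch: an unreadable path counts as "not a directory".
        return False
```

With `-r` and a single empty file, this check runs once and almost nothing else does, which is why that case is the worst one for relative overhead.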
Benchmarking that (src/grep-patched includes the simplification patch;
src/grep does not):
address@hidden: grep]% perf stat -e dummy -r 1000 ./src/grep -r '' /dev/null
Performance counter stats for './src/grep -r /dev/null' (1000 runs):
0 dummy:u
0.001551488 seconds time elapsed ( +- 0.15% )
address@hidden: grep]% perf stat -e dummy -r 1000 ./src/grep-patched -r '' /dev/null
Performance counter stats for './src/grep-patched -r /dev/null' (1000 runs):
0 dummy:u
0.001554579 seconds time elapsed ( +- 0.15% )
address@hidden: grep]% echo 5 k 0.001554579 0.001551488 / p | dc
1.00199
So a 3-microsecond difference (<0.2%, which will of course rapidly
approach 0% in the case of any actual work to do). In fact, it doesn't
even seem to be reliably above the threshold of measurement noise --
here's another pair of runs (bearing in mind these are averages of a
thousand iterations each):
address@hidden: grep]% perf stat -e dummy -r 1000 ./src/grep-patched -r '' /dev/null
Performance counter stats for './src/grep-patched -r /dev/null' (1000 runs):
0 dummy:u
0.001548791 seconds time elapsed ( +- 0.15% )
address@hidden: grep]% perf stat -e dummy -r 1000 ./src/grep -r '' /dev/null
Performance counter stats for './src/grep -r /dev/null' (1000 runs):
0 dummy:u
0.001549041 seconds time elapsed ( +- 0.17% )
address@hidden: grep]% echo 5 k 0.001548791 0.001549041 / p | dc
.99983
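The arithmetic behind both comparisons can be redone in a few lines. The sketch below uses only the timings and the +- percentages reported by perf stat above. Combining the two per-run uncertainties in quadrature is an assumption made here (it treats the runs as independent), not something perf stat reports.

```python
import math

# Mean elapsed times in seconds, copied from the perf stat output above.
unpatched_1, patched_1 = 0.001551488, 0.001554579   # first pair of runs
unpatched_2, patched_2 = 0.001549041, 0.001548791   # second pair of runs


def relative_change(patched, unpatched):
    """Fractional slowdown of the patched binary (negative means faster)."""
    return patched / unpatched - 1.0


delta_1 = relative_change(patched_1, unpatched_1)
delta_2 = relative_change(patched_2, unpatched_2)

# Assumption: the +- figures of the two runs in a pair are independent,
# so the uncertainty of their ratio is roughly their quadrature sum.
noise_1 = math.hypot(0.0015, 0.0015)
noise_2 = math.hypot(0.0015, 0.0017)

print(f"pair 1: {delta_1:+.3%} (noise about {noise_1:.3%})")
print(f"pair 2: {delta_2:+.3%} (noise about {noise_2:.3%})")
```

Under that assumption both differences fall inside the combined noise band, and the second pair has the opposite sign from the first.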
One could perhaps also argue that asymptotically, with a *large* amount
of work to do, the time savings of not updating out_file would
eventually outweigh the cost of the added stat call.
> and a correctness issue, as it introduces a race if the file system is
> being modified while we grep.
This I did not consider. However, I'm wondering how robust (paranoid?)
grep really needs to be in the face of concurrent filesystem
modifications. Say grep is searching for the pattern AB (where A and B
are each sufficiently lengthy substrings) in a file that contains it,
has successfully matched A, and is in the process of matching B. If
before it finishes doing so another process modifies the file and pokes
a byte in A making it no longer match, should grep be expected to detect
that and not print the match? (I would certainly think not.)
Similarly, a user, script, etc. expecting any sort of reliable,
well-defined behavior from grep while *changing the type of a path grep
is currently operating on* would be...sort of unreasonable I'd think?
Zev