bug-grep
bug#29124: [PATCH] grep: simplify out_file handling


From: Zev Weiss
Subject: bug#29124: [PATCH] grep: simplify out_file handling
Date: Fri, 3 Nov 2017 10:44:31 -0500
User-agent: NeoMutt/20171013

On Fri, Nov 03, 2017 at 03:46:59AM CDT, Paul Eggert wrote:
> This adds a syscall, which sounds like both a performance loss

I considered this while writing the patch, but decided it was unlikely to be significant. The isdir() call is only evaluated when grep receives exactly one file argument on the command line and recursion is enabled (e.g. '-r'). So I think the case where the overhead of the syscall would be highest relative to the total execution time would be a recursive grep of a single empty file.

Benchmarking that (src/grep-patched includes the simplification patch;
src/grep does not):

address@hidden: grep]% perf stat -e dummy -r 1000 ./src/grep -r '' /dev/null

Performance counter stats for './src/grep -r  /dev/null' (1000 runs):

                0      dummy:u

       0.001551488 seconds time elapsed  ( +-  0.15% )

address@hidden: grep]% perf stat -e dummy -r 1000 ./src/grep-patched -r '' /dev/null

Performance counter stats for './src/grep-patched -r  /dev/null' (1000 runs):

                0      dummy:u

       0.001554579 seconds time elapsed  ( +-  0.15% )

address@hidden: grep]% echo 5 k 0.001554579 0.001551488 / p | dc
1.00199

So that's a difference of about 3 microseconds (<0.2%, which will of course rapidly approach 0% as soon as there is any actual work to do). In fact, it doesn't even seem to be reliably above the threshold of measurement noise -- here's another pair of runs (bearing in mind these are averages of a thousand iterations each):

address@hidden: grep]% perf stat -e dummy -r 1000 ./src/grep-patched -r '' /dev/null

Performance counter stats for './src/grep-patched -r  /dev/null' (1000 runs):

                0      dummy:u

       0.001548791 seconds time elapsed  ( +-  0.15% )

address@hidden: grep]% perf stat -e dummy -r 1000 ./src/grep -r '' /dev/null

Performance counter stats for './src/grep -r  /dev/null' (1000 runs):

                0      dummy:u

       0.001549041 seconds time elapsed  ( +-  0.17% )

address@hidden: grep]% echo 5 k 0.001548791 0.001549041 / p | dc
.99983
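
For anyone who would rather see the two comparisons as absolute differences and percentages than as dc ratios, here is a small sketch of the same arithmetic. It assumes a POSIX awk is available; pct_diff is a hypothetical helper name made up for this illustration, and the inputs are simply the mean elapsed times copied from the perf stat output above.

```sh
# pct_diff is a hypothetical helper, not part of grep or perf.
# $1 = mean time of the patched binary (seconds)
# $2 = mean time of the unpatched binary (seconds)
pct_diff() {
  LC_ALL=C awk -v p="$1" -v u="$2" \
    'BEGIN { printf "%.3f us, %.3f%%\n", (p - u) * 1e6, (p / u - 1) * 100 }'
}

pct_diff 0.001554579 0.001551488   # first pair:  3.091 us, 0.199%
pct_diff 0.001548791 0.001549041   # second pair: -0.250 us, -0.016%
```

For scale, the reported +- 0.15% on a mean of roughly 1.55 ms works out to about 2.3 microseconds, so both differences are of the same order as the run-to-run variation, and the second one has the opposite sign.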

One could perhaps also argue that asymptotically, with a *large* amount of work to do, the time savings of not updating out_file would eventually outweigh the cost of the added stat call.

> and a correctness issue, as it introduces a race if the file system is being modified while we grep.

This I did not consider. However, I'm wondering how robust (paranoid?) grep really needs to be in the face of concurrent filesystem modifications. Say grep is searching for the pattern AB (where A and B are each sufficiently lengthy substrings) in a file that contains it, has successfully matched A, and is in the process of matching B. If before it finishes doing so another process modifies the file and pokes a byte in A making it no longer match, should grep be expected to detect that and not print the match? (I would certainly think not.) Similarly, a user, script, etc. expecting any sort of reliable, well-defined behavior from grep while *changing the type of a path grep is currently operating on* would be...sort of unreasonable I'd think?
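
To make the path-type scenario concrete, here is a minimal shell sketch of the check-then-use pattern in question. It is purely illustrative and is not grep's code: the [ -d ] test merely stands in for an up-front isdir() call, the rm/mkdir pair stands in for the concurrent process, and the variable names are made up for the example.

```sh
# Hypothetical illustration only; none of this is grep's actual code.
tmp=$(mktemp -d)
path="$tmp/target"
: > "$path"                      # the path starts out as an empty regular file

# Step 1: the up-front type check (stand-in for an early isdir() call).
if [ -d "$path" ]; then checked=dir; else checked=file; fi

# Step 2: something else changes the type of the path in the meantime.
rm "$path" && mkdir "$path"

# Step 3: by the time the path is actually used, the earlier answer is stale.
if [ -d "$path" ]; then actual=dir; else actual=file; fi

echo "checked=$checked actual=$actual"   # prints: checked=file actual=dir
rm -r "$tmp"
```

Here the swap is done in sequence so the result is deterministic; in a real race the window between steps 1 and 3 would be a matter of microseconds.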


Zev





