[Grep-devel] 3x buffer size yields 3-23% performance improvement


From: Jim Meyering
Subject: [Grep-devel] 3x buffer size yields 3-23% performance improvement
Date: Sat, 13 Oct 2018 21:21:21 -0700

FYI, I'm about to push this:

From 41ac7a99647316b5ea77d70199282fa9bbd731d4 Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Thu, 6 Sep 2018 11:27:01 -0700
Subject: [PATCH] grep: triple initial buffer size: 32k->96k

Changing 32k to 96k gives a 3-23% performance improvement.
All timings ran with this diff on top of commit v3.1-39-g7179b21:

for n in 32 64 96 128; do
  echo n=$n
  perl -pi -e 's/(INITIAL_BUFSIZE =) \d+/$1 '$n/ src/grep.c &&
    make AM_CFLAGS=-O3 WERROR_CFLAGS= >& makerr-$n &&
    for needle in 1f2 1f298lkjskjhahjklkj34; do
      echo "  needle=$needle"
      for i in $(seq 10); do
         env MALLOC_PERTURB_= time -qf%e src/grep $needle w2000
      done 2>&1 |sort -g | tee >(head -1|sed 's/^/    /') > .time-${n}KB-$needle
    done
done
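
The "best of 10" selection in the loop above can be isolated as a small helper. This is only an illustrative sketch: the name best_of is hypothetical and not part of the patch, and it uses sort -n, which is enough for plain decimal times, where the original script uses sort -g.

```shell
# best_of (hypothetical helper): read one elapsed time per line on stdin
# and print the smallest, i.e. the best trial.
best_of() {
  sort -n | head -n 1
}

# Example with three made-up trial times.
printf '%s\n' 1.21 1.16 1.19 | best_of   # prints 1.16
```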

Tested searches: search for a short literal pattern that is not
present in a 9.3GB file containing 2000 copies of /usr/share/dict/words
created via this:
  ln -s /usr/share/dict/words k && cat $(yes k|head -2000) > w2000
I ran this command:
  env MALLOC_PERTURB_= time src/grep 1f2 w2000
Elapsed time in seconds, old (32k) vs new, best of 10 trials (gcc-9.0.0 20180831, -O3):
 32k  64k  96k(%incr) 128k CPU
1.25 1.18 1.16( 7.2) 1.20 address@hidden cache=8MB
1.21 1.16 1.17( 3.3) 1.19 Xeon(R) E3-1505M v5 @ 2.80GHz cache=8MB
2.36 2.29 2.29( 3.0) 2.36 Xeon(R) E5-2680 v4 @ 2.40GHz cache=32MB
1.40 1.32 1.31( 6.4) 1.33 i5-6260U @ 1.80GHz cache=4MB
1.31 1.26 1.24( 5.3) 1.23 AMD FX(tm)-4100 cache=2MB (with only 1000 copies)

Searching for a longer string: 1f298lkjskjhahjklkj34
2.03 1.76 1.61(20.7) 1.53 address@hidden cache=8MB
1.95 1.70 1.56(20.0) 1.51 Xeon(R) E3-1505M v5 @ 2.80GHz
3.27 2.98 2.84(13.1) 3.02 Xeon(R) E5-2680 v4 @ 2.40GHz
2.48 2.12 1.91(23.0) 1.80 i5-6260U @ 1.80GHz cache=4MB
1.72 1.54 1.46(15.1) 1.41 AMD FX(tm)-4100 cache=2MB
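
The parenthesized figures in both tables match the relative reduction in elapsed time from the 32k column to the 96k column, (old - new) / old * 100. A minimal sketch to reproduce them, assuming awk is available; the function name pct_faster is hypothetical and not part of the patch:

```shell
# pct_faster OLD NEW (hypothetical helper): print the percentage by which
# NEW elapsed time is lower than OLD, to one decimal place.
pct_faster() {
  awk -v old="$1" -v new="$2" \
    'BEGIN { printf "%.1f\n", (old - new) / old * 100 }'
}

pct_faster 1.25 1.16   # short needle, first row: prints 7.2
pct_faster 2.48 1.91   # long needle, i5-6260U row: prints 23.0
```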

* src/grep.c (INITIAL_BUFSIZE): Triple it: 32kB -> 96kB
---
 src/grep.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/grep.c b/src/grep.c
index fec9a53..aa1d6dd 100644
--- a/src/grep.c
+++ b/src/grep.c
@@ -799,7 +799,6 @@ skipped_file (char const *name, bool command_line, bool is_dir)

 static char *buffer;           /* Base of buffer. */
 static size_t bufalloc;                /* Allocated buffer size, counting slop. */
-enum { INITIAL_BUFSIZE = 32768 }; /* Initial buffer size, not counting slop. */
 static int bufdesc;            /* File descriptor. */
 static char *bufbeg;           /* Beginning of user-visible stuff. */
 static char *buflim;           /* Limit of user-visible stuff. */
@@ -812,6 +811,9 @@ static bool skip_nuls;              /* Skip '\0' in data.  */
 static bool skip_empty_lines;  /* Skip empty lines in data.  */
 static uintmax_t totalnl;      /* Total newline count before lastnl. */

+/* Initial buffer size, not counting slop. */
+enum { INITIAL_BUFSIZE = 96 * 1024 };
+
 /* Return VAL aligned to the next multiple of ALIGNMENT.  VAL can be
    an integer or a pointer.  Both args must be free of side effects.  */
 #define ALIGN_TO(val, alignment) \
--
2.18.0
