bug#18454: Improve performance when -P (PCRE) is used in UTF-8 locales
From: Jim Meyering
Subject: bug#18454: Improve performance when -P (PCRE) is used in UTF-8 locales
Date: Thu, 18 Sep 2014 12:36:57 -0700
On Thu, Sep 18, 2014 at 1:33 AM, Santiago Ruano Rincón
<address@hidden> wrote:
> On 17/09/14 at 23:00, Paul Eggert wrote:
>> I've installed all the patches mentioned so far.
>>
>
> I've successfully built the latest commit
> (f6de00f6cec3831b8f334de7dbd1b59115627457), but I don't see any
> performance boost. Rather the opposite.
>
> Comparing with debian's grep 2.20-3, that includes your first patch to solve
> this -P issue, 0001-grep-P-invalid-utf8-non-matching.patch:
>
> grep -P asdf /usr/bin/* 12,42s user 0,12s system 99% cpu 12,545 total
> src/grep -P asdf /usr/bin/* 14,37s user 0,12s system 99% cpu 14,492 total
>
> Note that basic grep also slows down:
>
> grep asdf /usr/bin/* 0,22s user 0,16s system 99% cpu 0,382 total
> src/grep asdf /usr/bin/* 1,26s user 0,12s system 99% cpu 1,384 total
Thank you for running timing comparisons.
Once I verified that I had no large, sparse files in my grep working directory,
I ran the same test there (du -sh . reports 176M, du --app -sh . reports 139M).
The following shows a performance regression when searching files
like those in my grep working directory.
The new grep (v2.20-46-gf6de00f) takes 2.5x longer than 2.20.14.
This is with a hot cache (best of several runs) on an
Intel(R) Xeon(R) CPU E5-2660, compiled with gcc-5.x:
$ diff -u <(env time grep -r asdf . 2>&1) <(PATH=src:$PATH env time grep -r asdf . 2>&1)
--- /proc/self/fd/11 2014-09-18 12:07:43.169721947 -0700
+++ /proc/self/fd/12 2014-09-18 12:07:43.169721947 -0700
@@ -1,3 +1,3 @@
./src/grep.c: printf 'asdfqwerzxcv\rASDF\tZXCV\n'
-0.08user 0.10system 0:00.18elapsed 100%CPU (0avgtext+0avgdata 6256maxresident)k
-0inputs+0outputs (0major+670minor)pagefaults 0swaps
+0.40user 0.11system 0:00.51elapsed 99%CPU (0avgtext+0avgdata 5328maxresident)k
+0inputs+0outputs (0major+634minor)pagefaults 0swaps
It looks like most of the difference is the result of
commit cd36abd46c5e0768606979ea75a51732062f5624,
"grep: treat a file as binary if its prefix contains encoding errors",
with its new, locale-sensitive "is_binary" test. I saw the above timing
difference even with LC_ALL=C, so one quick fix would be to skip the use
of mbrlen when possible.
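
A minimal sketch of that quick fix follows. It is not the code in grep
itself: the helper name buf_looks_binary is hypothetical, and the sketch
assumes that in a single-byte locale (MB_CUR_MAX == 1, as with LC_ALL=C)
only a NUL byte needs to mark a buffer as binary, so the per-character
mbrlen scan can be skipped there.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not grep's actual is_binary): return true if
   BUF[0..SIZE) contains a NUL byte or, in a multibyte locale, an
   encoding error as reported by mbrlen.  */
static bool
buf_looks_binary (char const *buf, size_t size)
{
  /* A NUL byte marks the buffer as binary in any locale.  */
  if (memchr (buf, '\0', size))
    return true;

  /* Assumption: in a single-byte locale no byte counts as an encoding
     error, so the costly per-character scan is skipped entirely.  */
  if (MB_CUR_MAX == 1)
    return false;

  mbstate_t mbs;
  memset (&mbs, 0, sizeof mbs);

  size_t i = 0;
  while (i < size)
    {
      size_t n = mbrlen (buf + i, size - i, &mbs);
      if (n == (size_t) -1)
        return true;            /* invalid multibyte sequence */
      if (n == (size_t) -2)
        break;                  /* incomplete character at end of buffer */
      i += n ? n : 1;           /* n is 0 only for NUL; always advance */
    }
  return false;
}
```

The locale check costs one comparison per buffer, whereas the mbrlen loop
costs one library call per character, which is where the sketch expects
the savings in the LC_ALL=C case to come from.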