--- Begin Message ---
Subject: |
[PATCH] dfa: speed-up for long pattern |
Date: |
Tue, 25 Nov 2014 10:15:39 +0900 |
DFA trys to find a long sequence of characters that must appear in any
line containing the r.e. in dfamust() However, if a pattern is long, it
is very slow, as it processes all characters step by step. This change
makes a string concatenated some normal characters process at a time.
Following test case is posted in bug#15191. It speeds-up more than 60x.
http://debbugs.gnu.org/cgi/bugreport.cgi?bug=15191#5
<< before changed >>
$ time -p grep -f regex.re input_lines.txt
real 1.28
user 1.27
sys 0.00
<< after changed >>
$ time -p grep -f regex.re input_lines.txt
real 0.02
user 0.01
sys 0.00
0001-dfa-speed-up-for-long-pattern.patch
Description: Binary data
--- End Message ---
--- Begin Message ---
Subject: |
Re: bug#19173: [PATCH] dfa: speed-up for long pattern |
Date: |
Sat, 18 Jul 2015 11:44:28 -0700 |
On Sat, Jul 4, 2015 at 8:40 PM, Jim Meyering <address@hidden> wrote:
> On Mon, Nov 24, 2014 at 5:15 PM, Norihiro Tanaka <address@hidden> wrote:
...
> Thank you for that patch.
> I have rebased it and made some small improvements:
> I combined an if+do loop into a single for-loop and moved
> some declarations "down". I constructed a reproducer that
> does not require two large inputs to demonstrate the
> performance improvement.
>
> I've also begun to reword the commit log, but am out of time
> for this evening, so will post this here, for now.
>
> I am still trying to convince myself that this is a strict
> improvement, i.e., that the O(N^2) strstr calls avoided
> by this change served no purpose.
I have pushed that.
I have also pushed a test (attached) for this performance fix in
a separate commit. As with most performance-measuring tests,
it may be fragile, especially when many tests are run in parallel,
but it passes for me on a few different systems.
0001-tests-add-a-test-for-the-performance-fix.patch
Description: Text Data
--- End Message ---