octave-maintainers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

FYI: optimized strfind


From: Jaroslav Hajek
Subject: FYI: optimized strfind
Date: Tue, 29 Dec 2009 06:42:54 +0100

hi all,

by the following changesets
http://hg.savannah.gnu.org/hgweb/octave/rev/bf26f81d009f
http://hg.savannah.gnu.org/hgweb/octave/rev/a14dc255427f

I implemented a compiled version of the "strfind" function.
The algorithm used is QuickSearch:
http://www-igm.univ-mlv.fr/~lecroq/string/node19.html

The performance improvements can be demonstrated using this benchmark:

for n = [1e5 1e6 1e7]
  a = char ("A" + ("z"-"A")*rand (1, n));
  for m = [1e1 1e2 1e3 1e4];
    b = char ("A" + ("z"-"A")*rand (1, m));
    for i = 1:2
      j = ceil (n*rand);
      a(j:j+m-1) = b;
    endfor
    tic;
    idx = strfind (a, b);
    printf ("n = %d, m = %d, time = %f\n", n, m, toc);
  endfor
endfor

[names, helps] = lookfor ("");
helps = repmat (helps, 100, 1);
tic; strfind (helps, "matrix"); toc
tic; strfind (helps, "0 otherwise"); toc


with the old implementation, I get:

n = 100000, m = 10, time = 0.001085
n = 100000, m = 100, time = 0.003949
n = 100000, m = 1000, time = 0.033411
n = 100000, m = 10000, time = 0.330224
n = 1000000, m = 10, time = 0.005759
n = 1000000, m = 100, time = 0.009250
n = 1000000, m = 1000, time = 0.038011
n = 1000000, m = 10000, time = 0.334651
n = 10000000, m = 10, time = 0.054245
n = 10000000, m = 100, time = 0.055490
n = 10000000, m = 1000, time = 0.085641
n = 10000000, m = 10000, time = 0.379713
Elapsed time is 20.8236 seconds.
Elapsed time is 14.4698 seconds.

and with the new code

n = 100000, m = 10, time = 0.000439
n = 100000, m = 100, time = 0.000095
n = 100000, m = 1000, time = 0.000100
n = 100000, m = 10000, time = 0.000132
n = 1000000, m = 10, time = 0.003628
n = 1000000, m = 100, time = 0.000794
n = 1000000, m = 1000, time = 0.000840
n = 1000000, m = 10000, time = 0.000863
n = 10000000, m = 10, time = 0.035978
n = 10000000, m = 100, time = 0.008793
n = 10000000, m = 1000, time = 0.008274
n = 10000000, m = 10000, time = 0.008251
Elapsed time is 0.136669 seconds.
Elapsed time is 0.138137 seconds.


this shows that the m-code was competitive for very small patterns
(for 1-3 characters, it was even slightly better), but can be as much
as 2000x slower for long patterns. Note that with the new
implementation, the search is not slower with a longer pattern, in
fact it may get somewhat quicker - in theory, the best case is O(N/M).

Also, searching the same pattern in an array of strings is now much
faster, as shown by the last two timings.
This may even positively impact functions like lookfor.

enjoy

-- 
RNDr. Jaroslav Hajek, PhD
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz


reply via email to

[Prev in Thread] Current Thread [Next in Thread]