[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Vectorizing strcmp
From: |
Keith Goodman |
Subject: |
Re: Vectorizing strcmp |
Date: |
Wed, 15 Sep 2004 11:16:22 -0700 |
Good point. How about this:
The for loop is part of the problem. Another part is that the
recursive call to strcmp needs three or four if statements to
determine that we are asking for a strcmp between two strings. But we
already know that they are strings since we are creating the strings
from the cell.
So we could replace
for i = 1:n
retval(i,:) = strcmp (t1{i}, s2);
endfor
with
for i = 1:n
retval(i,:) = isequal (t1 {i}, s2);
endfor
for a small but easy speed up.
On Wed, 15 Sep 2004 13:18:13 -0400, John W. Eaton <address@hidden> wrote:
> On 15-Sep-2004, Keith Goodman <address@hidden> wrote:
>
> | On the one hand, it's nice that strcmp is a user-defined
> | function---it's easy to look at the code to see what it does.
>
> Perhaps it is easier because it is a user-defined function, but with
> Octave, unlike the other leading brand, you can always look at all of
> the source code for any of the functions.
>
> | On the other hand, the reason I'm looking at the code is that it is slow.
> |
> | I'm using strcmp(s1,s2) where s1 is a cell array of strings (around
> | 1000X1) and s2 is a string.
> |
> | Currently, in 2.1.58, strcmp does
> |
> | n = length(t1);
> | for i = 1:n
> | retval(i,:) = strcmp (t1{i}, s2);
> | endfor
> |
> | which recursively calls strcmp for two strings and where t1 is s1(:).
> | The for loop is slow.
> |
> | Short of making strcmp a built-in function, one way to speed it up is
> | to replace the for loop with a repmat like this
> |
> | c1 = char(t1);
> | [rc1, cc1] = size(c1);
> | ns2 = size(s2,2);
> | blank = ' ';
> | ms2 = repmat([s2 repmat(blank,1,max(cc1 - ns2, 0))], rc1, 1);
> | mc1 = [c1 repmat(blank,1,max(ns2 - cc1, 0))];
>
> | retval2 = sum(mc1==ms2, 2)==size(mc1, 2);
>
> Wouldn't
>
> all (mc1 == ms2, 2)
>
> be better here?
>
> | The second code fragment is about 10 times faster than the first for
> | length(s1)==1000. For length(s1)==10 the speed is about the same. And
> | for length(s1) < 10 the first approach is faster.
> |
> | For me the added complexity in the code is more than offset by the
> | gain in speed.
>
> I think we need to be careful about using repmat in this way. If your
> cell array is large and the comparison string is fairly long, it could
> easily generate a matrix many megabytes in size. Following that with
> code that generates additional temporary arrays could cause trouble.
> Maybe it could be done in blocks rather than all at once?
>
> Or, perhaps it should be a builtin function.
>
> jwe
>
>