octave-maintainers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

FYI: bsxfun optimized


From: Jaroslav Hajek
Subject: FYI: bsxfun optimized
Date: Tue, 20 Oct 2009 11:13:14 +0200

hi all,
I optimized bsxfun (binary singleton expansion function applier) to be
faster when some common built-in functions are given. The bsxfun
operations are also available from liboctave as bsxfun_add (NDArray,
NDArray) etc.
http://hg.savannah.gnu.org/hgweb/octave/rev/26abff55f6fe

The following benchmark illustrates the speed-up (set n to suitable value)

n = 200;
i = ones (1, n);
a = rand (n, n, 1);
b = rand (n, n, n);
tic; bsxfun (@plus, a, b); toc
a = rand (n, 1, 1);
tic; bsxfun (@minus, a, b); toc
a = rand (1, n, 1);
tic; bsxfun (@times, a, b); toc

a = rand (n, n, 1);
b = rand (1, n, n);
tic; bsxfun (@rdivide, a, b); toc

b = rand (1, 1, n);
tic; bsxfun (@lt, a, b); toc

a = rand (2, n, n/2, n); # heavily multi-dim
b = rand (1, 1, n/2, n);
tic; bsxfun (@gt, a, b); toc

on a Core 2 Duo @ 2.83 GHz, g++ -O3 -march=native,
with a recent tip, I get:

Elapsed time is 0.697962 seconds.
Elapsed time is 0.645682 seconds.
Elapsed time is 0.709098 seconds.
Elapsed time is 0.720162 seconds.
Elapsed time is 0.608029 seconds.
Elapsed time is 24.2397 seconds.

with the new patch, I got

Elapsed time is 0.0434361 seconds.
Elapsed time is 0.048116 seconds.
Elapsed time is 0.0503559 seconds.
Elapsed time is 0.0644929 seconds.
Elapsed time is 0.0164821 seconds.
Elapsed time is 0.098119 seconds.

an interesting third option is a naive bsxfun replacement, which just
spreads the arrays to a common size and then applies the operator:

function r = bsxfun (op, x, y)
  nd = max (ndims (x), ndims (y));
  xidx = yidx = repmat({':'}, 1, nd);
  for i = 1:nd
    sx = size (x, i);
    sy = size (y, i);
    if (sx == 1)
      xidx{i} = ones (1, sy);
    endif
    if (sy == 1)
      yidx{i} = ones (1, sx);
    endif
  endfor

  r = op (x(xidx{:}), y(yidx{:}));
endfunction

with this replacement, I get:

Elapsed time is 0.0920911 seconds.
Elapsed time is 0.098748 seconds.
Elapsed time is 0.097363 seconds.
Elapsed time is 0.137549 seconds.
Elapsed time is 0.109806 seconds.
Elapsed time is 0.18157 seconds.

of course, the memory consumption will be up to 3x as big, but in
general these are closer to their optimized counterparts than the
original code.

Currently, only some common operations are covered, such as the basic
operations +,-,*,/, relations =,!=,<,>,<=,>= and pairwise min, max are
covered, and the optimizations only work if the operation is
homogeneous. More can be added if there is interest.

One further question that remains is whether to retain the old code,
or replace it with the spread-and-apply approach above.
The old code is more memory-efficient (it creates the result array and
then updates it), but apparently slower, sometimes significantly.
Opinions?

enjoy

-- 
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz


reply via email to

[Prev in Thread] Current Thread [Next in Thread]