octave-maintainers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: reduction funs optimizations + min/max question


From: Jaroslav Hajek
Subject: Re: reduction funs optimizations + min/max question
Date: Mon, 16 Feb 2009 09:23:07 +0100

On Fri, Feb 13, 2009 at 9:22 PM, Jaroslav Hajek <address@hidden> wrote:
> hi,
>
> this changeset: http://hg.savannah.gnu.org/hgweb/octave/rev/53b4fdeacc2e
> reimplements the reduction and cumulative reduction "cores" sum, prod,
> sumsq, cumsum and cumprod
> for better performance.
>


Following similar ideas, I optimized also min/max reductions and
any/all. A simplistic benchmark follows, as usual
(set N to suitable value).

n = 5e3;
a = rand (n);
tic; b = max (a); toc
tic; b = min (a); toc
tic; b = max (a, [], 2); toc
tic; b = min (a, [], 2); toc
tic; [b, i] = max (a); toc
tic; [b, i] = min (a); toc
tic; [b, i] = max (a, [], 2); toc
tic; [b, i] = min (a, [], 2); toc

tiny = a < 1e-2;
huge = ! tiny;
clear a;

tic; any (tiny); toc
tic; any (tiny, 2); toc
tic; all (huge); toc
tic; all (huge, 2); toc

with a recent tip, I get:

Elapsed time is 0.195431 seconds.
Elapsed time is 0.19919 seconds.
Elapsed time is 0.335377 seconds.
Elapsed time is 0.353903 seconds.
Elapsed time is 0.194535 seconds.
Elapsed time is 0.199214 seconds.
Elapsed time is 0.333951 seconds.
Elapsed time is 0.353886 seconds.
Elapsed time is 0.242775 seconds.
Elapsed time is 0.244407 seconds.
Elapsed time is 0.149176 seconds.
Elapsed time is 0.150418 seconds.

with the new patches, I get:

Elapsed time is 0.0466709 seconds.
Elapsed time is 0.0463569 seconds.
Elapsed time is 0.0438602 seconds.
Elapsed time is 0.0425069 seconds.
Elapsed time is 0.0478208 seconds.
Elapsed time is 0.0480652 seconds.
Elapsed time is 0.03636 seconds.
Elapsed time is 0.0462041 seconds.
Elapsed time is 0.0008111 seconds.
Elapsed time is 0.0244672 seconds.
Elapsed time is 0.000770807 seconds.
Elapsed time is 0.0245111 seconds.

and the relative speed-ups (the usual definition):

  319%
  330%
  665%
  733%
  307%
  314%
  818%
  666%
29832%
  899%
19253%
  514%

One more note: as you can notice, the column-oriented any/all are much
faster than row-oriented.
That's because the column-oriented versions use short-circuiting while
the row-oriented do not.
In the row-reduction any/all case, there is a trade-off between
working by columns in a cache-coherent manner
(that's what the current version does) and sacrificing
short-circuiting or working by rows to get short-circuiting and
sacrifice cache-coherency.

comments?

cheers

-- 
RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz


reply via email to

[Prev in Thread] Current Thread [Next in Thread]