octave-maintainers

FYI: reusing temporaries in nested array expressions


From: Jaroslav Hajek
Subject: FYI: reusing temporaries in nested array expressions
Date: Thu, 3 Sep 2009 10:43:07 +0200

hi all,

Using the machinery for in-place evaluation of computed assignment
operators (+=, *=, etc.), I implemented another possibly useful
optimization:
When evaluating nested expressions, Octave will now attempt to reuse
temporary arrays instead of allocating a new one for each
intermediate result.
For instance, the expression
2*a + b
where a and b are large arrays, will now be done like this:

allocate c
forall i: c(i) = 2*a(i)
forall i: c(i) = c(i) + b(i)
c is the result

previously, the flow was slightly different:

allocate c
forall i: c(i) = 2*a(i)
allocate d
forall i: d(i) = c(i) + b(i)
deallocate c
d is the result

i.e., the peak memory requirements are cut down by the size of c. The
interpreter is not especially smart about this: it only reuses
temporaries from left to right, so that
(a+b)*2 will reuse a+b, but 2*(a+b) won't. The latter would need fairly
extensive checking of whether the operand order can be changed, and I
don't think it's worth the trouble.
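To make the two flows concrete, here is a small sketch that emulates them by hand at the script level. It is only an illustration: the variable names d, e, r1 and r2 are made up for this example, and the real reuse happens inside the interpreter, not in user code. It shows that both flows give the same result and that the two operand orders are equivalent, so anyone who cares about the saving can write the temporary on the left.

```octave
# Sketch only: hand-written emulation of what the interpreter does internally.
a = rand (1000);
b = rand (1000);

# Old flow: the result of 2*a lives in c, the sum goes into a second array d.
c = 2*a;
d = c + b;
clear c

# New flow: the temporary holding 2*a is updated in place.
e = 2*a;
e += b;

# Both flows compute the same values.
assert (isequal (d, e));

# Operand order: only the first form has the temporary on the left,
# but the results are identical.
r1 = (a + b) * 2;
r2 = 2 * (a + b);
assert (isequal (r1, r2));
```

Whether the in-place path is actually taken for a given expression depends on the Octave build in use; the sketch checks only that the results agree.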
Also, the unary operators - and ! applied to a temporary will work
in-place; for instance
mask = ! isnan (x);
will actually work like this:
mask = isnan (x);
for i=1:n; mask(i) = ! mask (i); end
except that the loop is, of course, internal and compiled.
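As a quick sanity check, the explicit loop can be written out in full and compared against the direct expression. This is just a sketch with a tiny made-up input x; the variable expected is introduced only for the comparison.

```octave
# Small made-up input with a few NaNs.
x = [1 NaN 3 NaN 5];

# Emulate the in-place negation of the temporary returned by isnan.
mask = isnan (x);
for i = 1:numel (mask)
  mask(i) = ! mask(i);
end

# The direct expression gives the same logical array.
expected = ! isnan (x);
assert (isequal (mask, expected));
```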

Besides the memory savings, it appears that the in-place loops are
slightly faster in some cases.
Consider the following benchmark:

n = 5000;
a = rand (n); b = rand (n);

disp ("sequential out-of-place evaluation");
tic;
c = a + 1;
c = c .* b;
c = c - 1;
toc

disp ("sequential in-place evaluation");
tic;
c = a;
c += 1;
c .*= b;
c -= 1;
toc

disp ("natural form evaluation");
tic;
c = (a + 1) .* b - 1;
toc

Test setup: Core 2 Duo @ 2.83 GHz, compiled with g++ -O3 -march=native.

Using Octave 3.2.3 RC2, I get:

sequential out-of-place evaluation
Elapsed time is 0.432471 seconds.
sequential in-place evaluation
Elapsed time is 0.441618 seconds.
natural form evaluation
Elapsed time is 0.453721 seconds.

With a recent tip, I get:

sequential out-of-place evaluation
Elapsed time is 0.434277 seconds.
sequential in-place evaluation
Elapsed time is 0.331977 seconds.
natural form evaluation
Elapsed time is 0.445641 seconds.

and with the new patch, I get:

sequential out-of-place evaluation
Elapsed time is 0.433064 seconds.
sequential in-place evaluation
Elapsed time is 0.332374 seconds.
natural form evaluation
Elapsed time is 0.331975 seconds.

So it seems the speed-up can reach about 25%. I'm not sure why that
happens, as the number of operations should be the same; maybe it's
cache effects, or the in-place loops are easier to optimize. In any
case, I think it's nice to have Octave do the evaluation a little
smarter (yeah, it's the defend-your-own-ideas attitude).
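For reference, the figure of roughly 25% follows from the "natural form" timings quoted above (recent tip versus the new patch). The variable names below are made up for this calculation.

```octave
# Timings copied from the "natural form evaluation" results above.
t_tip   = 0.445641;   # recent tip, without the patch
t_patch = 0.331975;   # with the new patch

# Fraction of the run time saved by the patch.
saved = (t_tip - t_patch) / t_tip;
printf ("relative time saved: %.1f%%\n", 100 * saved);
```

This comes out at about 25.5%, in line with the estimate above.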

enjoy

-- 
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

