octave-maintainers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

FYI: sum improved


From: Jaroslav Hajek
Subject: FYI: sum improved
Date: Tue, 13 Oct 2009 12:38:33 +0200

the built-in "sum" function was improved:
http://hg.savannah.gnu.org/hgweb/octave/rev/192d94cff6c1

"sum" now accepts an 'extra' option that makes it use a special
summation algorithm, compensating for loss of precision (slightly more
sophisticated than the well-known Kahan's). The effect is demonstrable
by the following script:

ans =  0.015097
Elapsed time is 0.000108957 seconds.
ans = 0
Elapsed time is 0.0004 seconds.
ans =  10.015
Elapsed time is 2.69e-05 seconds.
ans =  10
Elapsed time is 3.41e-05 seconds.
ans = -7.1538
Elapsed time is 0.000391 seconds.
ans =  1.1546e-14
Elapsed time is 0.0005829 seconds.
ans =  2.8462
Elapsed time is 0.0002379 seconds.
ans =  10.000
Elapsed time is 0.000573 seconds.
ans =  24.671
Elapsed time is 0.0269 seconds.
ans = -1.6911e-12
Elapsed time is 0.06166 seconds.
ans =  34.671
Elapsed time is 0.0352 seconds.
ans = 10.0000
Elapsed time is 0.06382 seconds.

as you can see, the compensated summation always gives a much more
precise result (should be 0 or 10), at the cost of speed penalty
factor up to some 2+ (this depends on memory management). This is with
SSE arithmetics on 64-bit machines; I'm wondering what 387 arithmetics
would do. The "extra" option is not yet working for sparse matrices.

the double-accumulated non-native reduction (default for integers,
specified by "double") is now performed inline, without temporary
conversion to double if possible. This brings it on par with native
reductions, as demonstrated by the following benchmark (set n to
suitable value):

n = 2500;
disp ("single");
a = single (rand (n));
tic; sum (a, 1, "native"); toc;
tic; sum (a, 1, "double"); toc;
tic; sum (a, 2, "native"); toc
tic; sum (a, 2, "double"); toc
tic; sum (a(:), "native"); toc
tic; sum (a(:), "double"); toc
disp ("single complex");
a = complex (single (rand (n)), single (rand (n)));
tic; sum (a, 1, "native"); toc;
tic; sum (a, 1, "double"); toc;
tic; sum (a, 2, "native"); toc
tic; sum (a, 2, "double"); toc
tic; sum (a(:), "native"); toc
tic; sum (a(:), "double"); toc
disp ("int32");
a = int32 (1e4 * (rand (n) - 0.5));
tic; sum (a, 1, "native"); toc;
tic; sum (a, 1, "double"); toc;
tic; sum (a, 2, "native"); toc
tic; sum (a, 2, "double"); toc
tic; sum (a(:), "native"); toc
tic; sum (a(:), "double"); toc
disp ("uint16");
a = uint16 (1e1 * rand (n));
tic; sum (a, 1, "native"); toc;
tic; sum (a, 1, "double"); toc;
tic; sum (a, 2, "native"); toc
tic; sum (a, 2, "double"); toc
tic; sum (a(:), "native"); toc
tic; sum (a(:), "double"); toc
disp ("int64");
a = int64 (1e8 * (rand (n) - 0.5));
tic; sum (a, 1, "native"); toc;
tic; sum (a, 1, "double"); toc;
tic; sum (a, 2, "native"); toc
tic; sum (a, 2, "double"); toc
tic; sum (a(:), "native"); toc
tic; sum (a(:), "double"); toc

with a recent tip, I get (Core 2 Duo @ 2.83 GHz, g++ -O3 -march=native)

single
Elapsed time is 0.00736904 seconds.
Elapsed time is 0.0398569 seconds.
Elapsed time is 0.00691319 seconds.
Elapsed time is 0.0457029 seconds.
Elapsed time is 0.00729012 seconds.
Elapsed time is 0.0429699 seconds.
single complex
Elapsed time is 0.009444 seconds.
Elapsed time is 0.122845 seconds.
Elapsed time is 0.0125308 seconds.
Elapsed time is 0.124614 seconds.
Elapsed time is 0.00960779 seconds.
Elapsed time is 0.125515 seconds.
int32
Elapsed time is 0.00868487 seconds.
Elapsed time is 0.04076 seconds.
Elapsed time is 0.00867987 seconds.
Elapsed time is 0.0458021 seconds.
Elapsed time is 0.00956798 seconds.
Elapsed time is 0.0463171 seconds.
uint16
Elapsed time is 0.00492692 seconds.
Elapsed time is 0.039608 seconds.
Elapsed time is 0.00606394 seconds.
Elapsed time is 0.0457959 seconds.
Elapsed time is 0.00865698 seconds.
Elapsed time is 0.0426869 seconds.
int64
Elapsed time is 0.010695 seconds.
Elapsed time is 0.0449679 seconds.
Elapsed time is 0.0123532 seconds.
Elapsed time is 0.0477431 seconds.
Elapsed time is 0.0102439 seconds.
Elapsed time is 0.04808 seconds.

with the new patch, I get:

single
Elapsed time is 0.00735211 seconds.
Elapsed time is 0.00744104 seconds.
Elapsed time is 0.0052731 seconds.
Elapsed time is 0.0054481 seconds.
Elapsed time is 0.0071578 seconds.
Elapsed time is 0.00739503 seconds.
single complex
Elapsed time is 0.0096209 seconds.
Elapsed time is 0.0110161 seconds.
Elapsed time is 0.0124722 seconds.
Elapsed time is 0.0198781 seconds.
Elapsed time is 0.010736 seconds.
Elapsed time is 0.0117328 seconds.
int32
Elapsed time is 0.00881386 seconds.
Elapsed time is 0.00739193 seconds.
Elapsed time is 0.00970101 seconds.
Elapsed time is 0.0179269 seconds.
Elapsed time is 0.00954795 seconds.
Elapsed time is 0.00770593 seconds.
uint16
Elapsed time is 0.00493503 seconds.
Elapsed time is 0.00696611 seconds.
Elapsed time is 0.00503612 seconds.
Elapsed time is 0.013629 seconds.
Elapsed time is 0.00915194 seconds.
Elapsed time is 0.00722909 seconds.
int64
Elapsed time is 0.010839 seconds.
Elapsed time is 0.00875592 seconds.
Elapsed time is 0.0108941 seconds.
Elapsed time is 0.015064 seconds.
Elapsed time is 0.0130761 seconds.
Elapsed time is 0.010613 seconds.

as you can see, some operations' speed improved by a factor of 5 or more.

-- 
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz


reply via email to

[Prev in Thread] Current Thread [Next in Thread]