octave-maintainers

Re: FYI: optimizing certain matrix arithmetic


From: Ben Abbott
Subject: Re: FYI: optimizing certain matrix arithmetic
Date: Tue, 29 Sep 2009 08:05:56 -0400


On Sep 29, 2009, at 7:44 AM, Jaroslav Hajek wrote:

On Tue, Sep 29, 2009 at 1:34 PM, Michael Creel <address@hidden> wrote:
On Tue, Sep 29, 2009 at 12:47 PM, Jaroslav Hajek <address@hidden> wrote:
On Tue, Sep 29, 2009 at 12:26 PM, Michael Creel <address@hidden> wrote:
On Tue, Sep 29, 2009 at 10:44 AM, Jaroslav Hajek <address@hidden> wrote:
On Tue, Sep 29, 2009 at 10:38 AM, Michael Creel <address@hidden> wrote:
Hi all,

On an Apple MacBook Pro running Ubuntu Jaunty amd64, using the benchmark

%%%%%%%%%%%%%%%%%%%%%%
n = 500;
R = triu (rand (n));
u = rand (n, 1);

tic; for i = 1:1000; R \ u; endfor; toc
tic; for i = 1:1000; u' / R; endfor; toc
tic; for i = 1:1000; R' \ u; endfor; toc

R = tril (rand (n));
u = rand (n, 1);

tic; for i = 1:1000; R \ u; endfor; toc
tic; for i = 1:1000; u' / R; endfor; toc
tic; for i = 1:1000; R' \ u; endfor; toc

u = u + I*rand (n, 1);
tic; for i = 1:1000; R \ u; endfor; toc
tic; for i = 1:1000; R' \ u; endfor; toc


n = 800;
a = rand (n);
b = rand (n) + i*rand (n);
tic; a * b; toc
tic; b * a; toc
tic; a' * b; toc
tic; b * a'; toc
tic; a \ b; toc
tic; b / a; toc
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
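Note that each triangular group times different expressions. R \ u solves R*x = u. By contrast, u' / R and R' \ u compute the same solution up to a transpose, because u'/R = (R'\u)'. The quick sanity check below is a sketch that confirms the two agree. The diagonal shift n*eye (n) is an added assumption that is not in the benchmark: it keeps the random triangular matrix well conditioned so the comparison is meaningful. If the two results agree, any timing gap between them comes from how Octave evaluates each expression.

```octave
% Sketch: u'/R and R'\u give the same vector up to a transpose.
n = 500;
% Assumption: add n*eye(n) so the upper triangular R is well conditioned
% (a plain triu (rand (n)) is typically very ill conditioned for n = 500).
R = triu (rand (n)) + n * eye (n);
u = rand (n, 1);

x1 = R' \ u;        % solve R' * x = u
x2 = (u' / R)';     % solve y * R = u', then transpose back to a column
rel = norm (x1 - x2) / norm (x1);
printf ("relative difference: %g\n", rel);
```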

With Octave 3.0.1, which comes with Ubuntu Jaunty amd64, I get


octave:4> bench
Elapsed time is 0.20216 seconds.
Elapsed time is 1.93894 seconds.
Elapsed time is 2.33824 seconds.
Elapsed time is 0.188448 seconds.
Elapsed time is 1.95657 seconds.
Elapsed time is 2.43552 seconds.
Elapsed time is 4.08299 seconds.
Elapsed time is 7.84752 seconds.
Elapsed time is 0.213021 seconds.
Elapsed time is 0.21117 seconds.
Elapsed time is 0.218387 seconds.
Elapsed time is 0.217174 seconds.
Elapsed time is 0.452714 seconds.
Elapsed time is 0.391383 seconds.
octave:5>



Matlab R2008b gives
bench
Elapsed time is 0.289161 seconds.
Elapsed time is 0.566446 seconds.
Elapsed time is 0.562623 seconds.
Elapsed time is 0.253456 seconds.
Elapsed time is 0.574304 seconds.
Elapsed time is 0.570281 seconds.
Elapsed time is 0.253070 seconds.
Elapsed time is 0.572601 seconds.
Elapsed time is 0.102086 seconds.
Elapsed time is 0.102677 seconds.
Elapsed time is 0.103080 seconds.
Elapsed time is 0.103759 seconds.
Elapsed time is 0.165608 seconds.
Elapsed time is 0.181704 seconds.



Octave 3.2.3+ from today, self-compiled, gives
octave:1> bench
Elapsed time is 0.208794 seconds.
Elapsed time is 0.189178 seconds.
Elapsed time is 0.186724 seconds.
Elapsed time is 0.188649 seconds.
Elapsed time is 0.192915 seconds.
Elapsed time is 0.19166 seconds.
Elapsed time is 0.186277 seconds.
Elapsed time is 0.19102 seconds.
Elapsed time is 0.212707 seconds.
Elapsed time is 0.211013 seconds.
Elapsed time is 0.210491 seconds.
Elapsed time is 0.210447 seconds.
Elapsed time is 0.431791 seconds.
Elapsed time is 0.367412 seconds.
octave:2>

Congratulations!
Michael


It's interesting that you didn't get any speed-up in the second part of the
benchmark compared to 3.0.1...
What BLAS and LAPACK are you using? What's your compiler configuration?
Also, what exactly is your tip? The "3.2.3+" is a bit unclear; did you
mean "3.3.50+", i.e. the development version?

thanks

--
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz


Oops, sorry, it's 3.3.50+, updated this morning.

I build with

  make -j2 CFLAGS="-O3 -march=native -funroll-loops" \
    FFLAGS="-O3 -march=native -funroll-loops" \
    XTRA_CFLAGS="-O3 -march=native -funroll-loops" \
    XTRA_CXXFLAGS="-O3 -march=native -funroll-loops"


In general, if you're using a newer gcc on a 64-bit architecture, I
advise against -funroll-loops. For me, it usually gains about 1% of
additional speed in some operations, at the cost of increasing the
binaries' size by more than 50%. That seems like a bad tradeoff.

./configure reports
 BLAS libraries:       -llapack -lcblas -lf77blas -latlas

so I assume that Octave is using ATLAS (the ATLAS dev package that
comes with Kubuntu Jaunty amd64).

Michael


Apparently, yes. Hmm. It's really weird that you got almost exactly the
same figures as 3.0.1 in the second part of the benchmark.
If you apply the attached patch, rebuild and re-run the benchmark,
what do you get?



With that patch applied, I get
octave:1> bench
Elapsed time is 0.194493 seconds.
Elapsed time is 0.192309 seconds.
Elapsed time is 0.189026 seconds.
Elapsed time is 0.188679 seconds.
Elapsed time is 0.195958 seconds.
Elapsed time is 0.193521 seconds.
Elapsed time is 0.187596 seconds.
Elapsed time is 0.193254 seconds.
Elapsed time is 0.215135 seconds.
Elapsed time is 0.213705 seconds.
Elapsed time is 0.21341 seconds.
Elapsed time is 0.212501 seconds.
Elapsed time is 0.363992 seconds.
Elapsed time is 0.368094 seconds.

so there is an improvement in the second-to-last number.

Cheers, M.


OK, this is funny. I now understand where the problem is. Just change the line

b = rand (n) + i*rand (n);

to

b = rand (n) + I*rand (n);

(note the capital I). At this point, i is still defined from the previous
loops as a real numeric value (!), so b ends up real rather than complex.
Then run the benchmarks again. I think this affects Matlab, too.
In any case, it is apparent that your Matlab is linked to something
faster than ATLAS; probably Intel's MKL.
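Here is a minimal Octave sketch of the pitfall. After a for loop over i, the variable i keeps its last value, so i*rand (n) is a real matrix. I is Octave's built-in imaginary unit. The literal 1i cannot be shadowed by a variable at all; to my understanding it also works in Matlab, which does not define I. The variable names and the small size n = 4 are arbitrary choices for illustration.

```octave
% Sketch: a loop variable named i shadows the imaginary unit.
n = 4;
for i = 1:1000
  % empty body; the loop only serves to reassign i
endfor
printf ("i = %g\n", i);            % i is now the real scalar 1000

b_bad  = rand (n) + i*rand (n);    % i == 1000, so this is a real matrix
b_good = rand (n) + I*rand (n);    % I is the built-in imaginary unit
b_safe = rand (n) + 1i*rand (n);   % 1i is a literal, immune to shadowing

printf ("isreal (b_bad)  = %d\n", isreal (b_bad));
printf ("isreal (b_good) = %d\n", isreal (b_good));
printf ("isreal (b_safe) = %d\n", isreal (b_safe));
```

Running clear i afterwards restores the built-in meaning of i.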

regards

I've read that Matlab does use MKL. Some examples are below.

        http://mail.scipy.org/pipermail/nipy-devel/2009-August/001847.html

        http://www.nabble.com/Re%3A-question-about-the-atlas-and-p22830193.html

Ben




