help-octave

Re: Compiled 3.8.0 from source on Ubuntu 12.10 slower than 3.6.3 from PPA


From: Aielyn
Subject: Re: Compiled 3.8.0 from source on Ubuntu 12.10 slower than 3.6.3 from PPA
Date: Tue, 11 Mar 2014 08:10:08 -0700 (PDT)

I've made a few more attempts over the last day. I've tried passing "CFLAGS"
and "CXXFLAGS" matching those that were missing from the compiler options
(as per my first post) - no significant effect. I've tried specifically
using "--enable-openmp" - no significant effect. I've tried switching to
OpenBLAS (initial attempt failed to configure, needed a more recent version
of OpenBLAS - that version still failed one of the tests, to do with svds.m)
- no significant effect. I've also tried "-march=native" added to CFLAGS and
CXXFLAGS - no significant effect.

Using the profiler, I'm comparing the same code, with the same inputs,
between the two versions. For the innermost function (which is called 370
times in total), 3.6.3 from the PPA binary spends 21.991s total (16.771s
self), with the binary + operations within it taking 0.974s, binary ./
operations 0.778s, binary .* operations 0.732s, and binary - operations
0.737s.

For the compiled 3.8.0 (most recent build, using OpenBLAS and -march=native),
it spends 34.957s total (27.157s self), with the binary + operations taking
1.472s, ./ 1.227s, .* 1.095s, and - 1.046s.

As you can see, it's pretty much an across-the-board difference in speed:
total time goes up by nearly 60%, and the individual operations by roughly
40-60%. Note that the 3.8 times are with JIT enabled - disabling it doesn't
help. OpenBLAS has had an effect on some matrix operations, as tested using
"A=rand(5000,5000);tic;b=A*A;toc", which runs much faster with OpenBLAS on
3.8 (and appears to be using multiple cores) than with reference BLAS on
3.6.3... but this improvement is not observed for my code.
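
For reference, the slowdown implied by the profiler figures quoted above can
be worked out with a few lines of arithmetic. This is only a sketch in Python
(chosen for convenience, not anything Octave-specific); the dictionary names
and keys are labels made up for this illustration, and the numbers are copied
from the two profiler runs.

```python
# Profiler timings in seconds, copied from the two runs described above.
# The keys are illustrative labels, not profiler output names.
t_363 = {"total": 21.991, "self": 16.771, "+": 0.974, "./": 0.778, ".*": 0.732, "-": 0.737}
t_380 = {"total": 34.957, "self": 27.157, "+": 1.472, "./": 1.227, ".*": 1.095, "-": 1.046}

# Ratio of the 3.8.0 time to the 3.6.3 time for each entry.
ratios = {name: t_380[name] / t_363[name] for name in t_363}

for name, r in ratios.items():
    print(f"{name:>5}: {r:.2f}x ({(r - 1) * 100:.0f}% slower)")
```

The ratios come out between about 1.42 (binary -) and 1.62 (self time), with
the total at about 1.59, so the slowdown is fairly uniform rather than
concentrated in one kind of operation.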

One final thing to mention about testing - I had a chance to test the same
code in Matlab, on a different computer. While the computer was likely
somewhat faster than my laptop, the code speedup was far beyond that which
one would expect from CPU speed. With 20 "steps" in the code on my laptop,
it takes around 400s. It took Matlab just 140s to take 100 such steps (the
tests described above involve just one step).
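
To put the Matlab comparison on a per-step basis, here is the same kind of
back-of-the-envelope arithmetic, again as a Python sketch. The variable names
are invented for illustration, the inputs are the approximate figures quoted
above, and the result ignores the (unknown) hardware difference between the
two machines.

```python
# Approximate wall-clock figures quoted above.
laptop_seconds, laptop_steps = 400.0, 20     # Octave on the laptop
matlab_seconds, matlab_steps = 140.0, 100    # Matlab on the other computer

laptop_per_step = laptop_seconds / laptop_steps   # seconds per step
matlab_per_step = matlab_seconds / matlab_steps   # seconds per step
speedup = laptop_per_step / matlab_per_step

print(f"laptop: {laptop_per_step:.1f} s/step")
print(f"Matlab: {matlab_per_step:.1f} s/step")
print(f"ratio:  {speedup:.1f}x")
```

That works out to roughly 20 s per step against 1.4 s per step, a factor of
about 14, which is well beyond what a faster CPU alone would plausibly
explain.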

I'm thinking that my next attempt will be "--enable-64", although the build
log for the binary package doesn't look like it's using 64-bit indexing. I
would appreciate any help that anybody can provide. Is there reason to think
that
the problem might be a performance regression in 3.8, or is this definitely
a matter of compiler settings?



--
View this message in context: 
http://octave.1599824.n4.nabble.com/Compiled-3-8-0-from-source-on-Ubuntu-12-10-slower-than-3-6-3-from-PPA-tp4662951p4663000.html
Sent from the Octave - General mailing list archive at Nabble.com.

