Re: Compiled 3.8.0 from source on Ubuntu 12.10 slower than 3.6.3 from PPA

help-octave

From:	Aielyn
Subject:	Re: Compiled 3.8.0 from source on Ubuntu 12.10 slower than 3.6.3 from PPA
Date:	Tue, 11 Mar 2014 08:10:08 -0700 (PDT)

I've made a few more attempts over the last day. I've tried passing "CFLAGS"
and "CXXFLAGS" matching those that were missing from the compiler options
(as per my first post) - no significant effect. I've tried specifically
using "--enable-openmp" - no significant effect. I've tried switching to
OpenBLAS (initial attempt failed to configure, needed a more recent version
of OpenBLAS - that version still failed one of the tests, to do with svds.m)
- no significant effect. I've also tried "-march=native" added to CFLAGS and
CXXFLAGS - no significant effect.

Using the profiler, I'm comparing the same code, with the same inputs, between
the two versions. For the innermost function (which is called 370 times total),
3.6.3 from binary is spending 21.991s total (16.771s self), with the binary +
operations within it spending 0.974s, binary ./ operations spending 0.778s,
binary .* operations spending 0.732s, and binary - operations spending
0.737s.

For 3.8.0 compiled (most recent version, using OpenBLAS and -march=native),
it's spending 34.957s total (27.157s self), with the binary + operations
spending 1.472s, ./ spending 1.227s, .* spending 1.095s, and - spending
1.046s.

As you can see, it's pretty much an across-the-board difference in speed,
increasing run time by roughly 40% to 60%. Note that the 3.8
times are with JIT enabled - disabling it doesn't help. OpenBLAS has had an
effect on some matrix operations as tested using
"A=rand(5000,5000);tic;b=A*A;toc", which runs much faster with OpenBLAS on
3.8 (and appears to be using multiple cores) than with reference BLAS on
3.6.3... but this improvement is not observed for my code.
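
To make the size of the gap concrete, here is a small sketch that turns the profiler figures quoted above into percentage slowdowns. It is written in Python purely for the arithmetic; the dictionary layout and the helper name `slowdown_percent` are my own choices for illustration, and the numbers are copied from the two profiler runs as reported.

```python
# Timings in seconds, copied from the profiler figures quoted above:
# (3.6.3 from the PPA binary, 3.8.0 compiled from source).
timings = {
    "total": (21.991, 34.957),
    "self": (16.771, 27.157),
    "binary +": (0.974, 1.472),
    "binary ./": (0.778, 1.227),
    "binary .*": (0.732, 1.095),
    "binary -": (0.737, 1.046),
}


def slowdown_percent(old, new):
    """Percentage increase in time going from old to new (hypothetical helper)."""
    return (new / old - 1.0) * 100.0


slowdowns = {
    name: slowdown_percent(old, new) for name, (old, new) in timings.items()
}

for name, pct in slowdowns.items():
    print(f"{name:10s} {pct:5.1f}% slower in 3.8.0")
```

Every entry comes out between roughly 40% and 60% slower, which is consistent with a uniform slowdown rather than a single operator being responsible.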

One final thing to mention about testing - I had a chance to test the same
code in Matlab, on a different computer. While that computer was likely
somewhat faster than my laptop, the speedup was far beyond what
one would expect from CPU speed alone. With 20 "steps" in the code on my laptop,
it takes around 400s. It took Matlab just 140s to take 100 such steps (the
tests described above involve just one step).
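
Put per step, the comparison above works out as in the sketch below. It is again Python used only for the arithmetic, the variable names are illustrative, and the inputs are the approximate figures quoted above. The two machines differ, so the ratio mixes hardware and interpreter effects and should be read as a rough indication only.

```python
# Approximate wall-clock figures quoted above.
laptop_seconds, laptop_steps = 400.0, 20    # Octave on the laptop
matlab_seconds, matlab_steps = 140.0, 100   # Matlab on the other computer

octave_per_step = laptop_seconds / laptop_steps
matlab_per_step = matlab_seconds / matlab_steps
ratio = octave_per_step / matlab_per_step

print(f"Octave: {octave_per_step:.1f} s/step")
print(f"Matlab: {matlab_per_step:.1f} s/step")
print(f"Octave takes about {ratio:.1f}x as long per step")
```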

I'm thinking that my next attempt will be "--enable-64", although the build
log for the binary package doesn't look like it's using 64-bit indexing. I would
appreciate any help that anybody can provide. Is there reason to think that
the problem might be a performance regression in 3.8, or is this definitely
a matter of compiler settings?

--
View this message in context:
http://octave.1599824.n4.nabble.com/Compiled-3-8-0-from-source-on-Ubuntu-12-10-slower-than-3-6-3-from-PPA-tp4662951p4663000.html
Sent from the Octave - General mailing list archive at Nabble.com.
