octave-maintainers

Re: FYI: optimizing certain matrix arithmetic


From: Jaroslav Hajek
Subject: Re: FYI: optimizing certain matrix arithmetic
Date: Tue, 29 Sep 2009 14:15:38 +0200

On Tue, Sep 29, 2009 at 2:08 PM, Jaroslav Hajek <address@hidden> wrote:
> On Tue, Sep 29, 2009 at 1:54 PM, Michael Creel <address@hidden> wrote:
>> On Tue, Sep 29, 2009 at 1:44 PM, Jaroslav Hajek <address@hidden> wrote:
>>> On Tue, Sep 29, 2009 at 1:34 PM, Michael Creel <address@hidden> wrote:
>>>> On Tue, Sep 29, 2009 at 12:47 PM, Jaroslav Hajek <address@hidden> wrote:
>>>>> On Tue, Sep 29, 2009 at 12:26 PM, Michael Creel <address@hidden> wrote:
>>>>>> On Tue, Sep 29, 2009 at 10:44 AM, Jaroslav Hajek <address@hidden> wrote:
>>>>>>> On Tue, Sep 29, 2009 at 10:38 AM, Michael Creel <address@hidden> wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> On an Apple MacBook Pro running Ubuntu Jaunty amd64, using the benchmark
>>>>>>>>
>>>>>>>> %%%%%%%%%%%%%%%%%%%%%%
>>>>>>>> n = 500;
>>>>>>>> R = triu (rand (n));
>>>>>>>> u = rand (n, 1);
>>>>>>>>
>>>>>>>> tic; for i = 1:1000; R \ u; endfor; toc
>>>>>>>> tic; for i = 1:1000; u' / R; endfor; toc
>>>>>>>> tic; for i = 1:1000; R' \ u; endfor; toc
>>>>>>>>
>>>>>>>> R = tril (rand (n));
>>>>>>>> u = rand (n, 1);
>>>>>>>>
>>>>>>>> tic; for i = 1:1000; R \ u; endfor; toc
>>>>>>>> tic; for i = 1:1000; u' / R; endfor; toc
>>>>>>>> tic; for i = 1:1000; R' \ u; endfor; toc
>>>>>>>>
>>>>>>>> u = u + I*rand (n, 1);
>>>>>>>> tic; for i = 1:1000; R \ u; endfor; toc
>>>>>>>> tic; for i = 1:1000; R' \ u; endfor; toc
>>>>>>>>
>>>>>>>>
>>>>>>>> n = 800;
>>>>>>>> a = rand (n);
>>>>>>>> b = rand (n) + i*rand (n);
>>>>>>>> tic; a * b; toc
>>>>>>>> tic; b * a; toc
>>>>>>>> tic; a' * b; toc
>>>>>>>> tic; b * a'; toc
>>>>>>>> tic; a \ b; toc
>>>>>>>> tic; b / a; toc
>>>>>>>> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>>>>>>>>
>>>>>>>> With Octave 3.0.1, which comes with Ubuntu Jaunty amd64, I get
>>>>>>>>
>>>>>>>>
>>>>>>>> octave:4> bench
>>>>>>>> Elapsed time is 0.20216 seconds.
>>>>>>>> Elapsed time is 1.93894 seconds.
>>>>>>>> Elapsed time is 2.33824 seconds.
>>>>>>>> Elapsed time is 0.188448 seconds.
>>>>>>>> Elapsed time is 1.95657 seconds.
>>>>>>>> Elapsed time is 2.43552 seconds.
>>>>>>>> Elapsed time is 4.08299 seconds.
>>>>>>>> Elapsed time is 7.84752 seconds.
>>>>>>>> Elapsed time is 0.213021 seconds.
>>>>>>>> Elapsed time is 0.21117 seconds.
>>>>>>>> Elapsed time is 0.218387 seconds.
>>>>>>>> Elapsed time is 0.217174 seconds.
>>>>>>>> Elapsed time is 0.452714 seconds.
>>>>>>>> Elapsed time is 0.391383 seconds.
>>>>>>>> octave:5>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Matlab 2008b gives
>>>>>>>> >> bench
>>>>>>>> Elapsed time is 0.289161 seconds.
>>>>>>>> Elapsed time is 0.566446 seconds.
>>>>>>>> Elapsed time is 0.562623 seconds.
>>>>>>>> Elapsed time is 0.253456 seconds.
>>>>>>>> Elapsed time is 0.574304 seconds.
>>>>>>>> Elapsed time is 0.570281 seconds.
>>>>>>>> Elapsed time is 0.253070 seconds.
>>>>>>>> Elapsed time is 0.572601 seconds.
>>>>>>>> Elapsed time is 0.102086 seconds.
>>>>>>>> Elapsed time is 0.102677 seconds.
>>>>>>>> Elapsed time is 0.103080 seconds.
>>>>>>>> Elapsed time is 0.103759 seconds.
>>>>>>>> Elapsed time is 0.165608 seconds.
>>>>>>>> Elapsed time is 0.181704 seconds.
>>>>>>>> >>
>>>>>>>>
>>>>>>>>
>>>>>>>> Octave 3.2.3+ from today, self compiled, gives
>>>>>>>> octave:1> bench
>>>>>>>> Elapsed time is 0.208794 seconds.
>>>>>>>> Elapsed time is 0.189178 seconds.
>>>>>>>> Elapsed time is 0.186724 seconds.
>>>>>>>> Elapsed time is 0.188649 seconds.
>>>>>>>> Elapsed time is 0.192915 seconds.
>>>>>>>> Elapsed time is 0.19166 seconds.
>>>>>>>> Elapsed time is 0.186277 seconds.
>>>>>>>> Elapsed time is 0.19102 seconds.
>>>>>>>> Elapsed time is 0.212707 seconds.
>>>>>>>> Elapsed time is 0.211013 seconds.
>>>>>>>> Elapsed time is 0.210491 seconds.
>>>>>>>> Elapsed time is 0.210447 seconds.
>>>>>>>> Elapsed time is 0.431791 seconds.
>>>>>>>> Elapsed time is 0.367412 seconds.
>>>>>>>> octave:2>
>>>>>>>>
>>>>>>>> Congratulations!
>>>>>>>> Michael
>>>>>>>>
>>>>>>>
>>>>>>> It's interesting you didn't get any speed-up in the second part of the
>>>>>>> benchmark, compared to 3.0.1...
>>>>>>> What BLAS and LAPACK are you using? What's your compiler configuration?
>>>>>>> Also, what exactly is your tip? The "3.2.3+" is a bit unclear, did you
>>>>>>> mean "3.3.50+", i.e. the development version?
>>>>>>>
>>>>>>> thanks
>>>>>>>
>>>>>>> --
>>>>>>> RNDr. Jaroslav Hajek
>>>>>>> computing expert & GNU Octave developer
>>>>>>> Aeronautical Research and Test Institute (VZLU)
>>>>>>> Prague, Czech Republic
>>>>>>> url: www.highegg.matfyz.cz
>>>>>>>
>>>>>>
>>>>>> Oops, sorry, it's 3.3.50+, updated this morning.
>>>>>>
>>>>>> I build using
>>>>>> make -j2 CFLAGS="-O3 -march=native -funroll-loops" \
>>>>>>   FFLAGS="-O3 -march=native -funroll-loops" \
>>>>>>   XTRA_CFLAGS="-O3 -march=native -funroll-loops" \
>>>>>>   XTRA_CXXFLAGS="-O3 -march=native -funroll-loops"
>>>>>>
>>>>>
>>>>> In general, if you're using a newer gcc on a 64-bit architecture, I
>>>>> advise against -funroll-loops. For me, it usually gains about 1% in
>>>>> speed for some operations, at the cost of increasing the binaries'
>>>>> size by more than 50%. That seems like a bad tradeoff.
>>>>>
>>>>>> ./configure reports
>>>>>>  BLAS libraries:       -llapack -lcblas -lf77blas -latlas
>>>>>>
>>>>>> so I assume that Octave is using Atlas (the atlas dev package that
>>>>>> comes with Kubuntu Jaunty amd64).
>>>>>>
>>>>>> Michael
>>>>>>
>>>>>
>>>>> Apparently, yes. Hmm. It's really weird that you got almost exactly
>>>>> the same figures.
>>>>> If you apply the attached patch, rebuild and re-run the benchmark,
>>>>> what do you get?
>>>>>
>>>>>
>>>>
>>>> With that patch applied, I get
>>>> octave:1> bench
>>>> Elapsed time is 0.194493 seconds.
>>>> Elapsed time is 0.192309 seconds.
>>>> Elapsed time is 0.189026 seconds.
>>>> Elapsed time is 0.188679 seconds.
>>>> Elapsed time is 0.195958 seconds.
>>>> Elapsed time is 0.193521 seconds.
>>>> Elapsed time is 0.187596 seconds.
>>>> Elapsed time is 0.193254 seconds.
>>>> Elapsed time is 0.215135 seconds.
>>>> Elapsed time is 0.213705 seconds.
>>>> Elapsed time is 0.21341 seconds.
>>>> Elapsed time is 0.212501 seconds.
>>>> Elapsed time is 0.363992 seconds.
>>>> Elapsed time is 0.368094 seconds.
>>>>
>>>> so there is an improvement in the second to last number.
>>>>
>>>> Cheers, M.
>>>>
>>>
>>> OK, it's funny. I now understand where the problem is. Just change the line
>>>
>>> b = rand (n) + i*rand (n);
>>>
>>> to
>>>
>>> b = rand (n) + I*rand (n);
>>>
>>> (note the capital I), then run the benchmarks again. At this point, i
>>> is still defined by the preceding loops as a real numeric value (!), so
>>> b ends up real instead of complex. I think this affects Matlab, too.
>>> In any case, it is apparent that your Matlab is linked to something
>>> faster than ATLAS; probably Intel's MKL.
>>>
>>> regards
>>>
>>>
>>
>> That's one of those bugs that cause rockets to go off course, I
>> guess!
>
> Yes, definitely. Maybe we could do something about it...
>
>> OK, it makes sense now. Matlab has been available here for a
>> while, but I haven't used it much. I don't know the details of what
>> libraries it uses - it's v2009b.
>
> You can usually tell by inspecting the binaries using ldd. But it
> doesn't matter much.
> I will be glad if you post the results of the corrected benchmark.
>

Btw., I guess that you have two cores, right? If so, it is likely that
your Matlab uses both of them. You can achieve a similar effect by
linking Octave to a multi-threaded ATLAS.
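
Coming back to the benchmark: here is a minimal Octave sketch of the
pitfall with i, for anyone who wants to reproduce it. It uses only core
Octave; the names b_bad, b_ok and b_again and the small n are purely
illustrative and not part of the original benchmark.

```octave
% A loop counter named i shadows the imaginary unit.
n = 4;                          % small size, for illustration only
for i = 1:1000; endfor          % after the loop, i holds the real value 1000

b_bad = rand (n) + i*rand (n);  % i is 1000 here, so b_bad is a real matrix
b_ok  = rand (n) + 1i*rand (n); % the literal 1i is always the imaginary unit

printf ("b_bad complex? %d\n", iscomplex (b_bad));
printf ("b_ok  complex? %d\n", iscomplex (b_ok));

clear i                         % drop the variable; i is the imaginary unit again
b_again = rand (n) + i*rand (n);
printf ("b_again complex? %d\n", iscomplex (b_again));
```

In the benchmark itself, writing 1i (or I) in the definition of b, or
renaming the loop counter, should avoid the problem.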

-- 
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz


