Re: [OctDev] Question on performance, coding style and competitive software

octave-maintainers

From:	Jaroslav Hajek
Subject:	Re: [OctDev] Question on performance, coding style and competitive software
Date:	Thu, 23 Apr 2009 12:24:20 +0200
On Thu, Apr 23, 2009 at 11:18 AM, Alois Schlögl
<address@hidden> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Jaroslav Hajek wrote:
>> On Wed, Apr 22, 2009 at 4:18 PM, Alois Schlögl <address@hidden> wrote:
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> David Bateman wrote:
>>>> Alois Schlögl wrote:
>>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>>> Hash: SHA1
>>>>>
>>>>>
>>>>> As some of you might know, my other pet project besides Octave, is
>>>>> BioSig http://biosig.sf.net. BioSig is designed in such a way that it
>>>>> can be used with both Matlab and Octave. Mostly for performance reasons,
>>>>> we cannot abandon support for Matlab [1,2]. Octave is a viable
>>>>> alternative in case the computational performance is not important. In
>>>>> order to decide on the future strategy of BioSig, I hope to get answers
>>>>> on the following questions:
>>>>>
>>>>> 1) Core development of Octave:
>>>>> At the meeting of Octave developers in 2006, the issue was raised
>>>>> that Octave is about 4 to 5 times slower than Matlab [1]. (I
>>>>> repeated the tests just recently; the results are attached below and
>>>>> show differences of up to a factor of 13, on average ~5.) This issue is
>>>>> most relevant for large computational jobs, where it makes a difference
>>>>> whether a specific task takes 1 day or 5 days. Is anyone working to
>>>>> address this problem? Is there any hope that the performance penalty
>>>>> will become smaller or go away within a reasonable amount of time?
>>>>>
>>>> It's hard to tell what the source of your speed issues is. The flippant
>>>> response would be that with a JIT in octave then yes we could be as
>>>> fast, we just need someone to write it. I suspect something will be done
>>>> here in the future. The recent changes of John to have an evaluator
>>>> class and his statement of adding a profiler in Octave 3.4 mean that the
>>>> machinery needed to add a JIT will be in place.
>>>
>>> Good to know that someone is working on this. However, as far as I
>>> understand, it's currently not possible to estimate when the performance
>>> penalty is expected to be nullified.
>>>
>>
>> Agreed. And I am more pessimistic than David about JIT in near future,
>> unless someone gets funding for that (maybe via GSoC or something).
>>
>>>> However, looking at your wackerman, it's not clear to me that it's your
>>>> for-loop that is taking all of the time in Octave. If it is, have you
>>>> considered rewriting
>>>>
>>>> for k = 1:size(m2,1),
>>>>  if all(finite(m2(k,:))),
>>>>    L = eig(reshape(m2(k,:), [K,K]));
>>>>    L = L./sum(L);
>>>>    if all(L>0)
>>>>      OMEGA(k) = -sum(L.*log(L));
>>>>    end;
>>>>  end;
>>>> end;
>>>>
>>>> with something like
>>>>
>>>> rows_m2 = size(m2, 1);
>>>> m3 = permute (reshape (m2, [rows_m2, K, K]), [2, 3, 1]);
>>>> idx = all (finite (m2), 1);
>>>> t = cellfun (@(x) eig(x), mat2cell (m3 (:, :, idx), K, K, ones(1, rows_m2)),
>>>>             'UniformOutput', false);
>>>> t = cellfun (@(x) - sum (x .* log (x)),
>>>>        cellfun (@(x) x ./ sum(x), 'UniformOutput', false));
>>>> t(iscomplex(t)) = NaN;
>>>> OMEGA(idx) = t;
>>>>
>>>> The code above is of course untested. But in the latest tip that should
>>>> be much faster for Octave as Jaroslav optimized cellfun recently
>>>
>>> Using Jaroslav's code and some modifications (diag of 300000 element
>>> vector was just too large)
>>
>> In 3.1.5x this is no longer an issue, because diagonal matrices are
>> optimized. In 3.0.x, I think you can use "dmult" to do the row
>> scaling. Or the outer product trick you do below, but the diag
>> expression is both more readable and faster in 3.1.5x. Sorry, I just
>> tend to think in terms of the development version :)
>
>
> Thanks for the solution. The problem was with Matlab. You might ask why
> I bother you with this; it's just that I do not want to ignore Matlab users.
>
>>
>>>        rows_m2 = size(m2, 1);
>>>        m3 = permute (reshape (m2, [rows_m2, K, K]), [2, 3, 1]);
>>>        idx = all (isfinite (m2), 2);
>>>        t = cellfun (@eig, mat2cell (m3 (:, :, idx), K, K, ones(1, sum(idx))),
>>>                     'UniformOutput', false);
>>>        t = [t{:}];
>>>        idx2 = all(t>0);
>>>        t = t(:,idx2) ./ [ones(K,1) * sum(t(:,idx2))];
>>>        t = sum (t .* log (t));
>>>        idx = find(idx);
>>>        OMEGA(idx(idx2)) = t;
>>>
>>> the performance increases for Octave from 82.9 to 15.2 s. Thanks.
>>> (The program slowed down on Matlab from 13.0 to 66.15 s, though).
>>
>> That's quite surprising. Are you sure you didn't leave the "diag"
>> expression in the Matlab test? I don't see why it should get that
>> much slower...
>>
>>> I'm not sure how this technique can be used for the other functions
>>> (aar, findclassifier).
>>
>> Maybe a different technique will work, I haven't yet looked. There are
>> of course also codes that can't be vectorized.
>>
>>> Memory usage is also an issue.
>>
>> Not that much, I hope. Also note 3.1.5x does manage memory more
>> efficiently, apparently even more efficiently than Matlab. Anyway
>> Octave (and Matlab) is not really a good tool for memory-critical
>> applications, because the COW mechanism is very ill-suited for such
>> applications. You definitely want references or pointers if you need
>> to keep memory low.
>>
>>>>> 2) Coding style:
>>>>> Octave understands a superset of commands compared to matlab, and it
>>>>> seems the current policy is to enforce the "octave style" and make the
>>>>> use of toolboxes incompatible for a use with Matlab. Is it not sensible
>>>>> to write platform-neutral applications? Specifically, is it not in our
>>>>> own interest (wider usage makes the code more robust) that Matlab users
>>>>> are not "forced" to buy additional toolboxes but can use open source
>>>>> toolboxes e.g. from octave-forge?
>>>>>
>>>> I'd personally consider that up to the toolboxes author. Using texinfo
>>>> in the help string makes the Octave help string "nicer"....  I however
>>>> don't think a policy should be made that toolboxes on octave-forge
>>>> should be matlab compatible..
>>>>
>>>
>>> I know it's up to the toolbox authors. I'm not sure that every author is
>>> aware of this. In case someone wants to modify some functions from
>>> octave-forge/main for the use with matlab, and make it available to
>>> others, what is the proper procedure for this (a) if he is the original
>>> author and the function is already in octave-forge/main (b) if he wants
>>> to modify an existing function from some other author ?
>>
>> If he wants to keep that function in the package, then (obviously) he
>> should follow the package's policy (determined by author or
>> maintainer). If he just wants to share it on his own, then he should
>> feel free to do any changes he wishes, as long as he honors the GPL.
>>
>>> The texinfo is the minor problem, because the function is still usable
>>> even if the documentation is not properly displayed.
>>> The main issues are the incompatible syntax like
>>> - - comments: # vs. %
>>> - - end vs. endif-endfor-endwhile-endfunction etc.,
>>> - - single quote  vs. double quote
>>> - - negation operator: ! vs ~
>>> which make it impossible to use most octave toolboxes in Matlab
>>>
>>> BTW, what are the arguments in favor of using octave-only coding style ?
>>>
>>
>> comments: # is much more common. % is, AFAIK, recognized only by
>> Octave and other Matlabish software and TeX.
>> Also, on UNIX, # makes it possible to use the #! mechanism and thus
>> to write executable Octave scripts.
>
> There are all kinds of comments //, /* */, and because Shell and Octave
> scripts are two different things, this is important.
>
> Cases using the shebang mechanism would certainly need some attention.
> However, within all m-files at octave-forge, only
> octave-forge/main/info-theory/doc/info-theory.m
> is using the shebang mechanism.

The shebang mechanism is just a reason for Octave recognizing # as a
comment, not a reason why authors use it so widely.

>>
>> specific end blocks: they catch typing errors more easily, and the
>> code is more easily parsable for both humans and computers. I also
>> consider it an extremely bad idea that "end" is likewise used in index
>> expressions. I think Cleve Moler (or whoever designed it) must have
>> been drinking that night.
>
>
> The idea of the end-operator is also used in other languages (python,
> etc), so I guess it's not completely insane. After some reluctance, I
> found the end-operator very useful.
>

In Python there is no such operator AFAIK; you just omit the subscript.
But you misunderstood me: the idea of the end operator is OK per se,
it's just crazy that it matches the keyword for end of block. It makes
things harder for parsers and for autoindenting in editors.
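
A minimal sketch of the clash, with made-up variable names (x, total)
chosen purely for illustration. The same token serves as an index and
as a block terminator, while the Octave-only endfor is unambiguous:

```octave
x = [10 20 30 40];
total = 0;
for k = 1:numel (x)
  if x(k) > 15
    total = total + x(end);   % "end" as an index: the last element of x
  end                         % "end" as a keyword: closes the if block
endfor                        % Octave-only spelling: can only close a for
```

A parser or an editor has to track whether it is inside an index
expression before it can tell which meaning of "end" applies.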


>>
>> quotes: again, double quotes are somewhat more standard, in particular
>> in the C-derived world. more importantly, ""s allow things like \n,
>> \t.
>
> Octave does not claim to be compatible with C but with Matlab. \n and \t can
> be also used with single quotes in Octave as well as in Matlab.
>

Octave is, above all, GNU software, and besides Matlab it should try
to fit into the GNU world, where escape chars are quite common.
'\n' in Matlab works only inside the printf and scanf families (fprintf,
sprintf, sscanf); otherwise it just produces ['\','n']. That's quite
messy (and can have unexpected consequences), so it's good that ""
avoids it.
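
A small sketch of the difference in Octave, with variable names invented
for the example. It shows Octave's behaviour only; the double-quoted form
is Octave syntax and is not assumed to work in Matlab:

```octave
s_single = 'a\nb';   % backslash kept literally: the chars a, \, n, b
s_double = "a\nb";   % escape processed at parse time: a, newline, b

printf ("%d %d\n", length (s_single), length (s_double));

% The printf family expands escapes in its format template, so here
% both spellings print the same two lines.
printf ('a\nb\n');
printf ("a\nb\n");
```

Outside such functions the single-quoted string still carries a literal
backslash, which is where the unexpected consequences come from.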

>>
>
> It would be nice, if developers aiming at compatibility between octave
> and matlab could feel at home here.
>

Well, compatibility is always painful. I think Octave does a lot for
being Matlab compatible, but we surely don't want to ditch every good
idea just because it didn't occur to MathWorks. Maybe you can ask
MathWorks to improve their compatibility with Octave :)

-- 
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz