
Re: For loop benchmarks between Octave versions


From: Daniel J Sebald
Subject: Re: For loop benchmarks between Octave versions
Date: Wed, 25 Jun 2014 02:54:39 -0500
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.24) Gecko/20111108 Fedora/3.1.16-1.fc14 Thunderbird/3.1.16

On 06/25/2014 01:35 AM, Julien Bect wrote:
On 17/06/2014 02:39, Rik wrote:
I find that this changeset:

12920:5d18231eee00 Extend profiling support to operators.

produces a 50% slowdown when using the for loop benchmark. Unless we
can figure out a lower-footprint way to do profiling, we might be stuck
with this part of the slowdown. There is still another 50% which seems
to be due to other accumulated cruft, i.e., I didn't find a smoking-gun
changeset.

I have built revision 12922 (Thu, 04 Aug 2011, default branch, just
after the above-mentioned changeset) with my "inlining + template" patch
for the profiler.

My benchmarks indicate that this patched revision is even slightly
better than Octave 3.4.3:

Benchmark                     Octave 3.4.3      r12922 + patch
doubleForLoop/1               114.9 [  0.9]     108.4 [  1.9]   evol:  -5.6%
doubleForLoop/a=1             165.5 [  0.6]     154.5 [  1.8]   evol:  -6.6%
doubleForLoop/a=b             190.2 [  0.7]     169.2 [  0.9]   evol: -11.0%
doubleForLoop/a=a+b           171.5 [  1.2]     143.1 [  1.1]   evol: -16.6%
doubleForLoop/a=sin(b*i)      224.6 [  1.6]     207.1 [  2.9]   evol:  -7.8%
vectorOfSquares               326.8 [  4.0]     336.5 [  9.4]   evol:  +2.9%
mandel                        126.4 [  1.6]     120.7 [  1.1]   evol:  -4.6%
fib                           336.2 [  1.1]     335.8 [  1.1]   evol:  -0.1%
parse_int                    1414.1 [ 16.0]    1377.1 [ 24.2]   evol:  -2.6%
quicksort                     632.8 [ 11.3]     622.7 [ 10.9]   evol:  -1.6%
pi_sum                       8678.9 [ 84.2]    7452.6 [ 27.6]   evol: -14.1%
rand_mat_stat                 315.9 [ 10.8]     309.1 [  8.5]   evol:  -2.1%
rand_mat_mul                  387.3 [ 20.1]     363.2 [ 20.1]   evol:  -6.2%
printfd                      2404.7 [ 30.9]    2310.7 [  5.7]   evol:  -3.9%
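
The benchmark scripts themselves are not included in this message; as a
rough illustration only (my own sketch, with made-up loop bounds), a
doubleForLoop/a=b style case presumably just times a nested for loop
whose body is the assignment named in the row, along these lines:

b = 1;
tic;
for i = 1:1000
  for j = 1:1000
    a = b;
  end
end
printf ("doubleForLoop/a=b: %.1f ms\n", 1000 * toc);  % toc is in seconds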


This suggests that the additional slowdown between that revision and
now has nothing to do with the profiler (?).

Any idea about where to look next?

I know little about the parser, but I looked around a bit to get a feel for things and try to answer the question of "where to look". I think it might be better to just work from the development version and look for ways to optimize the code. If you have a working profiler, it might indicate where most of the time is being spent, so you know where to start looking.
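
For instance, here is a minimal sketch (my own, not a script from this thread) of pointing the built-in profiler at a loop-heavy snippet so the hot spots show up:

profile on;
b = 1;
for i = 1:100000
  a = sin (b*i);
end
profile off;
profshow ();   % flat listing of where the time went, most expensive first

With the operator-profiling support from changeset 12920, that listing should also include the individual operators exercised by the loop body, not just named functions.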

I say this because, with some simple expressions in for-loops, I'm seeing what I will call "redundant evaluations". In particular, there is this

octave_value_list
tree_identifier::rvalue (int nargout,
                         const std::list<octave_lvalue> *lvalue_list)
{
  // body elided -- it resolves the identifier, checking whether it is
  // currently a defined variable or a valid function
}

function, which is called to verify that the identifier being used is a valid variable or a valid function. I put an fprintf (stderr, "HI\n") inside that function to see how often it is called with, say,

octave:6> for i=1:10
  a=b;
end
HI
HI
HI
HI
HI
HI
HI
HI
HI
HI

So, every time this loop is iterated, there is a somewhat lengthy series of tests to determine the status of the variable/function "b". Now, granted, such tests might be necessary per iteration under certain circumstances. For example, if print_result() is true, then rvalue() must be called. That would be the case in:

octave:7> for i=1:10
  a=b
end
HI
HI
HI
HI
HI
HI
HI
HI
HI
HI
a =  1
a =  1
a =  1
a =  1
a =  1
a =  1
a =  1
a =  1
a =  1
a =  1

Or, if there is something that might change the nature of a variable/function, then the check needs to be done on every iteration. For example:

octave:8> for i=1:10
  a=b;
  if (i==3)
    clear b;
  end
end
HI [printed 44 times; why, I don't know.]
error: 'b' undefined near line 2 column 5
HI

I would think that some optimizations could be introduced by prescreening the loop for conditions like

1) no printing required anywhere within the loop
2) no clearing of the variable space

If one knew that, a lot of the tests on the variable/function would be redundant. It's not inconceivable to have different types of looping constructs, say, a really fast, unchecked type of loop and a slower, checked type of loop. If someone is printing results on every iteration, the slower loop isn't the time-critical element; the printing is the slowest element.
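
As a rough sanity check of that last point (my own sketch, not something measured in this thread), comparing a silent loop against one that prints each result should show the display cost dwarfing the loop overhead:

b = 1;
tic;
for i = 1:1000
  a = b;      % output suppressed
end
t_silent = toc;

tic;
for i = 1:1000
  a = b       % result printed every iteration
end
t_printed = toc;

printf ("silent: %.4f s    printed: %.4f s\n", t_silent, t_printed);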

What I'm getting at here is that I can see there are some optimizations to be done, though of course one has to be careful with this and test any changes. Still, given the complexity of things, I think it is probably wiser to start from the latest code than to first go back and find the version where things slowed down and then try to undo whatever was done at that point, because there is a big risk of introducing interpreter bugs.

Dan


