[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: For loop benchmarks between Octave versions
From: |
Daniel J Sebald |
Subject: |
Re: For loop benchmarks between Octave versions |
Date: |
Wed, 25 Jun 2014 02:54:39 -0500 |
User-agent: |
Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.24) Gecko/20111108 Fedora/3.1.16-1.fc14 Thunderbird/3.1.16 |
On 06/25/2014 01:35 AM, Julien Bect wrote:
Le 17/06/2014 02:39, Rik a écrit :
I find that this changeset:
12920:5d18231eee00 Extend profiling support to operators.
produces a 50% slow down when using the for loop benchmark. Unless we
can figure out a lower footprint way to do profiling we might be stuck
with this part of the slowdown. There is still another 50% which seems
to be due to other accumulated cruft, i.e., I didn't find a smoking
gun changeset.
I have built revision 12922 (Thu, 04 Aug 2011, default branch, just
after the above-mentioned changeset)with my "inlining +template" patch
for the profiler.
My benchmarks indicate that this patched revision is even slightly
better than Octave 3.4.3 :
Octave 3.4.3 r12922 + patch
doubleForLoop/1 114.9 [ 0.9] 108.4 [ 1.9] evol: -5.6%
doubleForLoop/a=1 165.5 [ 0.6] 154.5 [ 1.8] evol: -6.6%
doubleForLoop/a=b 190.2 [ 0.7] 169.2 [ 0.9] evol: -11.0%
doubleForLoop/a=a+b 171.5 [ 1.2] 143.1 [ 1.1] evol: -16.6%
doubleForLoop/a=sin(b*i) 224.6 [ 1.6] 207.1 [ 2.9] evol: -7.8%
vectorOfSquares 326.8 [ 4.0] 336.5 [ 9.4] evol: +2.9%
mandel 126.4 [ 1.6] 120.7 [ 1.1] evol: -4.6%
fib 336.2 [ 1.1] 335.8 [ 1.1] evol: -0.1%
parse_int 1414.1 [ 16.0] 1377.1 [ 24.2] evol: -2.6%
quicksort 632.8 [ 11.3] 622.7 [ 10.9] evol: -1.6%
pi_sum 8678.9 [ 84.2] 7452.6 [ 27.6] evol: -14.1%
rand_mat_stat 315.9 [ 10.8] 309.1 [ 8.5] evol: -2.1%
rand_mat_mul 387.3 [ 20.1] 363.2 [ 20.1] evol: -6.2%
printfd 2404.7 [ 30.9] 2310.7 [ 5.7] evol: -3.9%
This suggests that the additional slowdown between those revision and
now has nothing to do with the profiler (?).
Any idea about where to look next ?
I know little about the parser, but I looked around a bit to see if I
could get a feel for things to answer the question "where to look". I
think it might be better to just work from the development version and
look for ways to optimize code. If you have a working profiler, that
might indicate where most of the time is being spent so you can start
looking there.
I say this because I'm seeing with some simple expressions in for-loops
what I will call "redundant evaluations". In particular, there is this
octave_value_list
tree_identifier::rvalue (int nargout,
const std::list<octave_lvalue> *lvalue_list)
{
}
function which is called to verify the expression to be used is a valid
variable or a valid function. I put an "fprintf(stderr,"HI\n")" within
that expression to see how often it is called with, say,
octave:6> for i=1:10
a=b;
end
HI
HI
HI
HI
HI
HI
HI
HI
HI
HI
So, every time this loop is iterated, there is a somewhat lengthy series
of tests to determine the status of variable/function "b". Now,
granted, such tests might be necessary per iteration under certain
circumstances. For example, if the "print_result()" is true then
rvalue() must be called. That would be the case of:
octave:7> for i=1:10
a=b
end
HI
HI
HI
HI
HI
HI
HI
HI
HI
HI
a = 1
a = 1
a = 1
a = 1
a = 1
a = 1
a = 1
a = 1
a = 1
a = 1
Or if there is something that might change the nature of a
variable/function then that needs to be done on a per-iteration basis.
For example:
octave:8> for i=1:10
a=b;
if (i==3)
clear b;
end
end
HI [44 times, why I don't know.]
error: 'b' undefined near line 2 column 5
HI
I would think that optimizations of sorts could be introduced by
prescreening for things like
1) No printing required anywhere within loop
2) No clearing of variable space
If one knew that, then a lot of the tests on the variable/function are
redundant. It's not inconceivable to have different types of looping
constructs, say, a really fast, unchecked type of looping and a slower,
checked type of looping. If someone is printing results on an iteration
basis, the slower loop isn't the time critical element; the printing is
the slowest element.
What I'm getting at here is that I can see there are some optimizing
things to be done, but of course one has to be careful with this and do
testing of any changes. But I think because of the complexity of things
it is probably wiser to start from the latest code than it is to, first,
go back and find what version it is where things slowed down and,
second, try to undo what might have been done at that point because
there's a big risk of introducing interpreter bugs.
Dan
- Re: For loop benchmarks between Octave versions, (continued)
- Re: For loop benchmarks between Octave versions, Julien Bect, 2014/06/17
- Re: For loop benchmarks between Octave versions, Daniel J Sebald, 2014/06/17
- Re: For loop benchmarks between Octave versions, Julien Bect, 2014/06/22
- Re: For loop benchmarks between Octave versions, PrasannaKumar Muralidharan, 2014/06/23
- Re: For loop benchmarks between Octave versions, Julien Bect, 2014/06/23
- Re: For loop benchmarks between Octave versions, Daniel J Sebald, 2014/06/23
- Re: For loop benchmarks between Octave versions, Julien Bect, 2014/06/24
- Re: For loop benchmarks between Octave versions, Daniel J Sebald, 2014/06/24
- Re: For loop benchmarks between Octave versions, Julien Bect, 2014/06/18
- Re: For loop benchmarks between Octave versions, Julien Bect, 2014/06/25
- Re: For loop benchmarks between Octave versions,
Daniel J Sebald <=
- Re: For loop benchmarks between Octave versions, Julien Bect, 2014/06/25
- Re: For loop benchmarks between Octave versions, Julien Bect, 2014/06/26
Re: For loop benchmarks between Octave versions, PrasannaKumar Muralidharan, 2014/06/17
Re: For loop benchmarks between Octave versions, Daniel J Sebald, 2014/06/18
Re: For loop benchmarks between Octave versions, Carnë Draug, 2014/06/23