octave-maintainers

Re: bugs to fix before 4.4.0 release


From: Daniel J Sebald
Subject: Re: bugs to fix before 4.4.0 release
Date: Wed, 3 Jan 2018 16:08:27 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1

On 01/03/2018 02:54 PM, Rik wrote:
> In addition to straightforward bugs, I'd like to see the performance not
> degrade too much between releases.  I know that this is a trivial test, but
> the performance of double-nested for loops shows that performance has been
> declining over major releases, and that the development branch is 2.6X
> slower than 4.2.1.
>
> Sample Code:
>
> a = 1; b = 1; t0=tic; for i=1:1000; for j=1:1000; a = a + b + 123.0; end;
> end; t1=toc(t0); t1
>
> Results:
>
> 3.8.2 : 0.84617
> 4.0.3 : 1.4062
> 4.2.1 : 1.43
> 4.4.0-dev : 3.77

Are you doing an apples-to-apples comparison? E.g., were all of them compiled on the same system with the same configuration? Could it be that one of those is a release build, compiled with all optimizations, while another is not? The same question applies to JIT support.

All I can say is that the GUI/CLI environment doesn't seem to make a difference, and the time I'm seeing for 4.4.0-dev is about twice what you are reporting (on a 3.16 GHz Xeon).

octave:1> a = 1; b = 1; t0=tic; for i=1:1000; for j=1:1000; a = a + b + 123.0; end;
> end; t1=toc(t0); t1
t1 =  7.3447

You are testing a really basic routine. I wouldn't imagine that the arithmetic translation has varied too much, although perhaps the assignment "a =" has gotten worse. Let's try:

octave:5> a = 1; b = 1; t0=tic; for i=1:1000; for j=1:1000; a + b + 123.0; end; end; t1=toc(t0); t1
t1 =  5.8170

A notable improvement, yet it doesn't look like the assignment is a major drain. How about checking loop length (I'll continue without the assignment to sort of remove a factor):

octave:6> a = 1; b = 1; t0=tic; for i=1:1000000; for j=1:1; a + b + 123.0; end; end; t1=toc(t0); t1
t1 =  7.4740

OK, the above suggests something interesting, which is that setting up or initializing that inner loop could be the source of the change. So, I'm going to guess that the other way around is pretty fast (but I guessed wrong):

octave:7> a = 1; b = 1; t0=tic; for i=1:1; for j=1:1000000; a + b + 123.0; end; end; t1=toc(t0); t1
t1 =  5.8180

However, it's the same as the 1000 by 1000 performance.  Strange.

Check this out:

octave:1> for lim_p = 0:6
>   lim1 = 10^lim_p;
>   lim2 = 10^(6-lim_p);
> a = 1; b = 1; t0=tic; for i=1:lim1; for j=1:lim2; a + b + 123.0; end; end; t1=toc(t0); t1
> end
t1 =  5.8178
t1 =  5.8183
t1 =  5.8155
t1 =  5.8219
t1 =  5.8895
t1 =  6.4637
t1 =  11.987

I guess I'd expect a more linear relationship if setting up the second loop were the major drain. As it stands, it suggests that the "a + b + 123.0" portion is, in fact, a major consumer of time. For comparison, the vectorized equivalent:

octave:10> a = ones(1000000,1);
octave:11> b = ones(1000000,1);
octave:12> t0=tic; a + b + 123.0; t1=toc(t0); t1
t1 =  0.017333

Quite a difference.
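
As a rough cross-check on the lim_p sweep, the seven timings can be fitted to a two-term cost model, t = c_body * (lim1 * lim2) + c_outer * lim1. This is only a sketch: the model and the names c_body and c_outer are assumptions made for illustration, not anything taken from the interpreter, and the numbers are simply the ones reported in this message.

```octave
% Timings reported above for lim_p = 0:6 (seconds).
t = [5.8178; 5.8183; 5.8155; 5.8219; 5.8895; 6.4637; 11.987];

lim1   = 10 .^ (0:6)';   % outer iterations, i.e. how often the inner loop is set up
n_body = 1e6;            % lim1 * lim2, the same for every run

% Assumed model: t = c_body * n_body + c_outer * lim1.
% Solve for the two coefficients in the least-squares sense.
X = [n_body * ones(numel (t), 1), lim1];
c = X \ t;
c_body  = c(1);          % estimated seconds per inner-loop pass
c_outer = c(2);          % estimated seconds per outer pass (including inner-loop setup)

% Per-element cost of the vectorized sum, from the 0.017333 s timing above.
c_vec = 0.017333 / 1e6;
ratio = c_body / c_vec;

printf ("c_body  = %.2f us per inner iteration\n", 1e6 * c_body);
printf ("c_outer = %.2f us per outer iteration\n", 1e6 * c_outer);
printf ("loop body costs about %.0f times the vectorized element\n", ratio);
```

On these numbers both coefficients come out at roughly 6 microseconds, so the data look fairly consistent with a cost that is linear in lim1; the curve only appears flat because lim1 is spaced by powers of ten. The vectorized sum works out to a few hundred times cheaper per element.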

Rik, I'm wondering how much the C++ compiler factors into this. The only fair comparison is versions of the code built with the same compiler with the exact same settings.
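
To make numbers from different builds easier to compare, something like the following could be run unchanged on each one. It is a minimal sketch: bench_nested_loop is a hypothetical helper name, and taking the best of a few repetitions is just one assumed way to reduce noise. It records only the version and platform strings; the compiler and configure flags would still need to be noted separately.

```octave
1;  % mark this file as a script so the function below can be defined inline

% Hypothetical helper: time the double loop nrep times and keep the fastest run.
function t_best = bench_nested_loop (lim1, lim2, nrep)
  t_best = Inf;
  for r = 1:nrep
    a = 1; b = 1;
    t0 = tic;
    for i = 1:lim1
      for j = 1:lim2
        a = a + b + 123.0;
      end
    end
    t_best = min (t_best, toc (t0));
  end
end

% Report which build produced the number alongside the timing itself.
printf ("Octave %s on %s\n", version (), computer ());
printf ("1000 x 1000, best of 3: %g s\n", bench_nested_loop (1000, 1000, 3));
```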

Dan


