Re: bugs to fix before 4.4.0 release
From: Daniel J Sebald
Subject: Re: bugs to fix before 4.4.0 release
Date: Wed, 3 Jan 2018 16:08:27 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1
On 01/03/2018 02:54 PM, Rik wrote:
> In addition to straightforward bugs, I'd like to see the performance not
> degrade too much between releases. I know that this is a trivial test, but
> the performance of double-nested for loops shows that performance has been
> declining over major releases, and that the development branch is 2.6X
> slower than 4.2.1.
>
> Sample Code:
> a = 1; b = 1; t0=tic; for i=1:1000; for j=1:1000; a = a + b + 123.0; end;
> end; t1=toc(t0); t1
>
> Results:
> 3.8.2 : 0.84617
> 4.0.3 : 1.4062
> 4.2.1 : 1.43
> 4.4.0-dev : 3.77
Are you doing an apples-to-apples comparison? E.g., all compiled on the
same system with the same configuration? Could it be that one of those
is a release build compiled with all optimizations, or something like
that? Same question for JIT support.
All I can say is that GUI/cli environment doesn't seem to make a
difference and the time I'm seeing for 4.4.0-dev is twice what you are
reporting (3.16GHz Xeon).
octave:1> a = 1; b = 1; t0=tic; for i=1:1000; for j=1:1000; a = a + b +
123.0; end;
> end; t1=toc(t0); t1
t1 = 7.3447
You are testing a really basic routine. I wouldn't imagine that the
arithmetic translation has varied too much. Although perhaps the
assignment "a =" has gotten worse. Let's try:
octave:5> a = 1; b = 1; t0=tic; for i=1:1000; for j=1:1000; a + b +
123.0; end; end; t1=toc(t0); t1
t1 = 5.8170
A notable improvement, yet it doesn't look like the assignment is a
major drain. How about checking loop length (I'll continue without the
assignment to sort of remove a factor):
octave:6> a = 1; b = 1; t0=tic; for i=1:1000000; for j=1:1; a + b +
123.0; end; end; t1=toc(t0); t1
t1 = 7.4740
OK, the above suggests something interesting, which is that setting up
or initializing that inner loop could be the source of the change. So,
I'm going to guess that the other way around is pretty fast (but I
guessed wrong):
octave:7> a = 1; b = 1; t0=tic; for i=1:1; for j=1:1000000; a + b +
123.0; end; end; t1=toc(t0); t1
t1 = 5.8180
However, it's the same as the 1000 by 1000 performance. Strange.
Check this out:
octave:1> for lim_p = 0:6
> lim1 = 10^lim_p;
> lim2 = 10^(6-lim_p);
> a = 1; b = 1; t0=tic; for i=1:lim1; for j=1:lim2; a + b + 123.0;
end; end; t1=toc(t0); t1
> end
t1 = 5.8178
t1 = 5.8183
t1 = 5.8155
t1 = 5.8219
t1 = 5.8895
t1 = 6.4637
t1 = 11.987
I'd expect that to be a more linear relationship if setting up the
second loop were the major drain. As it stands, it suggests that the
"a + b + 123.0" portion is, in fact, a major consumer of time.
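One way to read the seven timings above is with a two-term cost model: a fixed cost per evaluation of the loop body plus a fixed cost per start of the inner loop. The model and the names c_body and c_setup are assumptions made for illustration, not something measured directly in this thread. A least-squares fit to the reported numbers is a minimal sketch of that reading:

```octave
% Timings reported above for lim_p = 0:6, in seconds.
t = [5.8178 5.8183 5.8155 5.8219 5.8895 6.4637 11.987]';
lim1 = 10 .^ (0:6)';     % how many times the inner loop is started
n_body = 1e6;            % lim1 * lim2, constant across the sweep

% Assumed model: t = c_body * n_body + c_setup * lim1
A = [n_body * ones(7, 1), lim1];
coef = A \ t;            % least-squares solution of the overdetermined system
c_body = coef(1);        % seconds per evaluation of "a + b + 123.0"
c_setup = coef(2);       % seconds per inner-loop start
t_fit = A * coef;        % model prediction for each row of the sweep

printf("per body evaluation:  %.2f us\n", 1e6 * c_body);
printf("per inner-loop start: %.2f us\n", 1e6 * c_setup);
printf("largest residual:     %.3f s\n", max(abs(t - t_fit)));
```

Under this assumed model both costs come out on the order of 6 microseconds, which would mean the timings are roughly linear in lim1 and only look flat because lim1 grows by powers of ten. With a single run per point, this is a rough reading rather than a measurement.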
For comparison:
octave:10> a = ones(1000000,1);
octave:11> b = ones(1000000,1);
octave:12> t0=tic; a + b + 123.0; t1=toc(t0); t1
t1 = 0.017333
Quite a difference.
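To put the two cases side by side on a per-element basis, a script along these lines could be used. It is only a sketch: the variable names, the smaller n, and the choice to keep the fastest of a few repetitions are assumptions, and the results are assigned to c and cv (unlike the session above) so that they can be checked afterwards.

```octave
n = 1e5;       % assumption: smaller than the 1e6 above, to keep the run short
reps = 3;      % assumption: repeat and keep the fastest run to reduce noise

% Scalar expression evaluated n times by the interpreter.
a = 1; b = 1;
t_loop = Inf;
for r = 1:reps
  t0 = tic;
  for j = 1:n
    c = a + b + 123.0;
  end
  t_loop = min(t_loop, toc(t0));
end

% Same arithmetic as a single array expression.
av = ones(n, 1); bv = ones(n, 1);
t_vec = Inf;
for r = 1:reps
  t0 = tic;
  cv = av + bv + 123.0;
  t_vec = min(t_vec, toc(t0));
end

printf("loop:       %.3g us per iteration\n", 1e6 * t_loop / n);
printf("vectorized: %.3g us per element\n", 1e6 * t_vec / n);
```

The arithmetic is identical in both halves, so the gap between the two figures is mostly interpreter overhead per loop iteration rather than the additions themselves.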
Rik, I'm wondering how much the C++ compiler factors into this. The
only fair comparison is versions of the code built with the same
compiler with the exact same settings.
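For such a comparison it helps to run one unchanged script under every build and have it print what it ran on. The following is a minimal sketch that wraps the sample code from the top of this message; the reps count and the choice to report the fastest run are assumptions, and the script records only the Octave version and host type, not the compiler or its flags, which still have to be matched by hand.

```octave
% Run this same file, unchanged, under each Octave build being compared.
n = 1000;      % loop bound from the sample code
reps = 3;      % assumption: repeat and keep the fastest run
best = Inf;
for r = 1:reps
  a = 1; b = 1;
  t0 = tic;
  for i = 1:n
    for j = 1:n
      a = a + b + 123.0;
    end
  end
  best = min(best, toc(t0));
end
printf("Octave %s (%s): %.4g s\n", version(), computer(), best);
```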
Dan