
Re: isreal benchmarking


From: Daniel J Sebald
Subject: Re: isreal benchmarking
Date: Tue, 11 Sep 2012 13:24:17 -0500
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.24) Gecko/20111108 Fedora/3.1.16-1.fc14 Thunderbird/3.1.16

On 09/11/2012 11:43 AM, Rik wrote:
On 09/10/2012 04:02 PM, Daniel J Sebald wrote:
Incidentally, I did a little benchmarking and automatic narrowing is ~25%
faster than using an equivalent vectorized operator.

Example code:
x = complex (ones (1e4), zeros (1e4));

tic; xreal = isreal ([x]); toc
Elapsed time is 1.78048 seconds.

tic; xreal = all (imag (x) == 0); toc
Elapsed time is 2.42 seconds.

That's not what I'm getting...

octave:1>  x = complex (ones (1e4), zeros (1e4));
octave:2>
octave:2>  cpustart = cputime();
octave:3>  xisreal = isreal(x)
xisreal = 0
octave:4>  cputime() - cpustart
ans =  0.0029990
octave:5>
octave:5>  cpustart = cputime();
octave:6>  xisreal = isreal([x])
xisreal =  1
octave:7>  cputime() - cpustart
ans =  1.5658
octave:8>
octave:8>  cpustart = cputime();
octave:9>  xisreal = isreal(x(:))
xisreal =  1
octave:10>  cputime() - cpustart
ans =  1.5678
octave:11>
octave:11>  cpustart = cputime();
octave:12>  xisreal = all (all (imag (x) == 0))
xisreal =  1
octave:13>  cputime() - cpustart
ans =  1.4528

What brand of computer do you have there, Rik?
Hmm.  It's not that bad.  I'm using a mainstream Intel Core2 Duo.

Benchmarking though can be tricky.  Did you disable frequency scaling for
your CPU?

Yes, I know. But it is interesting that we get such a discrepancy. It could be something with caching as well, e.g., the Octave operations translate in a way that can utilize a big cache.
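One way to probe the caching hypothesis would be to repeat the timing at several matrix sizes and compare the per-element cost. This is only a sketch of that idea, not a measurement from the thread:

```octave
## Sketch: time isreal([x]) at several matrix sizes; if the per-element
## cost jumps once the working set outgrows the cache, caching is a
## plausible explanation for cross-machine discrepancies.
for n = [1e3, 2e3, 4e3, 8e3]
  x = complex (ones (n), zeros (n));
  tic;
  xisreal = isreal ([x]);
  t = toc;
  printf ("n = %5d: %.4f s total, %.2f ns/element\n", n, t, 1e9 * t / n^2);
end
```

Roughly constant ns/element across sizes would argue against a cache effect.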

Well, let's think about what this does; maybe something can be learned in the end. That 1.8 s to 2.4 s change is quite significant, so something must be going on there. None of these three operations:

xisreal = isreal([x])
xisreal = isreal(x(:))
xisreal = all (all (imag (x) == 0))

appears to need any sort of intermediate variable to be executed. I've watched my system memory in all cases and there is no blip in memory of any sort. (I waited until after the initial ones/zeros assignment of x stabilized.) Is there some way to check this on your system?

The first example, isreal([x]), needs to narrow (i.e., run through the matrix checking that all imaginary parts are zero) and should then be able to answer isreal() in just a few instructions. The second example, isreal(x(:)), does narrowing plus vectorization, but the vectorization can be merely a case of indexing through memory slightly differently. Since neither of these needs to compute an intermediate variable, my suspicion is that after parsing/optimizing the two commands boil down to the same internal Octave process. The third example really doesn't do much other than loop through the matrix checking that the imaginary part is zero, pretty much the same as the other two, but more directly. There is the fact that "all (imag (x) == 0)" should create a vector, but if the parser/optimizer is really good, that could be avoided. Let's see what happens if I explicitly create an intermediate vector:

octave:14> clear y;
octave:15>
octave:15> tic
octave:16> cpustart = cputime();
octave:17> xisreal = all (all (imag (x) == 0))
xisreal =  1
octave:18> cputime() - cpustart
ans =  1.5058
octave:19> toc
Elapsed time is 1.518 seconds.
octave:20>
octave:20> tic
octave:21> cpustart = cputime();
octave:22> y = all (imag (x) == 0);
octave:23> xisreal = all (y == 1)
xisreal =  1
octave:24> cputime() - cpustart
ans =  1.4778
octave:25> toc
Elapsed time is 1.491 seconds.

Interesting: no difference. Perhaps Octave is creating a temporary vector in memory, and it's just that on my system the memory assignment is blazing fast. (The memory assignment would be reflected in the wall-clock timer, but not in CPU time.) I've plenty of memory, so the system doesn't need to do much to make room for "y = all (imag (x) == 0)".


  If not, the first tests impose a load on the CPU which then
responds by increasing its operating frequency and later tests run faster.

/proc/cpuinfo shows so many flags that I don't know what's what.
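On Linux with the cpufreq interface, the governor and current frequency are usually easier to read than the /proc/cpuinfo flags. A sketch of checking them from within Octave follows; the sysfs paths are an assumption and vary by kernel and driver:

```octave
## Sketch (assumed sysfs paths, Linux cpufreq): show the scaling
## governor and the current clock frequency for cpu0.
system ("cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor");
system ("cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq");
```

A "performance" governor pins the clock; "ondemand" or "powersave" lets it ramp up under load, which is exactly the effect that would make early tests look slower.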

I can reverse the order to test your theory:

octave:8> cpustart = cputime();
octave:9> xisreal = all (all (imag (x) == 0))
xisreal =  1
octave:10> cputime() - cpustart
ans =  1.4138
octave:11>
octave:11> cpustart = cputime();
octave:12> xisreal = isreal([x])
xisreal =  1
octave:13> cputime() - cpustart
ans =  1.5258
octave:14>
octave:14> cpustart = cputime();
octave:15> xisreal = isreal(x(:))
xisreal =  1
octave:16> cputime() - cpustart
ans =  1.5258

What's funny is that although the relative comparison is the same, there seems to be a couple-percent speed improvement from placing the "all (all (imag (x) == 0))" test first.
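A standard way to wash out that ordering/warm-up effect is to run each candidate several times and keep the minimum time. This is a generic benchmarking pattern, not a measurement from the thread:

```octave
## Sketch: best-of-N timing to reduce warm-up and frequency-scaling
## bias.  The first iterations absorb the ramp-up; the minimum is the
## least-disturbed measurement of each candidate.
x = complex (ones (1e4), zeros (1e4));
reps = 5;
t_narrow = inf;
t_vector = inf;
for i = 1:reps
  tic; xisreal = isreal ([x]);               t_narrow = min (t_narrow, toc);
  tic; xisreal = all (all (imag (x) == 0));  t_vector = min (t_vector, toc);
end
printf ("narrowing: %.3f s, vectorized: %.3f s\n", t_narrow, t_vector);
```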


There wasn't a sudden change in system load during the test was there such
as a cron job coming to life?

cputime() should weed that out, but I'm getting the same results with tic/toc:

octave:1> x = complex (ones (1e4), zeros (1e4));
octave:2>
octave:2> tic; xreal = isreal ([x]); toc
Elapsed time is 1.58903 seconds.
octave:3>
octave:3> tic; xreal = all (imag (x) == 0); toc
Elapsed time is 1.45799 seconds.


I just re-benchmarked using your exact code and I still get the result that
narrowing is faster.  This is on a development build from about a week ago.

Please try using cputime() and the tic/toc wall clock.
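For reference, a sketch of wrapping one statement with both timers in a single run, so the CPU and wall-clock figures can be compared directly:

```octave
## Sketch: measure CPU time and wall time around the same statement.
x = complex (ones (1e4), zeros (1e4));
cpu0 = cputime ();
tic;
xisreal = isreal ([x]);
wall = toc;
cpu = cputime () - cpu0;
printf ("wall: %.3f s, cpu: %.3f s\n", wall, cpu);
```

If wall time greatly exceeds CPU time, something other than the computation itself (system load, paging) is interfering with the measurement.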

Dan

