octave-maintainers
Re: isreal benchmarking


From: Daniel J Sebald
Subject: Re: isreal benchmarking
Date: Tue, 11 Sep 2012 17:38:30 -0500
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.24) Gecko/20111108 Fedora/3.1.16-1.fc14 Thunderbird/3.1.16

On 09/11/2012 04:26 PM, John W. Eaton wrote:
On 11-Sep-2012, Rik wrote:

| Maybe processor architecture makes a difference?  I don't know what sort of
| awesome 8 cores you have and maybe there are CPU pinning effects going on.
| I only have two cores, but I turned off the second core with
|
| echo 0 > /sys/devices/system/cpu/cpu1/online
| cat /proc/cpuinfo  # just to verify that the cpu is no longer available to
| the kernel
|
| Still no real change in the behavior.
|
| I also tried compiling with just '-O2' and the difference persists.  I
| compiled again with '-g -O0' and the difference is still there.  The
| '-msse' option helped reduce the imag(x) runtimes for me.  Without it the
| difference I am seeing is close to ~40% rather than just 25%.

Some of this discussion was off the list but I'm bringing it back
because I have a few comments that might be of general interest and
I'd like to know whether people think we should attempt compatibility
for the two cases I describe at the end of this message.

Using -g should not slow things down.  I just asked if you are using
-O2 or -g because I usually compile the dev sources without
optimization to make debugging easier, and without optimization
performance is noticeably worse.

Also, I see significantly different results when compiling without
-O2:

   octave-cli:1>  x = complex (1e4, 1e4);
   octave-cli:2>  x = complex (zeros (1e4), zeros (1e4));
   octave-cli:3>  t = cputime (); isreal (x); cputime () - t
   ans = 0
   octave-cli:4>  t = cputime (); isreal ([x]); cputime () - t
   ans =  1.8841
   octave-cli:5>  t = cputime (); isreal (x(:)); cputime () - t
   ans =  1.9081
   octave-cli:6>  t = cputime (); all (all (imag (x) == 0)); cputime () - t
   ans =  2.4922

compared to the following, built with -O2 (a different version of
Octave: the above was dev, this is 3.6.3, but things in this regard
should not have changed much):

   octave:1>  x = complex (zeros (1e4), zeros (1e4));
   octave:2>  t = cputime (); isreal (x); cputime () - t
   ans =  0.0040000
   octave:3>  t = cputime (); isreal ([x]); cputime () - t
   ans =  1.0801
   octave:4>  t = cputime (); isreal (x(:)); cputime () - t
   ans =  1.0681
   octave:5>  t = cputime (); all (all (imag (x) == 0)); cputime () - t
   ans =  1.0881

So one thing to notice is that compiler optimization is critical if
you want to obtain reasonably good performance for Octave.

Your numbers are roughly in line with what we are seeing. The -O2 flag must be the main difference, as I see no improvement with the -msse4.1 option that Rik used. Does -O2 include caching? (Rik thinks Boolean test vectors can be cached.) I can't find an answer to that by searching the gcc manual page.

In any case, the results suggest that, at least in the general case, there is a performance justification for an "isimag()" as opposed to using "all(all(real(x)==0))", if one wanted to go that route. However, compared to more sophisticated math ops these are pretty fast, so such a thing is low priority. But if there were simple tricks that would help gcc utilize SSE, that'd be cool. E.g., I'm imagining running through a loop doing things in pairs, then handling the leftover element at the end:

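/* Test two elements per iteration over the even-length prefix. */
for (int i=0; i+1 < N; i+=2) {
  if (imagcomp[i] != 0 || imagcomp[i+1] != 0)
    return false;
}
/* Handle the single leftover element when N is odd. */
for (int i=2*(N/2); i < N; i++) {
  if (imagcomp[i] != 0)
    return false;
}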

I've no idea if gcc's SSE support could actually benefit from that, but such C code would compile for any architecture.
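
For anyone who wants to experiment with it, here is a self-contained sketch of that paired loop. The function name all_imag_zero and the plain array of doubles are assumptions made for illustration, not Octave internals, and whether a compiler actually vectorizes this is untested here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true when every entry of
   imagcomp[0..n-1] compares equal to zero.  The main loop tests two
   entries per iteration; a final check handles the leftover entry
   when n is odd. */
bool all_imag_zero (const double *imagcomp, size_t n)
{
  size_t paired = n - (n % 2);   /* largest even count <= n */
  size_t i;

  for (i = 0; i < paired; i += 2)
    {
      /* Bitwise | evaluates both comparisons without a second branch. */
      if ((imagcomp[i] != 0.0) | (imagcomp[i + 1] != 0.0))
        return false;
    }

  /* At most one element remains. */
  if (paired < n && imagcomp[paired] != 0.0)
    return false;

  return true;
}
```

Building it with and without -O2 and timing it on a large zero-filled array would be one way to check whether the pairing makes any measurable difference.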


I also didn't realize that we narrowed from complex to real when
indexing with (:).  I'm not sure that is the correct behavior.  It
looks like Matlab does not narrow to real in this case.  Or for [x].
So should we make Octave behave the same way in these cases?  It looks
like it does narrow in other cases.  For example, x+2 or x*2 or other
arithmetic operations will result in real values, not complex values
with zero imaginary part.  Even x+x narrows, so it is a general
property of the result of an arithmetic operation.

In Octave, exp(), log(), sqrt() all end up doing narrowing, which is consistent with the math-op theme. (It would happen naturally if these were script files using basic arithmetic, but I tested because they are internal operations.)
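
To make "narrowing" concrete, here is a minimal sketch in C of the idea: if every imaginary part is zero, the result is stored as a plain real array. The cplx struct and the narrow_to_real function are hypothetical names chosen for illustration; this is not how Octave implements it internally.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical complex element type (real and imaginary parts). */
typedef struct { double re; double im; } cplx;

/* Hypothetical helper: if all imaginary parts of z[0..n-1] are zero,
   copy the real parts into re_out and return true (narrowed).
   Otherwise leave re_out untouched and return false. */
bool narrow_to_real (const cplx *z, size_t n, double *re_out)
{
  size_t i;

  /* First pass: any nonzero imaginary part prevents narrowing. */
  for (i = 0; i < n; i++)
    if (z[i].im != 0.0)
      return false;

  /* Second pass: keep only the real parts. */
  for (i = 0; i < n; i++)
    re_out[i] = z[i].re;

  return true;
}
```

The first pass is the same full scan of the imaginary parts that the timings for isreal ([x]) and isreal (x(:)) above appear to be paying for.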

Dan

