Re: isreal benchmarking

octave-maintainers

From:	Daniel J Sebald
Subject:	Re: isreal benchmarking
Date:	Tue, 11 Sep 2012 17:38:30 -0500
User-agent:	Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.24) Gecko/20111108 Fedora/3.1.16-1.fc14 Thunderbird/3.1.16

On 09/11/2012 04:26 PM, John W. Eaton wrote:

On 11-Sep-2012, Rik wrote:

| Maybe processor architecture makes a difference?  I don't know what sort of
| awesome 8 cores you have and maybe there are CPU pinning effects going on.
| I only have two cores, but I turned off the second core with
|
| echo 0 > /sys/devices/system/cpu/cpu1/online
| cat /proc/cpuinfo  # just to verify that the cpu is no longer available to
|                    # the kernel
|
| Still no real change in the behavior.
|
| I also tried compiling with just '-O2' and the difference persists.  I
| compiled again with '-g -O0' and the difference is still there.  The
| '-msse' option helped reduce the imag(x) runtimes for me.  Without it the
| difference I am seeing is close to 40% rather than just 25%.

Some of this discussion was off the list but I'm bringing it back
because I have a few comments that might be of general interest and
I'd like to know whether people think we should attempt compatibility
for the two cases I describe at the end of this message.

Using -g should not slow things down.  I just asked if you are using
-O2 or -g because I usually compile the dev sources without
optimization to make debugging easier, and without optimization
performance is noticeably worse.

Also, I see significantly different results when compiling without
-O2:

   octave-cli:1>  x = complex (1e4, 1e4);
   octave-cli:2>  x = complex (zeros (1e4), zeros (1e4));
   octave-cli:3>  t = cputime (); isreal (x); cputime () - t
   ans = 0
   octave-cli:4>  t = cputime (); isreal ([x]); cputime () - t
   ans =  1.8841
   octave-cli:5>  t = cputime (); isreal (x(:)); cputime () - t
   ans =  1.9081
   octave-cli:6>  t = cputime (); all (all (imag (x) == 0)); cputime () - t
   ans =  2.4922

compared to with -O2 (a different version of Octave; the above was dev,
this is 3.6.3, but things in this regard should not have changed much):

   octave:1>  x = complex (zeros (1e4), zeros (1e4));
   octave:2>  t = cputime (); isreal (x); cputime () - t
   ans =  0.0040000
   octave:3>  t = cputime (); isreal ([x]); cputime () - t
   ans =  1.0801
   octave:4>  t = cputime (); isreal (x(:)); cputime () - t
   ans =  1.0681
   octave:5>  t = cputime (); all (all (imag (x) == 0)); cputime () - t
   ans =  1.0881

So one thing to notice is that compiler optimization is critical if
you want to obtain reasonably good performance for Octave.

Your numbers are roughly in line with what we are seeing. The -O2 flag must be the main difference, as I see no improvement with the -msse4.1 option that Rik used. Does -O2 include caching? (Rik thinks Boolean test vectors can be cached.) I couldn't find an answer to that by searching the gcc manual page.

In any case, the results suggest, at least in the general case, that there is a performance justification for an "isimag()" as opposed to using "all(all(real(x)==0))", if one wanted to go that route. However, compared to more sophisticated math ops these are pretty fast, so such a thing is low priority... But if there were simple tricks that would help gcc utilize SSE, that'd be cool. E.g., I'm imagining running through a loop doing things in pairs, then handling the odd element out at the end:


for (int i = 0; i + 1 < N; i += 2) {
  /* test two elements per iteration */
  if (imagcomp[i] != 0 || imagcomp[i+1] != 0)
    return false;
}
for (int i = 2*(N/2); i < N; i++) {
  /* leftover element when N is odd */
  if (imagcomp[i] != 0)
    return false;
}
return true;

I've no idea if gcc's SSE support could actually benefit from that, but such C code would compile for any architecture.
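
For anyone who wants to experiment with the pairwise idea, here is a self-contained sketch. The function name all_zero_pairs is hypothetical, and it assumes the imaginary parts are available as a plain array of doubles. None of this is taken from the Octave sources, and whether it helps gcc vectorize is untested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not taken from the Octave sources.
   Returns true when every element of imagcomp[0..n-1] compares equal
   to zero. */
bool all_zero_pairs(const double *imagcomp, size_t n)
{
  assert(imagcomp != NULL || n == 0);

  size_t paired = n - (n % 2);   /* largest even count <= n */

  /* Test two elements per iteration. */
  for (size_t i = 0; i < paired; i += 2) {
    if (imagcomp[i] != 0 || imagcomp[i + 1] != 0)
      return false;
  }

  /* Handle the single leftover element when n is odd. */
  if (paired < n && imagcomp[paired] != 0)
    return false;

  return true;
}
```

Because the test is a plain comparison against 0, a negative zero counts as zero and a NaN counts as nonzero.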

I also didn't realize that we narrowed from complex to real when
indexing with (:).  I'm not sure that is the correct behavior.  It
looks like Matlab does not narrow to real in this case.  Or for [x].
So should we make Octave behave the same way in these cases?  It looks
like it does narrow in other cases.  For example, x+2 or x*2 or other
arithmetic operations will result in real values, not complex values
with zero imaginary part.  Even x+x narrows, so it is a general
property of the result of an arithmetic operation.
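
To make "narrowing" concrete: it means replacing a complex result whose imaginary parts are all zero with a real one. The sketch below shows the idea in C under stated assumptions. The name narrow_to_real is hypothetical, the complex data is assumed to be stored as interleaved (real, imaginary) pairs of doubles, and this is not how Octave actually implements it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not taken from the Octave sources.
   z holds n complex values as interleaved pairs: z[2*i] is the real
   part and z[2*i+1] the imaginary part of element i.
   If every imaginary part is zero, the real parts are copied into out
   (room for n doubles) and true is returned.  Otherwise false is
   returned and out may be partially written. */
bool narrow_to_real(const double *z, size_t n, double *out)
{
  assert((z != NULL && out != NULL) || n == 0);

  for (size_t i = 0; i < n; i++) {
    if (z[2 * i + 1] != 0)
      return false;            /* genuinely complex: do not narrow */
    out[i] = z[2 * i];
  }
  return true;
}
```

A caller would keep the complex array when the function returns false and switch to the real copy when it returns true.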

In Octave, exp(), log(), sqrt() all end up doing narrowing, which is consistent with the math-op theme. (It would happen naturally if these were script files using basic arithmetic, but I tested because they are internal operations.)

Dan
