octave-maintainers

Re: isreal benchmarking


From: John W. Eaton
Subject: Re: isreal benchmarking
Date: Tue, 11 Sep 2012 17:26:30 -0400

On 11-Sep-2012, Rik wrote:

| Maybe processor architecture makes a difference?  I don't know what sort of
| awesome 8 cores you have and maybe there are CPU pinning effects going on. 
| I only have two cores, but I turned off the second core with
| 
| echo 0 >> /sys/devices/system/cpu/cpu1/online
| cat /proc/cpuinfo  # just to verify that the cpu is no longer available to
| the kernel
| 
| Still no real change in the behavior.
| 
| I also tried compiling with just '-O2' and the difference persists.  I
| compiled again with '-g -O0' and the difference is still there.  The
| '-msse' option helped reduce the imag(x) runtimes for me.  Without it the
| difference I am seeing is close to ~40% rather than just 25%.

Some of this discussion was off the list but I'm bringing it back
because I have a few comments that might be of general interest and
I'd like to know whether people think we should attempt compatibility
for the two cases I describe at the end of this message.

Using -g should not slow things down.  I just asked if you are using
-O2 or -g because I usually compile the dev sources without
optimization to make debugging easier, and without optimization
performance is noticeably worse.

Also, I see significantly different results when compiling without
-O2:

  octave-cli:1> x = complex (1e4, 1e4);
  octave-cli:2> x = complex (zeros (1e4), zeros (1e4));
  octave-cli:3> t = cputime (); isreal (x); cputime () - t
  ans = 0
  octave-cli:4> t = cputime (); isreal ([x]); cputime () - t
  ans =  1.8841
  octave-cli:5> t = cputime (); isreal (x(:)); cputime () - t
  ans =  1.9081
  octave-cli:6> t = cputime (); all (all (imag (x) == 0)); cputime () - t
  ans =  2.4922

compared to compiling with -O2 (a different version of Octave: the
above was the development version, this is 3.6.3, but things in this
regard should not have changed much):

  octave:1> x = complex (zeros (1e4), zeros (1e4));
  octave:2> t = cputime (); isreal (x); cputime () - t
  ans =  0.0040000
  octave:3> t = cputime (); isreal ([x]); cputime () - t
  ans =  1.0801
  octave:4> t = cputime (); isreal (x(:)); cputime () - t
  ans =  1.0681
  octave:5> t = cputime (); all (all (imag (x) == 0)); cputime () - t
  ans =  1.0881

So one thing to notice is that compiler optimization is critical if
you want to obtain reasonably good performance for Octave.
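
A single cputime () difference like the ones above can be noisy, so
one way to compare builds is to repeat each measurement and take the
median.  The following is only a sketch: median_cputime is a
hypothetical helper, not something that exists in Octave, and the
matrix is deliberately smaller than the 1e4 x 1e4 one used above to
keep memory use modest.

```octave
1;  % mark this file as a script so the function below can be defined inline

% Hypothetical helper (not part of Octave): median CPU time of f over n runs.
function t = median_cputime (f, n)
  samples = zeros (1, n);
  for k = 1:n
    t0 = cputime ();
    f ();
    samples(k) = cputime () - t0;
  end
  t = median (samples);
end

% Complex storage with an all-zero imaginary part, as in the session above.
x = complex (zeros (1000), zeros (1000));

t_type = median_cputime (@() isreal (x), 5);                  % type check only
t_scan = median_cputime (@() all (all (imag (x) == 0)), 5);   % scans every element

printf ("isreal (x):                %g s\n", t_type);
printf ("all (all (imag (x) == 0)): %g s\n", t_scan);
```

Running the same script against an optimized and an unoptimized build
should make the effect of -O2 easier to see than a single timing.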

I doubt that multiple CPUs have anything to do with this.  Octave is
not multithreaded and I don't think these operations rely on the
BLAS where threading might be enabled.  These are just loops in the
interpreter/array classes.

I also didn't realize that we narrowed from complex to real when
indexing with (:).  I'm not sure that is the correct behavior.  It
looks like Matlab does not narrow to real in this case, or for [x].
So should we make Octave behave the same way in these cases?  Matlab
does appear to narrow in other cases.  For example, x+2 or x*2 or other
arithmetic operations will result in real values, not complex values
with zero imaginary part.  Even x+x narrows, so it is a general
property of the result of an arithmetic operation.  It doesn't appear
that there is a special check for adding two complex values whose
imaginary parts are all zero, or some other trickery like that.

jwe

