octave-maintainers
Re: gcc 3.4 and Octave/lapack problems


From: Jskud
Subject: Re: gcc 3.4 and Octave/lapack problems
Date: Thu, 11 Nov 2004 21:24:51 -0800

Reading the gcc documentation, one finds that -ffloat-store is an
expensive hack that attempts to avoid the excess precision in the
floating-point registers on Intel FPUs.  Based on recent experience with
DONLP2, a noteworthy nonlinear solver, using -ffloat-store everywhere is
an unnecessarily costly workaround.

The problem we saw in DONLP2, and I suspect in the numerical subroutines
used by Octave, is that they automatically calculate the machine
floating-point characteristics.  When floating-point numbers are
left in Intel FPU registers during those calculations, they have extra
precision (80 bits, I think), instead of the 64 bits that doubles have
in memory.  Therefore, the calculated values of vital constants like
"machine epsilon" are wrong.

The problem is that -ffloat-store can really slow things down.  Dmitri
observed a 20% slowdown in Octave using -ffloat-store everywhere.  I saw
a 75% slowdown in DONLP2 compiling with -ffloat-store.  Rather than use
-ffloat-store everywhere, it should be enough to use it to compile just
the numerical libraries (but that might still have a big performance
impact).  An alternative is to use -ffloat-store to compile just the
code which automatically determines the machine constants, or even to
rewrite those routines.  For example, the routine "dmach" (e.g.,
http://www.netlib.org/blas/dmach.f) looks like it uses the same approach
that failed for donlp2_f77, and therefore would need to be compiled
with -ffloat-store, or be rewritten.
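
For illustration only, the kind of self-calibrating loop at issue can be
sketched in C.  This is not the actual dmach or DONLP2 code; the
function names eval_method and naive_epsilon are made up here.  The
sketch assumes a C99 compiler, whose <float.h> provides
FLT_EVAL_METHOD.  When the comparison is carried out in an extended
precision register, the loop can run past the true double precision
epsilon and return a value that is too small.

```c
#include <float.h>

/* Report how the compiler evaluates floating-point expressions.
   FLT_EVAL_METHOD (C99): 0 = in the operand type, 1 = in double,
   2 = in long double (typical of x87), -1 = indeterminate. */
int eval_method(void)
{
#ifdef FLT_EVAL_METHOD
    return FLT_EVAL_METHOD;
#else
    return -1;  /* assumption: treat a missing macro as indeterminate */
#endif
}

/* Naive search for machine epsilon: halve eps until adding half of it
   to 1.0 no longer changes the sum.  Nothing here forces the sum to be
   rounded to 64 bits, so with excess precision the result can come out
   smaller than DBL_EPSILON. */
double naive_epsilon(void)
{
    double eps = 1.0;
    while (1.0 + eps / 2.0 != 1.0)
        eps = eps / 2.0;
    return eps;
}
```

Comparing naive_epsilon() against DBL_EPSILON on a given build shows
whether that build is affected; the outcome depends on the target, the
compiler, and the optimization flags.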

We could avoid -ffloat-store and instead set the floating-point
control word to avoid extended precision, as suggested by the g77 info
pages; but that seems suboptimal and nonportable.  To fix DONLP2, we
explicitly coded to avoid extended precision when computing epsmac and
tolmac, using a wrapper function ("double_identity") around the
intermediate results which, in effect, forces the compiler to discard
the extra (extended) precision.  Here's a little snippet of that
reworked code:

      external double_identity
      double precision double_identity

      EPSMAC = TWO**(-20)
100   CONTINUE
      EPSMAC=EPSMAC/TWO
      TERM=double_identity(ONE+EPSMAC)
      IF ( TERM .NE. ONE ) GOTO 100
      EPSMAC=EPSMAC+EPSMAC
      TOLMAC=EPSMAC
200   CONTINUE
      TOL1=TOLMAC
      TOLMAC=double_identity(TOLMAC/TWOP4)
      IF ( TOLMAC .NE. ZERO ) GOTO 200
      TOLMAC=TOL1

Here are the double_identity routines.

C     Purpose: discard extra (ie, extended) precision to enable (donlp2)
C     computing epsmac properly without recourse to the -ffloat-store
C     hack which hurts performance.

C     We do this by forcing the value into array storage and passing the
C     array to a helper routine, since we don't want the optimizing
C     compiler to always be able to pass the value in a register with
C     extended precision.

C     To be very cautious (paranoid?), we could put double_identity
C     into a separate compilation unit to prevent (stronger) compile
C     time interprocedural optimization from optimizing out
C     double_identity_helper, and then double_identity.

      double precision function double_identity(asis_value)
      double precision asis_value
      double precision hide_value(1)
      double precision double_identity_helper
      external double_identity_helper

      hide_value(1) = asis_value
      double_identity = double_identity_helper(hide_value)
      return
      end

      double precision function double_identity_helper(hide_value)
      double precision hide_value(1)

      double_identity_helper = hide_value(1)
      return
      end

C []
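
For the C and C++ side, a comparable effect is commonly obtained with a
volatile store.  The following is a minimal sketch of that swapped-in
technique, not the DONLP2 code: it mirrors the Fortran snippet above,
compute_epsmac and compute_tolmac are hypothetical names, and the
divisor 16.0 assumes that TWOP4 means 2**4, which the snippet does not
show.

```c
#include <float.h>

/* Hypothetical C analog of double_identity: the volatile store makes
   the compiler write the value to a 64-bit memory slot and read it
   back, which rounds away any excess register precision. */
static double double_identity(double value)
{
    volatile double stored = value;
    return stored;
}

/* Mirrors the EPSMAC loop: halve until 1 + epsmac rounds to 1 in
   double precision, then step back by one factor of two. */
double compute_epsmac(void)
{
    double epsmac = 1.0 / 1048576.0;   /* 2**(-20) */
    double term;
    do {
        epsmac = epsmac / 2.0;
        term = double_identity(1.0 + epsmac);
    } while (term != 1.0);
    return epsmac + epsmac;
}

/* Mirrors the TOLMAC loop: divide by 16 (assumed value of TWOP4)
   until the stored result underflows to zero, and return the last
   nonzero value. */
double compute_tolmac(double epsmac)
{
    double tolmac = epsmac;
    double tol1;
    do {
        tol1 = tolmac;
        tolmac = double_identity(tolmac / 16.0);
    } while (tolmac != 0.0);
    return tol1;
}
```

With IEEE doubles and round-to-nearest, compute_epsmac() should agree
with DBL_EPSILON from <float.h>, which gives a cheap sanity check that
the extra precision really is being discarded.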

Hope this helps.

/Jskud

>------ Begin Included Message ------
> From: "John W. Eaton" <address@hidden>
> Date: Thu, 11 Nov 2004 22:47:21 -0500
> To: "Dmitri A. Sergatskov" <address@hidden>
> Cc: address@hidden
> Subject: Re: gcc 3.4 and Octave/lapack problems
> 
> On 11-Nov-2004, Dmitri A. Sergatskov <address@hidden> wrote:
> 
> | John W. Eaton wrote:
> | > On 11-Nov-2004, Dmitri A. Sergatskov <address@hidden> wrote:
> | > 
> | > | Also, if -ffloat-store is indeed a must for lapack/octave, should we
> | > | make it a default?
> | > 
> | > It seems like this might be a reasonable change to make.  We'll need a
> | > configure check since -ffloat-store probably only makes sense for
> | > gcc/g++/g77.
> | 
> | I guess one of the questions is whether we should pass it to g77 only
> | (at the moment that looks sufficient), or to all three?
> | I noticed that loop performance drops some 20% if I have
> | it in CXXFLAGS. I do not see any difference if CFLAGS have
> | it or not.
> | 
> | Any insights?
> 
> If we are going to use -ffloat-store for Fortran code because it
> produces better results (or at least results that are more likely to
> agree with what we would expect from 64-bit IEEE floating point
> arithmetic) then it seems to me that we should use it for the C and
> C++ code as well.  Or maybe you would prefer to have bad results
> faster?  :-)
> 
> I've made changes to configure so that we check to see if the
> compilers accept -ffloat-store, but only on x86 platforms when using
> the GNU compilers (individual checks are made for each).
> 
> jwe
> 
>------  End Included Message  ------


