Re: gcc 3.4 and Octave/lapack problems
From: Jskud
Subject: Re: gcc 3.4 and Octave/lapack problems
Date: Thu, 11 Nov 2004 21:24:51 -0800
Reading the gcc documentation, one finds that -ffloat-store is an
expensive hack that attempts to avoid the excess precision in the
floating-point registers on Intel FPUs. Based on recent experience with
DONLP2, a noteworthy nonlinear solver, using -ffloat-store everywhere is
an unnecessarily costly workaround.
The problem we saw in DONLP2, and I suspect in the numerical subroutines
used by Octave, is that they automatically calculate the machine
floating-point characteristics. When floating-point numbers are
left in Intel FPU registers during those calculations, they have extra
precision (80 bits, I think), instead of the 64 bits that doubles have
in memory. Therefore, the calculated values of vital constants like
"machine epsilon" are wrong.
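To make the failure mode concrete, here is a minimal C sketch of that
kind of naive probe. It is an illustration only, not the DONLP2 or
Octave code, and the names naive_epsilon and epsilon_probe_is_off are
made up for this example. On a build that keeps intermediates in x87
registers, the comparison may be carried out in extended precision and
the loop can run past the double-precision epsilon; on builds that use
SSE arithmetic it normally returns DBL_EPSILON.

```c
#include <float.h>

/* Naive probe: halve eps until 1 + eps is indistinguishable from 1.
 * Nothing here forces 1.0 + eps to be rounded to a 64-bit double
 * before the comparison, so a compiler targeting the x87 FPU is free
 * to compare the extended-precision register value instead. */
double naive_epsilon(void)
{
    double eps = 1.0;
    do {
        eps /= 2.0;
    } while (1.0 + eps != 1.0);
    return eps + eps;
}

/* Nonzero when the probe disagrees with the value <float.h> reports
 * for the in-memory double format. */
int epsilon_probe_is_off(void)
{
    return naive_epsilon() != DBL_EPSILON;
}
```

Comparing the result against DBL_EPSILON from <float.h> is a quick way
to tell whether a given compiler and flag combination is affected.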
The problem is that -ffloat-store can really slow things down. Dmitri
observed a 20% slowdown in Octave using -ffloat-store everywhere. I saw
a 75% slowdown in DONLP2 compiling with -ffloat-store. Rather than use
-ffloat-store everywhere, it should be enough to use it to compile just
the numerical libraries (but that might still be a big performance
impact). An alternative is to compile just the code which automatically
determines the machine constants with -ffloat-store, or even rewrite
those routines. For example, the routine "dmach" (eg,
http://www.netlib.org/blas/dmach.f) looks like it uses the same approach
that failed for donlp2_f77, and therefore would need to be compiled
with -ffloat-store, or be rewritten.
We could avoid -ffloat-store, and instead set the floating-point
control word to avoid extended precision, as suggested by g77 info; but
that seems suboptimal and nonportable. To fix DONLP2, we explicitly
coded to avoid extended precision when computing epsmac and tolmac,
using a wrapper function ("double_identity") around the intermediate
results which, in effect, forced the compiler to discard the extra
(extended) precision. Here's a little snippet of that reworked code:
      external double_identity
      double precision double_identity
      EPSMAC = TWO**(-20)
  100 CONTINUE
      EPSMAC=EPSMAC/TWO
      TERM=double_identity(ONE+EPSMAC)
      IF ( TERM .NE. ONE ) GOTO 100
      EPSMAC=EPSMAC+EPSMAC
      TOLMAC=EPSMAC
  200 CONTINUE
      TOL1=TOLMAC
      TOLMAC=double_identity(TOLMAC/TWOP4)
      IF ( TOLMAC .NE. ZERO ) GOTO 200
      TOLMAC=TOL1
Here are the double_identity routines.
C Purpose: discard extra (ie, extended) precision to enable (donlp2)
C computing epsmac properly without recourse to the -ffloat-store
C hack which hurts performance.
C We do this by forcing the value into array storage and passing the
C array to a helper routine, since we don't want the optimizing
C compiler to always be able to pass the value in a register with
C extended precision.
C To be very cautious (paranoid?), we could put double_identity
C into a separate compilation unit to prevent (stronger) compile
C time interprocedural optimization from optimizing out
C double_identity_helper, and then double_identity.
      double precision function double_identity(asis_value)
      double precision asis_value
      double precision hide_value(1)
      double precision double_identity_helper
      external double_identity_helper
      hide_value(1) = asis_value
      double_identity = double_identity_helper(hide_value)
      return
      end

      double precision function double_identity_helper(hide_value)
      double precision hide_value(1)
      double_identity_helper = hide_value(1)
      return
      end
C []
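For C code in the same position, the same trick can be sketched as
follows. This is an assumption-laden translation, not the DONLP2
source: it uses a volatile temporary rather than the array-plus-helper
arrangement above, it assumes TWOP4 is 2**4 = 16, and the function
names compute_epsmac, compute_tolmac and epsmac_matches_float_h are
hypothetical.

```c
#include <float.h>

/* Round-trip the value through a 64-bit memory slot. The volatile
 * qualifier obliges the compiler to perform the store and the reload,
 * which discards any extended precision held in an FPU register. */
static double double_identity(double value)
{
    volatile double stored = value;
    return stored;
}

/* Same loop as the Fortran snippet: halve EPSMAC until ONE + EPSMAC,
 * rounded to double, equals ONE, then step back by one factor of 2. */
double compute_epsmac(void)
{
    double epsmac = 1.0 / 1048576.0;      /* TWO**(-20) */
    double term;
    do {
        epsmac = epsmac / 2.0;
        term = double_identity(1.0 + epsmac);
    } while (term != 1.0);
    return epsmac + epsmac;
}

/* Divide by TWOP4 until the stored result underflows to zero and
 * return the last nonzero value. */
double compute_tolmac(void)
{
    const double twop4 = 16.0;            /* assumed value of TWOP4 */
    double tolmac = compute_epsmac();
    double tol1;
    do {
        tol1 = tolmac;
        tolmac = double_identity(tolmac / twop4);
    } while (tolmac != 0.0);
    return tol1;
}

/* Cross-check against the constant <float.h> gives for double. */
int epsmac_matches_float_h(void)
{
    return compute_epsmac() == DBL_EPSILON;
}
```

With the store forced, compute_epsmac should agree with DBL_EPSILON
whether or not the surrounding code is built with -ffloat-store. The
value compute_tolmac returns depends on how the platform handles
subnormal numbers, so no particular figure is claimed for it here.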
Hope this helps.
/Jskud
>------ Begin Included Message ------
> From: "John W. Eaton" <address@hidden>
> Date: Thu, 11 Nov 2004 22:47:21 -0500
> To: "Dmitri A. Sergatskov" <address@hidden>
> Cc: address@hidden
> Subject: Re: gcc 3.4 and Octave/lapack problems
>
> On 11-Nov-2004, Dmitri A. Sergatskov <address@hidden> wrote:
>
> | John W. Eaton wrote:
> | > On 11-Nov-2004, Dmitri A. Sergatskov <address@hidden> wrote:
> | >
> | > | Also, if -ffloat-store is indeed a must for lapack/octave, should we
> | > | make it a default?
> | >
> | > It seems like this might be a reasonable change to make. We'll need a
> | > configure check since -ffloat-store probably only makes sense for
> | > gcc/g++/g77.
> |
> | I guess one of the questions is whether we should pass it to g77 only
> | (at the moment that looks sufficient), or to all three?
> | I noticed that loop performance drops some 20% if I have
> | it in CXXFLAGS. I do not see any difference if CFLAGS have
> | it or not.
> |
> | Any insights?
>
> If we are going to use -ffloat-store for Fortran code because it
> produces better results (or at least results that are more likely to
> agree with what we would expect from 64-bit IEEE floating point
> arithmetic) then it seems to me that we should use it for the C and
> C++ code as well. Or maybe you would prefer to have bad results
> faster? :-)
>
> I've made changes to configure so that we check to see if the
> compilers accept -ffloat-store, but only on x86 platforms when using
> the GNU compilers (individual checks are made for
> each).
>
> jwe
>
>------ End Included Message ------