replicating simulations on x86 computers


From: Theodore C. Belding
Subject: replicating simulations on x86 computers
Date: Thu, 3 Dec 1998 09:23:31 -0500 (EST)

Since replication of results is a major goal of Swarm, I thought I'd pass
on the following.  I don't think it's been discussed before on this list,
but maybe I missed it.  Maybe it should be added to the Swarm FAQ.

About a year ago, I found to my surprise that I was getting different
results from my C++ genetic algorithm program on x86 Linux than from
runs on HP Unix workstations, even though I was using the same program,
compiler, and random seeds.

It turns out that the x86 (actually the x87 fpu) by default uses
extended-precision, 80-bit floating-point numbers in its calculations.
(I think the Motorola 68000 does also.)  Modern RISC architectures
typically use IEEE-standard doubles, which are 64 bits.  The extra
precision on x86 was leading to gradual deviations in my simulation
results compared to the results on the HPs.  I had assumed that all
modern computers follow the IEEE floating-point standard, and that this
implied my results would be consistent across computing platforms.
Neither assumption is valid in general.
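
To make this concrete, here is a minimal sketch of my own (not a
canonical test, and the file name demo.c is just an example) of the
kind of computation that can deviate.  The printed residual depends on
whether the intermediate quotient is kept in an 80-bit register or
rounded to a 64-bit double after each operation; exact behavior varies
with compiler version and optimization level:

     /* Sketch only: compile twice on x86, e.g. gcc -O2 demo.c, with
        and without the -ffloat-store flag discussed below, and
        compare the output. */
     #include <stdio.h>

     int main (void)
     {
       volatile double n = 3.0;   /* volatile keeps the compiler from
                                     folding the division at compile time */
       double x = 1.0 / n;        /* may live in an 80-bit x87 register */
       double r = x * n - 1.0;    /* residual is sensitive to whether x
                                     was rounded to 64 bits first */
       printf ("residual = %.20g\n", r);
       return 0;
     }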

This problem can occur in *any* program compiled on x86, under any OS,
in any language.  (I haven't seen any problems in Mathematica programs,
which run at a much higher level than compiled C++ code.)

Now, if you don't care about replicating results across platforms, this of
course isn't a problem.  But I would like to be able to replicate results
from x86 on HPs and vice versa.

One way of forcing the x86 to emulate IEEE double precision is to use the
gcc compiler's -ffloat-store flag. The gcc manual says:

"`-ffloat-store'
     Do not store floating point variables in registers, and inhibit
     other options that might change whether a floating point value is
     taken from a register or memory.

     This option prevents undesirable excess precision on machines such
     as the 68000 where the floating registers (of the 68881) keep more
     precision than a `double' is supposed to have.  Similarly for the
     x86 architecture.  For most programs, the excess precision does
     only good, but a few programs rely on the precise definition of
     IEEE floating point.  Use `-ffloat-store' for such programs."

In practice, this works for me: I get exactly the same results on x86
as I do on other platforms if I compile with -ffloat-store on x86.  In
theory it should cause a two- to four-fold performance hit, but my
program runs less than 0.3% slower with -ffloat-store than without.  (I
assume this is because my program is not well optimized for x86
floating-point performance in general, or because it doesn't do enough
floating-point math for the hit to show up.)

NOTE: As I understand it, -ffloat-store does *not* make the x86
completely compliant with the IEEE floating-point standard.  There seem
to be some potential problems with double rounding that may cause
errors of magnitude 10^-324.  (See
http://math.nist.gov/javanumerics/jgfnwg-01.html for the gory details.)
There may also be discrepancies in floating-point exceptions and
rounding.  I don't claim to understand this completely, though.  Again,
-ffloat-store seems to work for me in practice, but I wouldn't rely on
it.

One might think that the obvious way to get around the performance hit of
-ffloat-store would be to simply put the x86 fpu into double precision
mode, instead of the default extended mode. From the development version
of the g77 manual:

"Floating point precision
------------------------

   If your program depends on exact IEEE 754 floating point handling it
may help on some systems--specifically x86 or m68k hardware--to use the
`-ffloat-store' option or to reset the precision flag on the floating
point unit *Note Optimize Options::.

   However, it might be better simply to put the FPU into double
precision mode and not take the performance hit of `-ffloat-store'.  On
x86 and m68k GNU systems you can do this with a technique similar to
that for turning on floating point exceptions *Note Floating-point
Exception Handling::.  The control word could be set to double
precision by replacing the `__setfpucw' call with one like this:
       __setfpucw ((_FPU_DEFAULT & ~_FPU_EXTENDED) | _FPU_DOUBLE);
   (It is not clear whether this has any effect on the operation of the
GNU maths library, but we have no evidence of it causing trouble.)"

For example, you can compile and link the following C code with gcc on
x86 Linux to put the fpu in double-precision mode (the function will be
called automatically when the program starts):

     /* Runs before main(), thanks to gcc's constructor attribute.
        Clears the x87 precision-control bits and selects 53-bit
        (double) precision. */
     #include <fpu_control.h>

     void __attribute__ ((constructor))
     enter_fpu_double_mode (void)
     {
       (void) __setfpucw ((_FPU_DEFAULT & ~_FPU_EXTENDED) | _FPU_DOUBLE);
     }
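
(An aside: __setfpucw is a glibc-internal symbol, so it may not exist
in other libcs or in later glibc releases.  Here is a sketch of the
same read-modify-write using the _FPU_GETCW/_FPU_SETCW macros from the
same header, assuming your libc provides them:)

     #include <fpu_control.h>

     void enter_fpu_double_mode (void)
     {
       fpu_control_t cw;
       _FPU_GETCW (cw);                           /* read the x87 control word */
       cw = (cw & ~_FPU_EXTENDED) | _FPU_DOUBLE;  /* select 53-bit precision */
       _FPU_SETCW (cw);                           /* write it back */
     }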

But it turns out that even in double mode, the x86 still uses 15 bits of
exponent internally, instead of the 11 bits required by IEEE doubles, so
this doesn't solve the problem. (Again, see
http://math.nist.gov/javanumerics/jgfnwg-01.html). 
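
You can see the extended exponent range directly with a small test (a
sketch of mine; results may vary with compiler and optimization level).
On strict IEEE double hardware the intermediate product underflows to
zero, but an x87 register can hold it, even in double-precision mode:

     #include <stdio.h>
     #include <float.h>

     int main (void)
     {
       volatile double tiny = DBL_MIN;  /* volatile blocks constant folding */
       double r = (tiny * tiny) / tiny; /* product ~ 5e-616: far below the
                                           double underflow threshold, but
                                           within a 15-bit exponent range */
       printf ("%g\n", r);  /* ~2.2e-308 if held in an x87 register,
                               0 if rounded to a true double in between */
       return 0;
     }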

If you want more details on all of this, I initiated a discussion of this
on the egcs mailing list; look for recent messages with "-ffloat-store" 
in the subject:  http://www.cygnus.com/ml/egcs/

For more information on floating-point computer arithmetic, see:
Goldberg, David (1996). Computer Arithmetic. Appendix A in Hennessy,
John L., and David A. Patterson, Computer Architecture: A Quantitative
Approach, 2nd ed. Morgan Kaufmann.

For more information on the x87 fpu, see the Intel Architecture
documentation:
http://developer.intel.com/design/litcentr/indextb.htm
(See also Appendix D in Hennessy and Patterson.)

Finally, I should say that we shouldn't rely on being able to get the
same results across different platforms, even if things appear to work
in practice.  I think the only real solution is to record the exact
computer platform, compiler, and OS that were used, and to include this
information with the simulation results.  Also, I don't mean to imply
that only x86 has problems in this area (though it certainly is an
outstanding example).

Differences in floating-point results can also be introduced by
differences in the implementation of standard math functions, such as
the standard C function log().  More generally, there may also be
cross-platform differences in *integer* arithmetic, especially between
32-bit and 64-bit platforms.  For example, programs should not make any
assumptions about the sizes of int or float in C/C++, except to the
extent that they are defined by the C or C++ standards (see especially
float.h and limits.h).
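
One cheap precaution along these lines (a sketch, nothing
Swarm-specific) is to have the program print the platform-dependent
properties it relies on, so they get archived with the results:

     #include <stdio.h>
     #include <limits.h>
     #include <float.h>

     int main (void)
     {
       /* Sizes the C standard leaves implementation-defined. */
       printf ("sizeof(int)=%u sizeof(long)=%u sizeof(double)=%u\n",
               (unsigned) sizeof (int), (unsigned) sizeof (long),
               (unsigned) sizeof (double));
       /* Integer and floating-point limits from limits.h / float.h. */
       printf ("INT_MAX=%d DBL_MANT_DIG=%d DBL_EPSILON=%g\n",
               INT_MAX, DBL_MANT_DIG, DBL_EPSILON);
       return 0;
     }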

In summary, there is no reason to panic, but I think these are all issues
that we should be aware of. Disclaimer: I am not an expert on any of this.
-Ted

--
Ted Belding                               address@hidden 
University of Michigan Program for the Study of Complex Systems
Homepage: http://www-personal.umich.edu/~streak/
PGP key:  http://www-personal.umich.edu/~streak/pgp-key.html





