help-octave
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Machine epsilon, Octave and Matlab


From: Alex Verstak
Subject: Re: Machine epsilon, Octave and Matlab
Date: Thu, 30 Aug 2001 13:40:42 -0700 (PDT)

For gcc/x86, use -ffloat-store to have correct IEEE semantics.
Otherwise, as Thomas said, in-register and in-memory operations
produce different results because intel registers are several bits
wider than IEEE numbers.  This is just one of the reasons why
computational scientists hate intel and love (now discontinued)
alphas...

=alex


On Thu, 30 Aug 2001, Christoph Spiel wrote:

> On Wed, Aug 29, 2001 at 05:00:08PM -0500, Thomas Shores wrote:
> > Craig Stoudt wrote:
> > > Interesting.  I'm running Octave 2.1.31 under Win2000.
> > >  Running your script I get:
> > >
> > > x2 =  1.1102230246251565404e-16
> > > y2 = 0
> > > x4 =  5.5511151231257827021e-17
> > > y4 = 0
> > 
> > Running Octave 2.1.34 under RedHat Linux 7.0 I get the same results
> > as Craig.  I think the difference might be really a compiler issue.
> 
> In fact it is NOT a compiler issue, but the code a compiler generates
> can trigger the problem.
> 
> > --- snip ---
> 
> > Furthermore, the Pentium
> 
> Actually, the decision to use a wider internal data format was taken
> when the very first 8087 processors were designed.  In contrary to the
> 8087 and its successors most other current FPU designs use identical
> internal and external data formats.
> 
> > --- snip ---
> 
> > So what's the problem?  Well, *if* the compiler is smart enough to
> 
> It is not the compiler that evaluates the expression, but Octave,
> i.e., the interpreter.
> 
> > take a compound expression like the one above and store intermediate
> > results in the Pentium FPU registers, the answer should come out OK.
> 
> The interpreter tends to access (user) variables in memory, hence
>     octave:1> 1 + eps/2 + eps/2 == 1
>     ans = 1
> but calling a compiled function can hoist the same variables into
> the FPU registers and
>     sum([1, eps/2, eps/2]) == 1
> might return false (it does for some interpreters I know of) if the
> FPU uses larger internal precision.
> 
> > I'm guessing that for whatever reason this doesn't happen on the
> > Pentium, but does happen on the SGI compiler.  Perhaps its a
> > question of optimization on one compiler and not on the other.
> 
> The reasons are:
> (a) The Intel FPUs can use a wider internal data format.
> (b) The internal format can be controlled during runtime (of the
>     application).
> 
> The following program produces different results on a PIII Linux
> system depending on the FPU register width setting, the compiler
> optimization level, and the variable localization.
> 
> 
> #include <float.h>
> #include <fpu_control.h>
> #include <stdio.h>
> 
> static void
> fpu_double_mode(void)
> /* Switch FPU into "use double precision internally" mode, i.e.,
>  * disable 80bit FPU register width and thereby double-rounding. */
> {
>     unsigned cw = (_FPU_DEFAULT & ~_FPU_EXTENDED) | _FPU_DOUBLE;
>     _FPU_SETCW(cw);
> }
> 
> static void
> fpu_extended_mode(void)
> /* Enable full FPU register width. */
> {
>     unsigned cw = (_FPU_DEFAULT & ~_FPU_DOUBLE) | _FPU_EXTENDED;
>     _FPU_SETCW(cw);
> }
> 
> static void
> expr(void)
> {
>     VOLATILE double x2 = DBL_EPSILON / 2.0;
>     VOLATILE double y2 = 1.0 / (1.0 - x2) - 1.0 / (1.0 + x2);
>     VOLATILE double x4 = DBL_EPSILON / 4.0;
>     VOLATILE double y4 = 1.0 / (1.0 - x4) - 1.0 / (1.0 + x4);
> 
>     printf("x2 = %g\ny2 = %g\nx4 = %g\ny4 = %g\n\n", x2, y2, x4, y4);
> }
> 
> int
> main(void)
> {
>     fpu_double_mode();
>     expr();
>     fpu_extended_mode();
>     expr();
> 
>     return 0;
> }
> 
> 
> Here we go:
> 
> 
> $ gcc-3.0 -DVOLATILE='' -o fpu-regwidth fpu-regwidth.c
> $ ./fpu-regwidth
> x2 = 1.11022e-16
> y2 = 2.22045e-16
> x4 = 5.55112e-17
> y4 = 0
> 
> x2 = 1.11022e-16
> y2 = 2.22045e-16
> x4 = 5.55112e-17
> y4 = 1.11022e-16
> 
> 
> $ gcc-3.0 -O3 -DVOLATILE='' -o fpu-regwidth fpu-regwidth.c
> $ ./fpu-regwidth
> x2 = 1.11022e-16
> y2 = 0
> x4 = 5.55112e-17
> y4 = 0
> 
> x2 = 1.11022e-16
> y2 = 0
> x4 = 5.55112e-17
> y4 = 0
> 
> 
> $ gcc-3.0 -O3 -DVOLATILE='volatile' -o fpu-regwidth fpu-regwidth.c
> $ ./fpu-regwidth
> x2 = 1.11022e-16
> y2 = 2.22045e-16
> x4 = 5.55112e-17
> y4 = 0
> 
> x2 = 1.11022e-16
> y2 = 2.22045e-16
> x4 = 5.55112e-17
> y4 = 1.11022e-16
> 
> 
> Upshot: If you want portable results, control your environment.
> 
> 
> -cls
> 
> -- 
> Christoph L. Spiel  <address@hidden>
> Hammersmith Consulting,  web: www.hammersmith-consulting.com
> Phone: +49-8022-662908, fax: +49-8022-662909
> 
> 
> 
> -------------------------------------------------------------
> Octave is freely available under the terms of the GNU GPL.
> 
> Octave's home on the web:  http://www.octave.org
> How to fund new projects:  http://www.octave.org/funding.html
> Subscription information:  http://www.octave.org/archive.html
> -------------------------------------------------------------
> 
> 



-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.

Octave's home on the web:  http://www.octave.org
How to fund new projects:  http://www.octave.org/funding.html
Subscription information:  http://www.octave.org/archive.html
-------------------------------------------------------------



reply via email to

[Prev in Thread] Current Thread [Next in Thread]