Re: floating point precision control

octave-maintainers

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: floating point precision control

From:	siko1056
Subject:	Re: floating point precision control
Date:	Thu, 14 Jul 2016 07:03:04 -0700 (PDT)

Mike Miller wrote
> [...] should we just enforce double extended (long double) precision on
> all x86 systems all the time [...]

I agree with Michael Godfrey and jwe. Looking at Matlab, a "double" is
clearly specified as IEEE-754 binary64 and "single" as binary32 [1]. Octave
does not explicitly refer to IEEE-754 (maybe we should do so!), but speaks
of "double precision" and not "extended precision" (80 bits). Here we would
break the documentation entirely. A precise documentation and clear examples
when the (floating-point, integer, or mixed)-arithmetic will fail are to
favor. And looking at all bugs, only Windows seems affected.

Regarding bug #40607 (mxe-octave: mixed uint64/double arithmetic incorrect):
Like Rik said, the interpreter is to blame for choosing the wrong operator!
We clearly documented in [3], that any operation with integers will return
an integer result. As a user, I expect the results from Example 1 and 2:

Example 1:
>> dMax = uint64 (flintmax ('double'));
>> a = dMax + 1
>> b = uint64 (double (dMax) + 1)

a = 9007199254740993
b = 9007199254740992

Example 2:
>> a = 2^63
>> b = 2^64 - 2^63
>> c = uint64 (2^64) - 2^63
>> d = intmax ('uint64') - 2^63

a = 9223372036854775808
b = 9223372036854775808
c = 9223372036854775807
d = 9223372036854775807

I think the problem is and was, that Octave uses "long double" for mixed
computations, as introduced 2008 by Jaroslav Hajek [4]. "long double" is not
really standardized [5] and requires checking for 10 or 16 bytes to produce
reliable results. Wasn't it more appropriate to cast both parameters to
standardized (u)int64_t, ensuring our slightly different integer arithmetic
by C++?

Another burden I see for the compiler is, that Octave might avoid the
compiler to use upcoming instructions-sets like AVX [6], that only regard
packing binary64, and do all our library dependencies (LAPACK, SuiteSparse,
...) support our desire for extended-precision, too?

Last, but not least like jwe explained, changing the FP control word works
only for system library calls at runtime, that can use extended registers.
It does not magically extend all doubles to long doubles (which must happen
at compile time) in our C++ code, making Octave an higher precision tool.
The FP control word has an effect on intermediate computations within the
registers, this seems not to work in Windows.

To sum up: Please stay with IEEE-754 binary64 and binary32. Fix the Windows
issues regarding the FP control word with some #ifdef __WIN32__ and a
comment to drop all the code, if the floating-point issues are fixed in the
future and leave the FP control word as it is, but allow the user to set it
manually from command line.

Another test case I found to detect intermediate double rounding, using 80
bit registers (for me Octave uses pure binary64 for this one). If no
intermediate rounding happens, both printed values match, otherwise
(3.6893488147419103232e+19) is returned:

Example 3 [7]:
a = 1848874847.0;
b = 19954562207.0;
fprintf ("%20.19e\n3.6893488147419111424e+19\n", a * b)

Best,
Kai

[1]
https://www.mathworks.com/help/matlab/matlab_prog/floating-point-numbers.html
[2]
https://www.gnu.org/software/octave/doc/interpreter/Numeric-Data-Types.html#Numeric-Data-Types
[3]
https://www.gnu.org/software/octave/doc/interpreter/Promotion-and-Demotion-of-Data-Types.html#Promotion-and-Demotion-of-Data-Types
[4] http://hg.savannah.gnu.org/hgweb/octave/rev/69c5cce38c29
[5] https://en.wikipedia.org/wiki/Long_double#Implementations
[6] https://en.wikipedia.org/wiki/Advanced_Vector_Extensions
[7] dx.doi.org/10.1007/978-0-8176-4705-6 [Program 3.1, originally written in
C]





--
View this message in context: 
http://octave.1599824.n4.nabble.com/floating-point-precision-control-tp4678363p4678486.html
Sent from the Octave - Maintainers mailing list archive at Nabble.com.

[Prev in Thread]

Current Thread

[Next in Thread]

floating point precision control, Mike Miller, 2016/07/10
- Re: floating point precision control, John W. Eaton, 2016/07/12
  - Re: floating point precision control, Michael Godfrey, 2016/07/13
    - Re: floating point precision control, siko1056 <=
    - Re: floating point precision control, Mike Miller, 2016/07/14
    - Re: floating point precision control, John W. Eaton, 2016/07/14
    - Re: floating point precision control, siko1056, 2016/07/15
    - Re: floating point precision control, John W. Eaton, 2016/07/15

Prev by Date: Re: Yet another GUI Builder for Octave
Next by Date: Re: Weird NEWS on the OF website
Previous by thread: Re: floating point precision control
Next by thread: Re: floating point precision control
Index(es):
- Date
- Thread