octave-maintainers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: floating point precision control


From: Mike Miller
Subject: Re: floating point precision control
Date: Thu, 14 Jul 2016 14:11:24 -0700
User-agent: Mutt/1.6.0 (2016-04-01)

On Thu, Jul 14, 2016 at 07:03:04 -0700, siko1056 wrote:
> Mike Miller wrote
> > [...] should we just enforce double extended (long double) precision on
> > all x86 systems all the time [...]
> 
> I agree with Michael Godfrey and jwe. Looking at Matlab, a "double" is
> clearly specified as IEEE-754 binary64 and "single" as binary32 [1]. Octave
> does not explicitly refer to IEEE-754 (maybe we should do so!), but speaks
> of "double precision" and not "extended precision" (80 bits). Here we would
> break the documentation entirely. A precise documentation and clear examples
> when the (floating-point, integer, or mixed)-arithmetic will fail are to
> favor. And looking at all bugs, only Windows seems affected.

Thank you all for your comments, replying to Kai for simplicity.

Thanks for the pointer to the very relevant LKML thread, jwe.

Yes, the FP control word is specific to x86 processors, and yes the
"double extended" or "long double" type (let's call it "binary80") is an
optional extension. Agree with the sentiment here.

All bugs are not related to Windows though. As I mentioned, the first
time I was aware of this was on 32-bit GNU/Linux.

Specifically, with Octave on x86-32:

  octave:1> 1e-5 == 10^-5
  ans = 1
  octave:2> javaObject ("java.lang.String");
  octave:3> 1e-5 == 10^-5
  ans = 0

The first result where they are equal is with the machine FP control
word set to use binary80 precision. The second result is with the FP
control word set to use binary64 precision. Again, on GNU/Linux,
binary80 is the default internal precision for any program compiled with
GCC (not using SSE or AVX registers), and x86-32 Java changes it
internally to be binary64 regardless of what the system or compiler or
user intended.

> Regarding bug #40607 (mxe-octave: mixed uint64/double arithmetic incorrect):
> Like Rik said, the interpreter is to blame for choosing the wrong operator!
> We clearly documented in [3], that any operation with integers will return
> an integer result. As a user, I expect the results from Example 1 and 2:
> 
> Example 1:
> >> dMax = uint64 (flintmax ('double'));
> >> a = dMax + 1
> >> b = uint64 (double (dMax) + 1)
> 
> a = 9007199254740993
> b = 9007199254740992
> 
> Example 2:
> >> a = 2^63
> >> b = 2^64 - 2^63
> >> c = uint64 (2^64) - 2^63
> >> d = intmax ('uint64') - 2^63
> 
> a = 9223372036854775808
> b = 9223372036854775808
> c = 9223372036854775807
> d = 9223372036854775807
> 
> I think the problem is and was, that Octave uses "long double" for mixed
> computations, as introduced 2008 by Jaroslav Hajek [4]. "long double" is not
> really standardized [5] and requires checking for 10 or 16 bytes to produce
> reliable results. Wasn't it more appropriate to cast both parameters to
> standardized (u)int64_t, ensuring our slightly different integer arithmetic
> by C++?

Thanks for pointing to the changeset that introduced those routines.
Agree this could be viewed as a workaround and maybe there is a better
way to handle these 64-bit integer calculations.

> Another burden I see for the compiler is, that Octave might avoid the
> compiler to use upcoming instructions-sets like AVX [6], that only regard
> packing binary64, and do all our library dependencies (LAPACK, SuiteSparse,
> ...) support our desire for extended-precision, too?
> 
> Last, but not least like jwe explained, changing the FP control word works
> only for system library calls at runtime, that can use extended registers.
> It does not magically extend all doubles to long doubles (which must happen
> at compile time) in our C++ code, making Octave an higher precision tool.
> The FP control word has an effect on intermediate computations within the
> registers, this seems not to work in Windows.

I'm not sure why you say it seems not to work on Windows, it actually
does seem to have an effect, at least for some operations, on x86-32
Windows systems, as well as on x86-32 GNU/Linux systems (see #48418).

On x86-64 Windows systems, you are correct, it does not seem to have an
effect on certain tests that are failing there.

> To sum up: Please stay with IEEE-754 binary64 and binary32. Fix the Windows
> issues regarding the FP control word with some #ifdef __WIN32__ and a
> comment to drop all the code, if the floating-point issues are fixed in the
> future and leave the FP control word as it is, but allow the user to set it
> manually from command line.

Here you are referring to the 64-bit integer computations in
oct-inttypes (#40607), correct?

I agree with that, I'm more concerned about the following:

When Octave starts, should we inherit whatever the compiler / runtime /
user have chosen as the default FP control state? Probably, and I should
probably backout a5a99a830c8c.

Should we stop trying to recover from Java changing the value of the
control register in the middle of an Octave session, thereby changing
the results of calculations and test results as shown above? Maybe, or
maybe do it in a smarter way than it is done now.

Should we increase tolerances on tests that seem to have been written to
ensure that certain tolerances were actually met, for example the tests
in assert.m under "variable tolerance":

  ## this passes with default binary80 prec, fails with binary64 prec
  assert ((10^-30) * (10^-30), 10^(-30 * 2), eps (10^-60))

Or should we leave tolerances alone and encourage users to build Octave
with SSE instructions on x86-32 systems (where it is not the default)?
And should we try SSE for mxe-octave cross-building as well?

> Another test case I found to detect intermediate double rounding, using 80
> bit registers (for me Octave uses pure binary64 for this one). If no
> intermediate rounding happens, both printed values match, otherwise
> (3.6893488147419103232e+19) is returned:
> 
> Example 3 [7]:
> a = 1848874847.0;
> b = 19954562207.0;
> fprintf ("%20.19e\n3.6893488147419111424e+19\n", a * b)

The values are different in Octave built for x86-32 Debian for me
(without the -mfpmath=sse option), either with or without the
octave_set_default_fpucw() function stubbed out completely. The only way
I get matching values is if I explicitly set the FPU precision control
to 2 (binary64 rounding).

Thanks for the discussion,

-- 
mike



reply via email to

[Prev in Thread] Current Thread [Next in Thread]