[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Comments on gm2 floating-point output
From: |
Fischlin Andreas |
Subject: |
Re: Comments on gm2 floating-point output |
Date: |
Fri, 10 May 2024 16:47:19 +0000 |
Hi Nelson,
Thanks for those excellent comments and observations. Indeed a very important
topic. It is a pity in todays CS and IT environment, that solid scientific
insights, which have e.g. resulted in the quite excellent IEEE 754 standard for
floating point arithmetics got completely forgotten these days. This goes
unfortunately so far, that even the hardware is basically all wrong and FPU
registers that use internally the needed extra bits to warrant full
reproducibility are all gone. I have to admit, it is not clear to me anymore,
how to best behave in this situation. Just emulate 32-bit and 64-bit IEEE
floating point arithmetics with 128 bits? And writing all the code ourselves to
implement the IEEE 754 floating point standard (including exception handling
etc.)? Quite a challenging task, that should be taken care of by the hardware,
not us “numerical programmers”.
Concerning gm2 it is not clear to me what this all means. Perhaps someone else
could provide some answers as I do not know what the approach is that was
chosen for gm2 when it comes to implement the IEEE floating point standard. Is
gm2 mostly relying on existing C libraries, hoping they work correctly?
Andreas
ETH Zurich, Prof. em. Dr. Andreas Fischlin, IPCC Vice-Chair WGII, Systems
Ecology, CHN E 24, Universitaetstrasse 16, CH-8092 Zurich, SWITZERLAND
Tel: +41 44 633-6090 Mobile: +41 79 595-4050 Mail:
andreas.fischlin@env.ethz.ch
www.sysecol.ethz.ch/people/andreas.fischlin.html
> On 10 May 2024, at 01:06, Nelson H. F. Beebe <beebe@math.utah.edu> wrote:
>
> I am writing to comment on what I view as erroneous and/or misleading
> behavior of gm2's implementations of WriteXXX() functions for
> SHORTREAL, REAL, and LONGREAL datatypes. To keep this note shorter, I
> have placed test Modula-2 programs, and their output produced on a
> GNU/Linux x86_64 system, at this site:
>
> http://www.math.utah.edu/~beebe/modula-2/
>
> The gm2 version used at that site is from a build of the first gcc-15
> snapshot that appeared on 24-Apr-2024. Since then, gcc-14.1.0 has
> been officially released, the first such since August of last year.
>
> I do not have access to the final version of ISO/IEC 10514-3:1998, the
> international standard for Modula-2. Instead, I have an earlier June
> 1994 draft from
>
> ftp://ftp.psg.com/pub/modula-2/wg13.dis.tar
>
> However, any differences between the draft and final specifications
> are unlikely to be relevant for my points. I refer in particular to
> the draft specification of WriteFloat on sequential page 532 (numbered
> page 445, section 9.2.2.3.3) where it says:
>
>>> ...
>>> The call WriteFloat(cid, real, sigFigs, width) shall write the value of
>>> real to the output stream identified by cid in floating-point text
>>> form, ... if the value of sigFigs is greater than zero, that number of
>>> significant digits shall be included, otherwise an
>>> implementation-defined number of significant digits shall be included.
>>> ...
>
> Two pages later, the specification of WriteReal() refers back to that
> for WriteFloat() with the text
>
>>> ...
>>> ... the call WriteReal(cid, real, width) shall behave as the call
>>> WriteFixed(cid, real, place, width), with a value of place chosen
>>> to fill exactly the remaining field. Otherwise the call shall behave
>>> as the call WriteFloat(cid, real, sigFigs, width), with a value of
>>> sigFigs of at least one, ...
>>> ...
>
> Here are the issues that I noted:
>
> -----------------------------------------------------------------------
>
> (1) The second nonblank output line for the three floating-point
> formats looks like these:
>
> eps64 = 1.0842021724855044E-19 = = 2**(-63)
> eps64 = 2.2204460492503131E-16 = = 2**(-52)
> eps32 = 1.19209290E-7 = 0.000000 = = 2**(-23)
>
> In each case, the empty value between the two final = signs comes from
> calls like this one in the REAL case:
>
> WriteReal (StdChans.StdOutChan(), eps, 0)
>
> It seems MUCH more useful to instead display a suitable default number
> of digits when sigFigs = 0, such that round trip conversion from
> binary to decimal to binary recovers the original binary value inside
> the computer. This topic was studied by David Matula and Bennett
> Goldberg in the 1960s, in papers that you can find here:
>
> https://www.math.utah.edu/pub/tex/bib/cacm1960.bib
> https://www.math.utah.edu/pub/tex/bib/cacm1960.html
>
> [they look almost identical on the screen, except that the first is
> for BibTeX, and the second is for browsers, and has live hyperlinks].
>
> ACM has made journal archives from their first 50 years available for
> free download, so anyone should be able to get the cited papers via
> the DOI values in their BibTeX entries.
>
> Their findings are different from what most people then thought, and
> clearly continue to think. In particular, these two hoc functions
> return the required digit counts:
>
> ### matula(nbits): return number of decimal digits needed to ensure
> ### correct round-trip conversion between binary and decimal
> func matula(nbits) { return ceil(nbits/log2(10) + 1) }
>
> ### goldberg(ndecdig): return number of bits needed to ensure correct
> ### round-trip conversion between binary and ndecdig decimal digits
> func goldberg(ndecdig) { return ceil(ndecdig/log10(2) + 1) }
>
> For the four IEEE 754 binary formats, here are the Matula counts:
>
> --------------------------------
> format decimal digits
> --------------------------------
> binary32 9
> binary64 17
> binary80 21
> binary128 36
> --------------------------------
>
> On x86_64 systems then, gm2 should use replacement sigFigs values of
> 9, 17, and 21 for the SHORTREAL, REAL, and LONGREAL types. Those
> values can also be caps on the counts produced when larger sigFigs
> values are requested by user code.
>
> gm2's Write*Real() functions produce 17, 37, and 45 digits for those
> formats, after which, trailing zeros are supplied.
>
> I'm working on getting access to a working gm2 for systems that use
> the IEEE 754 binary128 format for LONGREAL, but cannot yet report
> results for those platforms.
>
> ------------------------------------------------------------------------
>
> (2) The output line for eps32 above shows a result of 0.000000 from
> WriteShortReal (eps, 0); that would be more sensibly displayed in
> exponential form to show that it is nonzero.
>
> ------------------------------------------------------------------------
>
> (3) When sigFigs = 0, all three test programs produce rubbish output
> from WriteFloat():
>
> sigFigs = 0 eps32 =
> 11920928955078125000000000000000000000000000000000000000000000000000000000000000000000000000000000000
> sigFigs = 0 eps64 =
> 22204460492503130808472633361816406250000000000000000000000000000000000000000000000000000000000000000
> sigFigs = 0 eps80 =
> 10842021724855044340074528008699417114257812500000000000000000000000000000000000000000000000000000000
>
> Once again, the Matula defaults would be far more sensible here.
>
> ------------------------------------------------------------------------
>
> It is not only gm2 that has such issues. Below is a snippet (in LaTeX)
> from a book that I'm writing about a new math library that I'm
> developing. It describes how widespread is the ignorance of the
> Matula and Goldberg formulas published more then 55 years ago in
> possibly the most widely read journal in the field of computer
> science:
>
>>> ...
>>> Regrettably, output conversions without precision specifications in
>>> both \X{Fortran} and \X{C} default to 6 fraction digits for
>>> floating-point output, whereas Matula mandates 9 for the 32-bit IEEE
>>> format, 10 for older 36-bit formats with 27-bit fractions, and 17 for
>>> the 64-bit IEEE format. Such defaults are likely to be relied on by
>>> many users, who do not understand that subsequent input of those
>>> truncated values will \emph{not} reproduce the numbers that were in
>>> the computer, because they lose 3 or more trailing bits.
>>>
>>> More recent programming languages are little better. For the IEEE 754
>>> 32-bit and 64-bit binary formats, without specifying an output
>>> precision, \X{COBOL} produces 8 and 16 decimal digits, \X{C\#} outputs
>>> 7 and 15, and \X{Go}, \X{Java}, and \X{Rust} output 8 and 17.
>>> Some languages only provide the 64-bit format: the \X{Awk}, \X{PHP},
>>> and \X{Python} scripting languages all output 6 digits,
>>> \X{Algol 68 Genie}, \X{LibreOffice}, and \X{Perl} output 15 digits,
>>> \X{Matlab} and \X{Octave} output 5 and 16 in \code{short} and \code{long}
>>> formats,
>>> \X{GNU} \X{Pascal} outputs 16 digits, and
>>> \X{JavaScript} and \X{MetaPost} produce the Matula-mandated 17 digits.
>>> Given the age of some programming languages,%
>>> \footnote{\X{Fortran} appeared first in 1956, \X{COBOL} in 1959,
>>> \X{Algol 68-R} and \X{Pascal} in 1970, \X{C} in 1972,
>>> and \X{C++} in 1979.}
>>> and given that several of them are defined by international standards
>>> written by presumed experts in the languages and their practical
>>> implementation, we have considerable evidence that correct round-trip
>>> base conversion has been generally ignored, and the chosen defaults
>>> are simply \emph{wrong}. This author's \pgm{hoc} implementations,
>>> however, all follow the Goldberg%
>>> \index{Goldberg, Bennett}
>>> and Matula%
>>> \index{Matula, David [American mathematician and computer scientist]}%
>>> \index{United States}
>>> formulas.
>>> ...
>
> -------------------------------------------------------------------------------
> - Nelson H. F. Beebe Tel: +1 801 581 5254
> -
> - University of Utah
> -
> - Department of Mathematics, 110 LCB Internet e-mail: beebe@math.utah.edu
> -
> - 155 S 1400 E RM 233 beebe@acm.org beebe@computer.org
> -
> - Salt Lake City, UT 84112-0090, USA URL: http://www.math.utah.edu/~beebe/
> -
> -------------------------------------------------------------------------------
>