[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Comments on gm2 floating-point output

From: Michael Riedl
Subject: Re: Comments on gm2 floating-point output
Date: Fri, 10 May 2024 17:05:51 +0200
User-agent: Mozilla/5.0 (X11; Linux i686; rv:102.0) Gecko/20100101 Thunderbird/102.11.0


very good comment - and highly supported when we ensure that a later read of such a number exactly reproduces the input. That then would really make sense as a default "implementation-defined number" for the output routines.

By the way that is how it is handled by GNU and AMD Fortran compilers (here the output form GNU Fortran) - test routine attached. Intel Fortran is not that smart, but the re-read values look OK (output attached).

 Machine epsilon for different real kinds

                           1 2         3
 qp          16   1.92592994438723585305597794258492732E-0034
 ep          10   1.08420217248550443401E-0019
 dp           8   2.2204460492503131E-016
 sp           4   1.19209290E-07

 Output after write/read

 qp          16   1.92592994438723585305597794258492732E-0034
 ep          10   1.08420217248550443401E-0019
 dp           8   2.2204460492503131E-016
 sp           4   1.19209290E-07

On the other hand I have also my fiddles with the real output routines as it is hard to get something written like the following table of eigenvalues and vectors in a systematic and reliable way which is not immediately spoiled if one value in the headline is e.g. greater/equal 10.

          1         2         3         4

       -2.000000 -0.000000  0.000000  2.000000

   1    0.500000 -0.707104  0.002132  0.500000
   2   -0.500000 -0.002132 -0.707104  0.500000
   3    0.500000  0.707104 -0.002132  0.500000
   4   -0.500000  0.002132  0.707104  0.500000

But that is more with the definition of the output routines which where not made with scientific or engineering applications in mind :-(



Am 10.05.24 um 01:04 schrieb Nelson H. F. Beebe:
I am writing to comment on what I view as erroneous and/or misleading
behavior of gm2's implementations of WriteXXX() functions for
SHORTREAL, REAL, and LONGREAL datatypes.  To keep this note shorter, I
have placed test Modula-2 programs, and their output produced on a
GNU/Linux x86_64 system, at this site:

The gm2 version used at that site is from a build of the first gcc-15
snapshot that appeared on 24-Apr-2024.  Since then, gcc-14.1.0 has
been officially released, the first such since August of last year.

I do not have access to the final version of ISO/IEC 10514-3:1998, the
international standard for Modula-2.  Instead, I have an earlier June
1994 draft from

However, any differences between the draft and final specifications
are unlikely to be relevant for my points.  I refer in particular to
the draft specification of WriteFloat on sequential page 532 (numbered
page 445, section where it says:

The call WriteFloat(cid, real, sigFigs, width) shall write the value of
real to the output stream identified by cid in floating-point text
form, ... if the value of sigFigs is greater than zero, that number of
significant digits shall be included, otherwise an
implementation-defined number of significant digits shall be included.
Two pages later, the specification of WriteReal() refers back to that
for WriteFloat() with the text

  ... the call WriteReal(cid, real, width) shall behave as the call
WriteFixed(cid, real, place, width), with a value of place chosen
to fill exactly the remaining field. Otherwise the call shall behave
as the call WriteFloat(cid, real, sigFigs, width), with a value of
sigFigs of at least one, ...
Here are the issues that I noted:


(1) The second nonblank output line for the three floating-point
formats looks like these:

        eps64 = 1.0842021724855044E-19 =  = 2**(-63)
        eps64 = 2.2204460492503131E-16 =  = 2**(-52)
        eps32 = 1.19209290E-7 = 0.000000 =  = 2**(-23)

In each case, the empty value between the two final = signs comes from
calls like this one in the REAL case:

        WriteReal (StdChans.StdOutChan(), eps, 0)

It seems MUCH more useful to instead display a suitable default number
of digits when sigFigs = 0, such that round trip conversion from
binary to decimal to binary recovers the original binary value inside
the computer.  This topic was studied by David Matula and Bennett
Goldberg in the 1960s, in papers that you can find here:

[they look almost identical on the screen, except that the first is
for BibTeX, and the second is for browsers, and has live hyperlinks].

ACM has made journal archives from their first 50 years available for
free download, so anyone should be able to get the cited papers via
the DOI values in their BibTeX entries.

Their findings are different from what most people then thought, and
clearly continue to think.  In particular, these two hoc functions
return the required digit counts:

        ### matula(nbits): return number of decimal digits needed to ensure
        ### correct round-trip conversion between binary and decimal
        func matula(nbits) { return ceil(nbits/log2(10) + 1) }

        ### goldberg(ndecdig): return number of bits needed to ensure correct
        ### round-trip conversion between binary and ndecdig decimal digits
        func goldberg(ndecdig) { return ceil(ndecdig/log10(2) + 1) }

For the four IEEE 754 binary formats, here are the Matula counts:

        format          decimal digits
        binary32                 9
        binary64                17
        binary80                21
        binary128               36

On x86_64 systems then, gm2 should use replacement sigFigs values of
9, 17, and 21 for the SHORTREAL, REAL, and LONGREAL types.  Those
values can also be caps on the counts produced when larger sigFigs
values are requested by user code.

gm2's Write*Real() functions produce 17, 37, and 45 digits for those
formats, after which, trailing zeros are supplied.

I'm working on getting access to a working gm2 for systems that use
the IEEE 754 binary128 format for LONGREAL, but cannot yet report
results for those platforms.


(2) The output line for eps32 above shows a result of 0.000000 from
WriteShortReal (eps, 0); that would be more sensibly displayed in
exponential form to show that it is nonzero.


(3) When sigFigs = 0, all three test programs produce rubbish output
from WriteFloat():

        sigFigs =  0   eps32 = 
        sigFigs =  0   eps64 = 
        sigFigs =  0   eps80 = 

Once again, the Matula defaults would be far more sensible here.


It is not only gm2 that has such issues. Below is a snippet (in LaTeX)
from a book that I'm writing about a new math library that I'm
developing.  It describes how widespread is the ignorance of the
Matula and Goldberg formulas published more then 55 years ago in
possibly the most widely read journal in the field of computer

Regrettably, output conversions without precision specifications in
both \X{Fortran} and \X{C} default to 6 fraction digits for
floating-point output, whereas Matula mandates 9 for the 32-bit IEEE
format, 10 for older 36-bit formats with 27-bit fractions, and 17 for
the 64-bit IEEE format.  Such defaults are likely to be relied on by
many users, who do not understand that subsequent input of those
truncated values will \emph{not} reproduce the numbers that were in
the computer, because they lose 3 or more trailing bits.

More recent programming languages are little better.  For the IEEE 754
32-bit and 64-bit binary formats, without specifying an output
precision, \X{COBOL} produces 8 and 16 decimal digits, \X{C\#} outputs
7 and 15, and \X{Go}, \X{Java}, and \X{Rust} output 8 and 17.
Some languages only provide the 64-bit format: the \X{Awk}, \X{PHP},
and \X{Python} scripting languages all output 6 digits,
\X{Algol 68 Genie}, \X{LibreOffice}, and \X{Perl} output 15 digits,
\X{Matlab} and \X{Octave} output 5 and 16 in \code{short} and \code{long} 
\X{GNU} \X{Pascal} outputs 16 digits, and
\X{JavaScript} and \X{MetaPost} produce the Matula-mandated 17 digits.
Given the age of some programming languages,%
\footnote{\X{Fortran} appeared first in 1956, \X{COBOL} in 1959,
           \X{Algol 68-R} and \X{Pascal} in 1970, \X{C} in 1972,
           and \X{C++} in 1979.}
and given that several of them are defined by international standards
written by presumed experts in the languages and their practical
implementation, we have considerable evidence that correct round-trip
base conversion has been generally ignored, and the chosen defaults
are simply \emph{wrong}.  This author's \pgm{hoc} implementations,
however, all follow the Goldberg%
\index{Goldberg, Bennett}
and Matula%
\index{Matula, David [American mathematician and computer scientist]}%
\index{United States}
- Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
- University of Utah                                                          -
- Department of Mathematics, 110 LCB    Internet e-mail:  -
- 155 S 1400 E RM 233              -
- Salt Lake City, UT 84112-0090, USA    URL: -

Attachment: TstRealOutput.f90
Description: Text Data

Attachment: TstRealOutput.amd_flang
Description: Text document

Description: Text document

reply via email to

[Prev in Thread] Current Thread [Next in Thread]