
From: Nelson H. F. Beebe
Subject: Re: [bug-gawk] pi in gawk
Date: Fri, 17 Feb 2012 08:39:48 -0700 (MST)

>> Why 35?
>> The decimal(binary) precision for IEEE 754 128-bit is 34(113).

Your statement is correct; I chose 35 just as an example to get a
reasonably long value of $\pi$ to answer the posted question about
using that value in gawk.

>> ...
>> With the default rounding to the nearest:
>> 
>> $ ./gawk -M -vPREC=113 'BEGIN { printf("%.33f\n", atan2(0, -1)) }'
>> 3.141592653589793238462643383279503
>> 
>> Rounding towards zero:
>> 
>> $ ./gawk -M -vPREC=113 -vRNDMODE="RNDZ" 'BEGIN { printf("%.33f\n", atan2(0, -1)) }'
>> 3.141592653589793238462643383279502
>> 
>> If I increase the precision to 115 and print 1 more digit:
>> 
>> $ ./gawk -M -vPREC=115 'BEGIN { printf("%.34f\n", atan2(0, -1)) }'
>> 3.1415926535897932384626433832795029
>> 
>> However, rounding towards zero gives us:
>> 
>> $ ./gawk -M -vPREC=115 -vRNDMODE="RNDZ" 'BEGIN { printf("%.34f\n", atan2(0, -1)) }'
>> 3.1415926535897932384626433832795027
>> 
>> I was expecting 8 or 9 in the last digit, not 7.
>> Is there an explanation?
>> ...

Yes, there is an explanation: $\pi$ is a transcendental constant, and
thus has an infinite expansion in digits of almost any base, except a
base of $\pi$ itself, and powers thereof.  Thus, to represent it in
human-readable decimal form, computation is required, and that
involves inexact operations, each of which must be rounded to working
precision according to some rounding rule.  The cumulative effect of
several such operations almost always makes the last reported digits
uncertain.  If precision is variable, as it is with the MPFR library,
you can always work with a few more digits and mask those errors.  If
you use n more bits than you finally want, then the chance of a
worst-case rounding error that makes the last reported digit off by
one goes like 2**(-n).  If n is large enough, that error is unlikely
to be seen in practice.
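
In gawk -M terms, that advice is simply: set PREC well beyond what
the printed digits require.  A sketch (PREC=200 is an arbitrary,
generous choice; any comfortable excess will do):

        $ ./gawk -M -vPREC=200 'BEGIN { printf("%.34f\n", atan2(0, -1)) }'

With about 87 bits to spare over the 113 that 34 digits require, the
result should print as pi correctly rounded to 34 places,
3.1415926535897932384626433832795029, whatever the rounding mode.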

Here is an example from the decimal arithmetic literature that I use
in my book: compute the cost of a $0.70 item with a 5% tax rate.  

In decimal, the answer is easy, and exact: 0.70 * 1.05 = 0.735.  Under
taxman's rounding rules, halfway cases are rounded in favor of the tax
authorities, so you must pay $0.74, because we no longer have ha'penny
coins.
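
gawk has no decimal floating-point mode, but the exact decimal
computation is easy to imitate with scaled integers, as in this
sketch (half-up rounding stands in for taxman's rounding here, with
which it agrees on positive halfway cases):

        $ gawk 'BEGIN { m = 70 * 105    # exact integer: 7350 hundredths of a cent
                        printf("%d cents\n", int(m/100 + 0.5)) }'   # 73.5 -> 74
        74 cents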

In binary arithmetic, the product represented in 128-bit format
hexadecimal is +0x1.7851eb851eb851eb851eb851eb86p-1, and converted to
decimal, that produces a number like 0.73499999....  The computed
result is close to, but less than, the exact result, which is a
halfway case.  Under most rounding rules, including taxman's, that
becomes 0.73, cheating the government of $0.01.  
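
Ordinary gawk, which computes in 64-bit binary, shows the same effect
at lower precision; on a typical IEEE 754 double build:

        $ gawk 'BEGIN { printf("%.17g\n", 0.70 * 1.05) }'
        0.73499999999999999

The stored product falls just below the exact 0.735, so the rounding
of the printed cents starts from the wrong side of the halfway point.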

That may seem a trivial amount, but for companies with large numbers
of small transactions, like telephone companies and grocery stores,
the errors can amount to millions of dollars a year.  

That is why many governments require such arithmetic to be done in
decimal, with specified rounding requirements, and the IEEE 754-2008
revision accommodates those mandates, providing decimal as well as
binary arithmetic, and supplying additional rounding modes.

IBM zSeries mainframes, and IBM POWER6 and POWER7 chips, all
supply that decimal arithmetic, and I expect other floating-point chip
vendors will do so in the near future.

I, and some other members of the floating-point community, feel
strongly that decimal arithmetic is a good thing, because it
eliminates the base-conversion errors that continually plague and
mystify users who input a short number, and get a long one back with a
string ending in 99999...  Computers should work for people, not the
reverse, so switching floating-point arithmetic from binary to decimal
in scripting languages, spreadsheets, and even mainstream programming
languages, will be helpful.  Of course, the back-end libraries need to
be redesigned, but that has been done, and is what my book is about.

The number of digits required for correct round-trip rounding between
bases is not widely understood, either by users, or by programming
language and run-time library designers.  Here is documentation of
some hoc functions that provide an explanation:

hoc> ?matula
matula(nbits):

        matula(nbits) is the number of decimal digits needed to ensure
        correct round-trip conversion between binary and decimal of
        floating-point data with nbits bits in the significand.

        matula(P) is that number for this implementation of hoc.

        In general, for decimal-to-binary conversion of d decimal digits
        to p bits, we need to ensure that

                10**d < 2**(p - 1),

        and for binary-to-decimal conversion, we require that

                2**p < 10**(d - 1).

        See David W. Matula, ``In-and-out conversions'', Comm. ACM
        11(1) 47--50, January 1968.  CODEN CACMA2.  ISSN 0001-0782.

        See also help_goldberg(), help_goldbergb(), and help_matulab().

hoc> ?goldberg
goldberg(ndecdig):

        goldberg(ndecdig) is the number of bits needed to ensure
        correct round-trip conversion between binary and decimal of
        floating-point data with ndecdig decimal digits in the
        significand.

        In general, for decimal-to-binary conversion of d decimal digits
        to p bits, we need to ensure that

                10**d < 2**(p - 1),

        and for binary-to-decimal conversion, we require that

                2**p < 10**(d - 1).

        See I. Bennett Goldberg, ``27 Bits Are Not Enough For 8-Digit
        Accuracy'', Comm. ACM, 10(2) 105--106, February 1967.  CODEN CACMA2.
        ISSN 0001-0782.

        See also help_goldbergb(), help_matula(), and help_matulab().

Here are some examples:

hoc> matula(24)
 9
hoc> matula(53)
 17
hoc> matula(64)
 21
hoc> matula(113)
 36

Thus, even though the 32-bit IEEE 754 format supplies a 24-bit
significand, capable of exactly representing numbers up to 16_777_216,
you actually need 9, not 8, decimal digits to be able to recover the
exact binary representation from decimal output.
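
The double-precision row of that table is easy to check with ordinary
gawk, whose arithmetic is 64-bit binary: matula(53) = 17, and indeed
16 significant digits cannot separate two distinct doubles that 17
can (output as seen on a typical IEEE 754 build):

        $ gawk 'BEGIN { x = 0.1 + 0.2; y = 0.3
                        printf("%.16g %.16g\n%.17g %.17g\n", x, y, x, y) }'
        0.3 0.3
        0.30000000000000004 0.29999999999999999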

In the reverse direction, we have

hoc> goldberg(7)
 25
hoc> goldberg(16)
 55
hoc> goldberg(34)
 114
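
If hoc is not to hand, the two functions are a few lines of gawk,
derived directly from the inequalities quoted above (a sketch; the
file name and function names merely mirror the hoc ones):

        $ cat roundtrip.awk
        # From 2**p < 10**(d-1): d > p*log10(2) + 1, and since p*log10(2)
        # is never an integer, the smallest such d is int(p*log10(2)) + 2.
        function matula(nbits)     { return int(nbits * log(2) / log(10)) + 2 }
        # From 10**d < 2**(p-1): the smallest p is int(d*log2(10)) + 2.
        function goldberg(ndecdig) { return int(ndecdig * log(10) / log(2)) + 2 }
        BEGIN {
            print matula(24), matula(53), matula(64), matula(113)
            print goldberg(7), goldberg(16), goldberg(34)
        }
        $ gawk -f roundtrip.awk
        9 17 21 36
        25 55 114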

The IEEE 32-bit decimal format, which supports 7 digits, needs 25 bits
in binary, and cannot be correctly round-tripped with the 32-bit
binary format, which supplies only 24 bits.  The same is true of the
64-bit and 128-bit formats.

In summary, when you have the luxury of variable precision, always use
more than you finally need: you'll then be more likely to get the
right answer.

As a final note, I can report that Maple uses decimal floating-point
arithmetic (a good thing, in my view), whereas all other algebra
systems (Mathematica, Maxima, MuPAD, Reduce, gp/pari, axiom,
Macaulay2, sage, ...) use binary floating-point arithmetic.


-------------------------------------------------------------------------------
- Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
- University of Utah                    FAX: +1 801 581 4148                  -
- Department of Mathematics, 110 LCB    Internet e-mail: address@hidden  -
- 155 S 1400 E RM 233                       address@hidden  address@hidden -
- Salt Lake City, UT 84112-0090, USA    URL: http://www.math.utah.edu/~beebe/ -
-------------------------------------------------------------------------------


