bug-coreutils
From: Eric Blake
Subject: Re: [OT] Is od broken?
Date: Thu, 12 Jun 2008 21:57:42 +0000 (UTC)
User-agent: Loom/3.14 (http://gmane.org/)

Jim Meyering <jim <at> meyering.net> writes:

> > Unrelated to this patch: Should we use %g instead of %e for floating point?
> > Seeing 1.000000000e+00 is somewhat distracting in isolation; on the other
> > hand, the variable-width nature of %g might not look as nice as the
> > fixed-width precision of %e.
> 
> I have a slight preference for the status quo.  And not having
> looked at how other vendor od programs work, I'm hesitant to change this.
> 

Agreed.  Here's another floating point quandary:

od --help states that it will "Write an unambiguous representation".  But this 
is not always true with floating point.  There's the obvious case of x86 long 
double: since we are converting 12 bytes in memory into 10 bytes (actually 79 
bits) of significant information in a register, we are discarding two bytes of 
input data for every string we output.  There's also the case of NaN (in IEEE 
double-precision, almost 2^53 distinct values display as "nan", unless your 
libc's printf is nice enough to give "nan(n-char-sequence)" output).

Then there are less obvious cases where rounding bites us.  Without my patch 
series, the code blindly claims that the field width of a 4-byte IEEE float is 
FLT_DIG+8 (14 bytes) without the leading space, even though the format will 
never print more than 13 non-space characters (sign, first digit, dot, 6 
FLT_DIG precision digits, 'e', sign, then a 2-digit exponent [FLT_MAX_10_EXP is 
38, so we'll never see 3-digit exponents on -tf4]); so we are over-padding.

But POSIX is clear that FLT_DIG is rounded down (unless your radix is a power 
of 10), to cover the decimal-binary-decimal round trip, whereas DECIMAL_DIG is 
rounded up, to cover the binary-decimal-binary round trip.  In the case of od, 
we want the algorithm of DECIMAL_DIG if we are to guarantee uniqueness.  And 
notice that simply doing FLT_DIG+1 is STILL insufficient, as shown by this test 
case of the four adjacent floats 123456776.0f to 123456800.0f:

$ src/od -tfFx1x4 blah
0000000 -1.2345678e+08 -1.2345678e+08 -1.2345679e+08 -1.2345680e+08
         a1  79  eb  cc  a2  79  eb  cc  a3 79  eb cc  a4 79  eb cc
              cceb79a1       cceb79a2       cceb79a3       cceb79a4
0000020

In order to safely go binary-decimal-binary, the unique decimal representation 
of 0xcceb79a2 as an IEEE single-precision float MUST be -1.23456784e+08, or 
FLT_DIG+2 bytes of precision.

So, which is better, patching the code to attempt to unambiguously print all 
floats, or updating the documentation to make it clear that memory 
representation padding, floating point rounding, and NaNs cause inaccuracies?

And looking at that output, I need to redo my pad width rounding algorithm.  It 
would be nicer to consistently pad that second row as four sets of 2-2-2-1 
rather than the somewhat ugly 2-2-2-2/2-2-2-2/2-1-2-1/2-1-2-1.  I guess I'm 
back to the drawing board for an efficient way to cleanly distribute a fraction 
of a padding byte without suffering from integer overflow during the 
computation.

> > Should we squash this on top of the previous patch, or keep it as a separate
> > commit?
> 
> I think it's fine (and probably better, but haven't reviewed either
> carefully yet) to keep them separate.

OK, I'll keep them as separate commits.  Bo inspired me, and I finally figured 
out how to use repo.or.cz.  Now you can do:
git fetch git://repo.or.cz/coreutils/ericb.git refs/heads/od

to see my patch series.

-- 
Eric Blake
