
Re: [lmi] Converting numbers in mortality tables to and from text


From: Greg Chicares
Subject: Re: [lmi] Converting numbers in mortality tables to and from text
Date: Sat, 19 Mar 2016 17:26:46 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Icedove/38.5.0

On 2016-03-18 18:09, Vadim Zeitlin wrote:
> On Fri, 18 Mar 2016 01:37:27 +0000 Greg Chicares <address@hidden> wrote:
[...]
> GC> IOW, we treat table data as fixed-point decimal values, because
> GC> that's what they were in the original print publications.
> 
>  This makes me wonder whether I should actually be using "double"s for
> their internal representation at all.

If we were designing this from scratch, integers would clearly be better,
because the actuaries who prepare these tables really do conceive of the
data as fixed-point decimal numbers.

> I originally did it without thinking much
> about it just because this is how they're represented in the binary SOA
> files, but now that I do think about it, it seems that representing them as
> an integer would work better. If I used uint64_t (the same size as double)
> for this, we would have at least 19 significant digits, and even accounting
> for 2[*] of them before the decimal point, this would still give us more
> than enough precision. I'm not sure if it's worth changing it now that the
> code using doubles has already been written, but I think it is, because it
> would ensure that comparing two tables works as expected, whereas now
> I have to compare their textual forms because of this 1 ULP mismatch.
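
To make that concrete, here is a minimal sketch of the scaled-integer
alternative (not lmi code; 'parse_scaled' and the per-table 'decimals'
count are hypothetical names, and input validation is omitted):

  // Sketch only: store each value as an exact count of units of
  // 10^-decimals, e.g. "0.12345678" with decimals == 8 becomes the
  // integer 12345678. Names are hypothetical, not lmi's.
  #include <cstdint>
  #include <cstdio>
  #include <string>

  std::uint64_t parse_scaled(std::string const& text, int decimals)
  {
      std::uint64_t value = 0;
      int fractional_digits = -1;    // -1 until the decimal point is seen
      for(char c : text)
          {
          if('.' == c) {fractional_digits = 0; continue;}
          value = 10 * value + static_cast<std::uint64_t>(c - '0');
          if(0 <= fractional_digits) {++fractional_digits;}
          }
      for(; fractional_digits < decimals; ++fractional_digits)
          {
          value *= 10;               // supply implicit trailing zeros
          }
      return value;
  }

  int main()
  {
      // Equal printed values yield identical integers, so two tables can
      // be compared directly, with no 1-ulp caveat and no textual detour.
      bool const exact = 12345678u == parse_scaled("0.12345678", 8);
      std::printf("%s\n", exact ? "exact" : "inexact");
  }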

We've been talking about a round-trip guarantee that ensures that text
values (decimal fixed point) and binary values (binary floating point)
correspond. But the correspondence is not an equivalence--e.g.:
  text   value T: 0.12345678
  binary value B: 0.123456779999999474973587234...
We require only that the round trip be exact:
  text(binary(T)) == T
which works because in practice the SOA hasn't tried to store any table
with more than nine decimal digits, a precision far coarser than one ulp
of a double in this range, so rounding B back to that many digits always
recovers T.
(Of course, Bruce Dawson's blog that you referenced earlier would point
out that using single-precision floating point here would cause real
practical problems.)
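
Here's a minimal sketch of that round trip (not lmi code; the eight-digit
precision and the names are merely illustrative):

  // Parse the fixed-point decimal text to the nearest double, then format
  // that double back with the same number of decimal places. Sketch only.
  #include <cstdio>
  #include <cstdlib>
  #include <string>

  int main()
  {
      std::string const T = "0.12345678";                    // decimal text
      double const      B = std::strtod(T.c_str(), nullptr); // nearest double

      std::printf("B = %.27f\n", B);  // B is close to, but not exactly, T

      // Formatting B back to eight decimal places recovers T exactly,
      // because eight digits is far coarser than one ulp of a double here.
      char buffer[32];
      std::snprintf(buffer, sizeof buffer, "%.8f", B);
      std::printf("%s\n", T == buffer ? "round trip exact" : "round trip broken");
  }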

Your alternative would give an identity guarantee, which is stronger.
We don't need that extra strength here, though it would be nice to have
anyway, unless there's an argument against it. And I think there is
such an argument:

 - The tables we have from SOA use floating point. As long as we read
   them (correctly) in their published form, it is trivially guaranteed
   that we have the actual values intended by the SOA, and that guarantee
   is important.

 - We would lose that guarantee if we gave up the ability to read the
   SOA's tables, and instead translated them to a new integer format.
   We might make a mistake in translation.

 - Supporting a floating-point "legacy" format alongside a "modern"
   integer format would make the code significantly more complex and
   likelier to contain defects. {Simple, dumb, more robust} beats
   {complex, elegant, less robust}.

Given enough time, we could test this well enough to mitigate any concern
about defects introduced by the change in internal representation. But
it's just not worthwhile.

> [*] There can't be any more, because that would break the textual format
>     by making the columns run together. And in practice there only ever
>     seems to be 1 anyhow.

That's a latent problem in the SOA design. They started out with the
intention of publishing mortality tables, which contain only probabilities
of death, which are constrained to the unit interval [0, 1]. That constraint
holds for many other values that actuaries deal with, such as net single
premiums. But some useful tables don't obey that constraint: CVAT corridor
factors are a natural example, and it makes obvious sense to store them in
the same sort of database we use for mortality rates. In olden times,
corridor factors were almost always in [0, 9.99] and the one-leading-digit
rule happened to work well enough. Then the basis for these factors changed
so that values like 12.34 became possible, and we exceeded the presumptive
range, which required painful rework. If we were designing an optimal table
format from scratch, we'd choose double precision, whose range is not
artificially bounded, in preference to a range-limited integer subset.
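
To see how a second leading digit makes columns run together (the concern
in the footnote above), here's a tiny sketch with invented field widths;
the real SOA text layout differs:

  #include <cstdio>

  int main()
  {
      // A five-character field fits values like 9.99, and the leading
      // blank doubles as a column separator (widths are hypothetical):
      std::printf("%5.2f%5.2f\n", 9.99, 0.50);    // prints " 9.99 0.50"

      // Two digits before the decimal point consume that blank, so the
      // adjacent columns run together:
      std::printf("%5.2f%5.2f\n", 12.34, 56.78);  // prints "12.3456.78"
  }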



