[lmi] Historical product files

From:	Greg Chicares
Subject:	[lmi] Historical product files
Date:	Wed, 19 May 2010 16:55:08 +0000
User-agent:	Thunderbird 2.0.0.24 (Windows/20100228)

What's the best format for adding historical data to xml product files?

* Background

A few fields in various product files change from time to time [0]. We've
never yet used the historical (superseded) values, but we'd like to use
them someday--so we have carefully recorded them as comments in the C++
source code heretofore used for generating product files. For example, we
might have:

#if VERSION == 20020101
  Add(database_entity(DB_MaxGenAcctRate, 0.085)); // Effective 2002-01-01.
...
#elif VERSION == 20100301
  Add(database_entity(DB_MaxGenAcctRate, 0.063)); // Effective 2010-03-01.
#elif VERSION == 20100401
  Add(database_entity(DB_MaxGenAcctRate, 0.062)); // Effective 2010-04-01.
#elif VERSION == 20100501
  Add(database_entity(DB_MaxGenAcctRate, 0.061)); // Effective 2010-05-01.
#endif

We'd very much like to stop using that code. We have to keep it outside the
public repository [1], which is a nuisance; updating it by emailing 'sed'
commands is growing tiresome:
  http://lists.nongnu.org/archive/html/lmi/2010-05/msg00007.html
It poses certain unique challenges [2] that we'd rather avoid. Furthermore,
maintaining it requires a combination of C++ and product knowledge that few
people have--whereas xml files are easier to maintain.

We need to design an xml format to hold this historical information. It's
best if we do that now, so we can migrate all the history soon. There's
quite a lot of history, for interest rates in particular--ten years of
monthly history for each of several dozen products--and it'll probably be
much easier and more reliable to use the C++ code to transfer that history
into xml, as opposed to copying and pasting it manually. To use historical
data will require more work later, but we can move the data now (and retire
the old code) if we just design a workable format.

Today, for the interest rate in the example above, we have:
  <item>
    <key>MaxGenAcctRate</key>
    <axis_lengths>
      <item>1</item><item>1</item><item>1</item><item>1</item>
      <item>1</item><item>1</item><item>1</item>
    </axis_lengths>
    <data_values> <item>0.06</item> </data_values>
    <gloss></gloss>
  </item>
How might we add history to that?

I'm guessing that we should add a date attribute. For example:
  <item EffectiveDate="20020101"> <!-- 2002 January  data -->
  <item EffectiveDate="20020201"> <!-- 2002 February data -->
When a new product is introduced, every top-level element would have its
"EffectiveDate" attribute set to the date of introduction. When a datum is
modified, its original element would remain, and a new one with a different
date would be added. [BTW, this wouldn't work:
  <data_values EffectiveDate="20020101"> <item>0.085</item> </data_values>
because (e.g.) the axes might change over time.]
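
As a rough sketch of how the existing C++ generator might write such dated
elements during migration: the helper name 'dated_item' and its signature are
hypothetical (not lmi code), and it handles only the scalar case shown above,
assuming the key needs no xml escaping.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper, not part of lmi: format one dated top-level
// element for a scalar datum. Assumes 'key' contains no characters
// that would need xml escaping, and hard-codes seven axes of length 1.
std::string dated_item
    (std::string const& key
    ,int                effective_date // yyyymmdd, e.g. 20020101
    ,double             value
    )
{
    std::ostringstream oss;
    oss << "<item EffectiveDate=\"" << effective_date << "\">\n"
        << "  <key>" << key << "</key>\n"
        << "  <axis_lengths>\n"
        << "    <item>1</item><item>1</item><item>1</item><item>1</item>\n"
        << "    <item>1</item><item>1</item><item>1</item>\n"
        << "  </axis_lengths>\n"
        << "  <data_values> <item>" << value << "</item> </data_values>\n"
        << "  <gloss></gloss>\n"
        << "</item>\n";
    return oss.str();
}
```

Calling dated_item("MaxGenAcctRate", 20020101, 0.085) once per historical
value would yield one top-level element per effective date.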

Alternatively, I suppose we could make historical date a new element:
  <item>
    <EffectiveDate>20020101</EffectiveDate>
    <key>MaxGenAcctRate</key>
  </item>
That's largely an "aesthetic preference":
  http://lists.nongnu.org/archive/html/lmi/2010-04/msg00008.html
For each historical datum, "EffectiveDate" would be "unique by definition":
  http://lists.nongnu.org/archive/html/lmi/2010-04/msg00010.html
so an attribute seems slightly preferable to an element.

Of course, we could use xml comments to suppress everything that isn't
current, but then we'd have to alter the format later to make such data
accessible again.

Is there any radically-different good idea that I've overlooked?

Once we've chosen a format, another question arises: how can we access
only the most-current data for now? Eventually, we'd like to choose among
historical versions based on GUI inputs that don't yet exist; but, for now,
I'd rather just find a clean way to get only the most recent one. Is there
some reliable shortcut? For example, I imagine we could order top-level
elements chronologically by "EffectiveDate" attribute (perhaps 'xmllint'
would even do that for us), and then just read them into a C++ map:
  data["MaxGenAcctRate"] = 0.085; // EffectiveDate="20020101"
  ...then 0.085 gets overwritten...
  data["MaxGenAcctRate"] = 0.070; // EffectiveDate="20050101"
  ...then 0.070 gets overwritten...
  data["MaxGenAcctRate"] = 0.060; // EffectiveDate="20100501"
or is that too repugnant?
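
A minimal sketch of that overwrite idea, under stated assumptions: the
'dated_datum' struct and the 'most_recent' function are hypothetical names,
values are scalar, and the xml has already been parsed into a vector. Sorting
in the code means the file's own ordering need not be trusted; the optional
'as_of' argument stands in for the GUI input that doesn't yet exist.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical record for one parsed top-level element (scalar case only).
struct dated_datum
{
    std::string key;
    int         effective_date; // yyyymmdd, e.g. 20100501
    double      value;
};

// Return, for each key, the value with the latest effective date that
// does not exceed 'as_of'. The default selects the most recent value.
std::map<std::string,double> most_recent
    (std::vector<dated_datum> v
    ,int                      as_of = 99991231
    )
{
    // Sort chronologically rather than relying on element order in the file.
    std::stable_sort
        (v.begin()
        ,v.end()
        ,[](dated_datum const& a, dated_datum const& b)
            {return a.effective_date < b.effective_date;}
        );
    std::map<std::string,double> data;
    for(auto const& d : v)
        {
        if(d.effective_date <= as_of)
            {
            data[d.key] = d.value; // Later dates overwrite earlier ones.
            }
        }
    return data;
}
```

With the three values above, most_recent(v) maps "MaxGenAcctRate" to 0.060,
while most_recent(v, 20060101) maps it to 0.070.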

---------

[0] "A few fields in various product files change from time to time"

Viz.:
  '*.database'
    DB_MaxGenAcctRate
      This changes as often as every month.
    DB_StateApproved
      This might change every month for the first few months after a
      product is introduced; later, it would almost never change.
    DB_PremTaxRate
    DB_PremTaxLoad
      We typically change these once a year, in May. They might change
      at other arbitrary times.
  '*.funds'
    All contents typically change every May.
  '*.policy'
    Anything here can change at any time, but I don't think we ever
    need to preserve historical values.
  '*.policy *.rounding *.strata'
    These files almost never change.

[1] "have to keep it outside the public repository"

This comment explains why (skip it if you're not curious):

// This file is a template for embedding product-specific data. Doing
// that creates a derived work covered by the GPL. But you may prefer
// not to publish your data, for instance because it is proprietary.
// In that case, the GPL does not permit you to distribute the derived
// work at all. But read the second paragraph of section 0 of the GPL
// carefully: it permits you to run your modified version of the
// program--and to distribute its output, which is not a derived work
// because it's merely your data, trivially cast in a format suitable
// for use with lmi. You can therefore distribute the files created by
// your modified version of this program, but not that program itself.
// Those files are all you need: distributing the program itself isn't
// necessary anyway.

[2] "certain unique challenges"

For example, see these comments:

# The product_files target doesn't build with shared-library
# 'attributes'. That matters little because that target is deprecated.

# An overriding version of 'my_prod.cpp', which is used to create a
# nondistributable binary, contains so many large strings that, after
# consuming more than one CPU minute and 1 MiB of RAM, MinGW gcc-3.4.5
# produces a diagnostic such as
#   warning: NULL pointer checks disabled:
#   39933 basic blocks and 167330 registers
