lmi
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [lmi] change file formats to XML


From: Greg Chicares
Subject: Re: [lmi] change file formats to XML
Date: Tue, 16 Mar 2010 04:07:49 +0000
User-agent: Thunderbird 2.0.0.23 (Windows/20090812)

On 2010-02-26 14:56Z, Vaclav Slavik wrote:
> 
> I uploaded the XML file formats patch to Savannah:
> 
>    https://savannah.nongnu.org/patch/index.php?7101

I committed the heart of this on 20100316T0157Z in revision 4783.
The remainder is fairly straightforward, but I plan to commit it
in discrete steps, one product-file extension at a time, because:
 - I want to make changes to the ancient 'ihs_*' files, and
 - we have to migrate a lot of (proprietary) commentary

> Some comments about the patch:
> 
> (0) The API for XML formats I/O is in xml_serialize.hpp.

Which is in svn now.

> By default,
> stream operator<< and operator>> are used for storing value types, but
> it can be -- and is -- customized by specializing
> xml_serialize::type_io<> template.

The basic architecture you laid out can't be improved AFAICT.

Some of the specializations are no longer required because value_cast<>()
is used instead of just stream inserters and extractors:

 - value_cast<>() already has an optimization for "converting" from
     std::string to std::string. (That's just an incidental benefit;
     the main reason for using value_cast is numerics.)

 - For mc_enum types, I thought it better to use the (heavy) mc_enum
     instead of the underlying (light) C enum; that doesn't add much
     overhead in 'ihs_rnddata.cpp', and it writes (clear) names in
     the xml instead of (obscure) integers, e.g.:

      <specamt>
        <decimals>0</decimals>
        <style>Upward</style>
      </specamt>

     while simplifying 'xml_serialize.hpp'.

 - Somehow I had difficulty understanding the use of the convenience
   wrappers within the 'xml_serialize.hpp' implementation itself.
   That's just a personal problem I had following recursion through
   template specializations; but I wrote a few things out inline to
   make it easier for me to understand.

 - As for containers [I'm sure the specialization given works for
   sequences, but I'm not sure about other containers]...there was
   an asymmetry that troubled me: from_xml() called clear() on the
   container, but to_xml() didn't do anything similar. I know you
   had to do this in order to keep your patch simple by avoiding
   major changes to other code, but in this case I see I really do
   need to change that other code: as a temporary kludge, I added
     axis_lengths.clear();
   right after the using-directive in TDBValue::read(), and then
   removed the clear() call in from_xml(), and also added code to
   the generic container specialization to assert that we start
   with an empty container or an empty set of <item> elements.
   This was more satisfying (and also easier) than documenting
   the reason for the lack of parallelism (namely, a questionable
   original design in 'ihs_dbvalue.cpp').

> I'll focus on the performance later if needed.

Well, if you notice anything I've done that seems pessimal,
please point it out; otherwise, we can work on performance later.

> For now, I'd like to know
> what you think about this approach globally, and the serialization API
> in particular.

The way you did it is better than all the alternatives I explored,
and I tried hard to find a simpler way.

> (5) Versioning:
> 
> I opted for implicit versioning. As long as there's just one version of
> the format, no versioning code is needed. If the format changed, then
> the respective type_io<T>::from_xml() would be updated to deal with it.
> For example:
> 
> - When a new field is added and has a sensible default, load with with
>    get_property(node, "foo", foo, default_foo_value). Nothing else is 
>    needed to read both versions. In the other direction, unrecognized
>    properties are simply ignored.
> 
> - If a field semantics changes, rename it in the format. Then, read 
>    (and correctly interpret) the old field only if the new field isn't
>    found.
> 
> - If needed, we could add explicit versions on serialized keys, e.g.
> 
>      <coi_rate version="2">
>        <decimals>8</decimals>
>        <style>Downward</style>
>      </coi_rate>
> 
>    (We'd have to add code for reading the attribute later.)

I'm trying to rewrite the old product-file 'ihs_*.?pp' stuff along
the same lines as classes Input, Ledger, and mec_input--which already
deal with the two problems you point out:

> - If a field semantics changes

Simple example: suppose gender was {M|F|U} and we want to change it
to {Male|Female|Unisex}. That's what Input::RedintegrateExAnte()
is for: the xml element's text contents have to be translated to the
current style before they're read into variables.

> - When a new field is added and has a sensible default

Simple example: we used to store a person's name in {first, middle, last}
pieces, but then decided to keep the whole name in one string; so, when
reading an old file, we fetch the pieces and combine them. That's what
Input::RedintegrateExPost() is for: here, all elements' text contents
remain valid, but the elements themselves have changed.

It's a considerable effort, but those old source files really stand
in need of a thorough rewrite anyway, and I think it'll serve us well
to use one approach everywhere for backward compatibility.




reply via email to

[Prev in Thread] Current Thread [Next in Thread]