Re: [lmi] Why should we sort XML documents?


From: Greg Chicares
Subject: Re: [lmi] Why should we sort XML documents?
Date: Mon, 5 Mar 2018 22:41:44 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.2

On 2018-03-05 18:50, Vadim Zeitlin wrote:
[...]
> GC> [...if we omit sorting, then...] We really have nothing to gain.
> 
>  I somewhat disagree with this too. Simplification is always nice, and
> removing the need to sort the cells here would allow us to not use
> libxsltwrapp (and hence libxslt) at all any longer once the XSL-FO code is
> finally removed which is, IMHO, a not-negligible payoff for very little
> work.

Wow, I didn't see that--I somehow thought that XSD (and RNG) were
supported by libxslt, but no, libxml2 does that. Removing the libxslt
dependency is a really big deal. Stepping back and reanalyzing...

We absolutely will not add a new production dependency on "java", so we
can't use 'jing', which means we can't validate with RNC. I.e., we can
maintain the authoritative sources as RNC, but we don't have an RNC
validator that we can use in production.

But we can translate RNC to RNG, and libxml2 can use RNG, which, AIUI,
is just a different but equivalent representation of RNC: IOW, it's a
lossless translation.

The reason we've been using XSD is that we can't use RNC (java), and
RNG with libxml2 produces hard-to-read diagnostics. But I suppose that
RNG, as implemented by libxml2, has the same power as RNC with jing;
it's just that any diagnostics may be a little harder to interpret.

And, because we've been relying exclusively on XSD, we've sorted the
input--because XSD is less capable than RN[CG], and sorting partially
mitigates that loss of capability. So, actually, what sorting buys us
is really just clearer error messages.
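
To illustrate what "sorting the input" means, here is a minimal sketch in Python. It is not lmi's implementation, which goes through libxslt. It assumes the XSD limitation in question concerns element order, and the element names other than InforceDcv are chosen only for the example.

```python
import xml.etree.ElementTree as ET


def sort_children(element):
    """Recursively sort child elements by tag name, in place."""
    for child in element:
        sort_children(child)
    # Slice assignment replaces the children while keeping the same parent.
    element[:] = sorted(element, key=lambda e: e.tag)


# Hypothetical input: sibling elements appear in arbitrary order.
doc = ET.fromstring(
    "<cell>"
    "<IssueAge>45</IssueAge>"
    "<InforceDcv>0</InforceDcv>"
    "<Gender>Female</Gender>"
    "</cell>"
)
sort_children(doc)
sorted_tags = [child.tag for child in doc]
```

After sorting, a schema that expects siblings in one fixed (alphabetical) order can accept input whose original order was arbitrary.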

Instead, suppose we validate with RNG+libxml2 in production...thus,
diagnosing any problem we could find with RNC+jing, AIUI. If that
validation pass succeeds, we're done validating--and there's no need
(real, perceived, or otherwise) to sort. If it fails, and we don't
understand the diagnostics, then we can, manually, either:
 - sort the input, run it through XSD, and use those diagnostics; or
 - much more sensibly, just use jing with RNC, which is best of all.
This manual fallback step is entirely optional, and I don't mind using
"java" for a rare, manual action--I just don't want lmi to depend on it.
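
The two-stage idea above can be sketched as a small Python helper. This is only an illustration under stated assumptions: lmi itself would validate through xmlwrapp rather than by spawning xmllint, the function name and the injectable `run` parameter are hypothetical, and 'xmllint' and 'jing' are assumed to be on the PATH.

```python
import subprocess


def validate(xml_path, rng_path, rnc_path, run=subprocess.run):
    """Validate with xmllint+RNG; on failure, ask jing+RNC for diagnostics.

    Returns (is_valid, diagnostics). The 'run' parameter exists only so
    that the external commands can be replaced, e.g. for testing.
    """
    primary = run(
        ["xmllint", "--noout", "--relaxng", rng_path, xml_path],
        capture_output=True,
        text=True,
    )
    if primary.returncode == 0:
        # The production check passed: nothing more to do, nothing to sort.
        return True, ""
    # Optional manual fallback: jing reads the compact syntax with '-c'.
    fallback = run(
        ["jing", "-c", rnc_path, xml_path],
        capture_output=True,
        text=True,
    )
    # Prefer jing's message if it produced one, else keep xmllint's.
    return False, fallback.stdout or primary.stderr
```

A call might look like `validate("input.xml", "schema.rng", "schema.rnc")`, where the file names are placeholders. The jing step runs only after a failure, so "java" is never needed on the success path.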

This is much like using gcc for production but keeping clang handy. We
run gcc first, knowing that if it diagnoses an error, we might not
readily understand its cryptic output, so in that case we can fall back
on clang to see if it'll give us better diagnostics.

'test_schemata.sh' shows the difference in quality of diagnostics:

[1]  invalid input, jing, .rnc:
character content of element "InforceDcv" invalid; must be a floating-point \
 number greater than or equal to 0

[2]  invalid input, xmllint, .xsd:
Element 'InforceDcv': [facet 'minInclusive'] The value '-12345.67' is less \
  than the minimum value allowed ('0').
Element 'InforceDcv': '-12345.67' is not a valid value of the atomic type \
  'nonnegative_double'.

[3]  invalid input, xmllint, .rng:
Error validating datatype double
Element InforceDcv failed to validate content

In production today, we have [2], and we'd trade that for [3]. Either
can be understood with effort or insight. If a file had hundreds of
errors, I'd run it manually through jing+RNC to get [1] because that's
more readable; but that's just a matter of convenience. In practice,
it's enough for us to know the name of the offending element, because
we know that an inforce deemed cash value must not be negative.
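
If the element name is all we need, it can be pulled out of diagnostics like [2] and [3] mechanically. The following Python sketch is a hypothetical helper, not part of lmi or 'test_schemata.sh', and it assumes that xmllint messages keep the "Element NAME" wording shown above.

```python
import re

# Matches both "Element 'InforceDcv':" (XSD) and "Element InforceDcv" (RNG).
_ELEMENT = re.compile(r"Element '?([\w.-]+)'?")


def offending_elements(diagnostics):
    """Return element names mentioned in xmllint output, without repeats."""
    names = []
    for match in _ELEMENT.finditer(diagnostics):
        name = match.group(1)
        if name not in names:
            names.append(name)
    return names


rng_output = (
    "Error validating datatype double\n"
    "Element InforceDcv failed to validate content\n"
)
rng_names = offending_elements(rng_output)
```

The same function gives the same answer for the two-line XSD message in [2], which is the sense in which [2] and [3] carry equivalent information for our purposes.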

Of course, I'm assuming that xmlwrapp already handles RNG, or can
easily be extended to do so; and that it handles RNG in the same way
that xmllint does, which seems highly probable. But once we've made
sure of those preconditions, getting rid of libxslt altogether is
well worth the effort required to alter the schemata.

BTW, when I searched the web to double-check which xmlsoft library
supports schema validation, I stumbled upon this mention of "schema":

  https://vslavik.github.io/xmlwrapp/manual/stylesheet_8h_source.html
|   Errors are handled by @a on_error handler; by default, xml::exception
|   is thrown on errors. If there's a fatal error that prevents the schema

which occurs twice, once for each ctor. AFAICT, "schema" should be
changed to "stylesheet" in these two locations.


