


From: Greg Chicares
Subject: Re: [lmi] an xml schema for (single|multiple)_cell_document file XML format
Date: Tue, 13 Mar 2012 02:43:32 +0000
User-agent: Mozilla/5.0 (Windows NT 5.1; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2

On 2012-03-12 14:46Z, Václav Slavík wrote:
[...]
> I had a closer look at several RELAX NG tools; in the end, I settled
> on Jing (http://www.thaiopensource.com/relaxng/jing.html, by the same
> folks as Trang). It has the most complete implementation of RELAX NG
> Compact syntax, the best error messages and supports other schema
> languages too.

Clear, concise--obviously these '.rnc' files are what we want.

I'll leave it to others to experiment with 'jing'--I haven't installed
it here yet. But xmllint's error messages are pretty good [see below].
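For anyone who does try Jing, here is a minimal sketch of a validation helper. It assumes a `jing` launcher script on the PATH; with only the jar, `java -jar /path/to/jing.jar` would take its place. The function name `validate_rnc` is hypothetical. Jing's `-c` option says that the schema uses the compact syntax.

```shell
# Hypothetical helper: validate XML files against a RELAX NG Compact Syntax
# schema using Jing. Assumes a 'jing' launcher on PATH; with only the jar,
# replace 'jing' by 'java -jar /path/to/jing.jar'.
validate_rnc() {
    schema=$1
    shift
    # -c: the schema is written in the compact syntax (.rnc), not XML syntax.
    # Jing exits with nonzero status if any file fails to validate.
    jing -c "$schema" "$@"
}

# Usage, with file names taken from this thread:
#   validate_rnc census.rnc sample.cns sample2.cns
```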

> Other than Jing, I tried:
> 
> 1. xmllint — doesn't handle RELAX NG Compact Syntax at all, only the
> rather verbose XML one.

Having examined the '.rnc' files, I find the '.xsd' equivalents ugly.

> 2. rnv — Compact Syntax only validator, implemented in C. It doesn't
> recognize all of the language (it couldn't handle the "grammar"
> keyword; fortunately, it's optional). Its error messages were either
> cryptic or amounted to little more than, paraphrased, "syntax error"
> or "invalid value". It didn't even provide useful source file locations
> (the worst offender was that any issue inside cell.rnc was reported at
> illustration.rnc:5, i.e. at the place where cell.rnc was included).

Ok, thanks--we needn't spend any more time on 'rnv'.

> I am also attaching an example census.xsd file with XML Schema converted
> from census.rnc. It's rather large (126kB compared to <19kB of .rnc files),
> although not as large as its corresponding RELAX NG XML file (409kB). It's
> much less human-readable than the .rnc files, though. For one thing, it's
> heavily structured, verbose XML, which is inhuman in itself. But to make
> matters worse, Trang doesn't support RELAX NG external references that I
> rely on. So I had to run the .rnc files through jing -s to produce
> simplified versions without them (this is how I ended up with 409kB of
> .rng file) and convert that to .xsd. This simplification step removed
> (by expanding them) custom data types and duplicated the schema parts
> corresponding to <cell>, making it a poor choice for human reading.

So we need 'jing' in any case. And perhaps I've judged the xml-schema
language too harshly from an example that doesn't put it in the most
favorable light, but I'm still sure '.rnc' is what we want.

> The results aren't that bad if the simplification step is omitted — see
> attached illustration.xsd.

OK, so maybe I haven't judged xml-schema too harshly.
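The conversion Václav describes above can be sketched as a small shell function. This is an illustration under assumptions, not his actual procedure: it assumes `jing` and `trang` launchers on the PATH, and that `jing -s` writes the simplified schema (external references expanded, RELAX NG XML syntax) to standard output. The function name `rnc_to_xsd` and the intermediate file name are made up here. Trang chooses its input and output formats from the file extensions.

```shell
# Hypothetical helper: produce an .xsd from an .rnc schema that uses external
# references, which Trang cannot convert directly to XML Schema.
rnc_to_xsd() {
    rnc=$1
    xsd=$2
    rng=${xsd%.xsd}-simplified.rng   # intermediate file; name is arbitrary
    # -c: compact-syntax input; -s: emit the simplified schema (assumed to
    # go to stdout, hence the redirection).
    jing -c -s "$rnc" > "$rng" || return
    # Trang infers both formats from the extensions (.rng in, .xsd out).
    trang "$rng" "$xsd"
}

# Usage:
#   rnc_to_xsd census.rnc census.xsd
# Where no external references are involved, the simplification step can
# be skipped and Trang run on the .rnc file directly:
#   trang illustration.rnc illustration.xsd
```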

Anyway, here are some casual tests using 'xmllint' with the '.xsd' files.

Test a '.cns' file with an unsupported older format:

/opt/lmi/eraseme[0]$grep multiple_cell_document sample.cns
<multiple_cell_document>
</multiple_cell_document>
/opt/lmi/eraseme[0]$xmllint --noout --schema /lmi/src/lmi/a00/vs/rng/census.xsd sample.cns
Element 'multiple_cell_document': The attribute 'version' is required but missing.
Element 'cell': This element is not expected. Expected is ( case_default ).
sample.cns fails to validate
/opt/lmi/eraseme[3]

Same, but current format:

/opt/lmi/eraseme[0]$xmllint --noout --schema /lmi/src/lmi/a00/vs/rng/census.xsd sample2.cns
sample2.cns validates
/opt/lmi/eraseme[0]$
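The bracketed number in the prompt above appears to be the previous command's exit status: 3 after the failed validation, 0 after the successful one. The xmllint manual lists 3 and 4 as validation errors and 5 as a schema-compilation error, so a script can act on the status alone. A minimal sketch that checks several files against one schema (the function name `check_cns` is hypothetical):

```shell
# Hypothetical wrapper: validate each file with xmllint and summarize the
# outcome on stdout. xmllint's own messages still go to stderr.
check_cns() {
    schema=$1
    shift
    rc=0
    for f in "$@"; do
        xmllint --noout --schema "$schema" "$f"
        case $? in
            0)   echo "$f: valid" ;;
            3|4) echo "$f: INVALID"; rc=1 ;;
            5)   echo "$f: schema failed to compile"; rc=1 ;;
            *)   echo "$f: other xmllint error"; rc=1 ;;
        esac
    done
    return $rc
}

# Usage:
#   check_cns /lmi/src/lmi/a00/vs/rng/census.xsd sample.cns sample2.cns
```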

Test a <cell> element with an impermissible value:

/opt/lmi/eraseme[0]$<sample2.cns >sample2bad.cns sed -e'/PremiumTaxState/s/CT/FC/'
/opt/lmi/eraseme[0]$xmllint --noout --schema /lmi/src/lmi/a00/vs/rng/census.xsd sample2bad.cns
Element 'PremiumTaxState': [facet 'enumeration'] The value 'FC' is not an element of the set {'AL', 'AK', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'DC', 'FL', 'GA',
[...many more states...]
'WY', 'XX'}.
Element 'PremiumTaxState': 'FC' is not a valid value of the local atomic type.
sample2bad.cns fails to validate
/opt/lmi/eraseme[3]$
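The manual sed-then-validate steps above could be packaged as a reusable negative test. In this sketch (the function name `expect_invalid` is hypothetical), a sed edit is applied to a known-good file. If the edit changed nothing (for example, because the sample has no 'CT'), the function refuses to proceed. Otherwise it succeeds only if xmllint rejects the result.

```shell
# Hypothetical negative test: succeed only if the schema $1 rejects the file
# produced by applying the sed edit $3 to the valid file $2.
expect_invalid() {
    schema=$1 good=$2 edit=$3
    bad=$(mktemp) || return
    sed -e "$edit" "$good" > "$bad"
    # Guard against an edit that matched nothing: the "bad" file would
    # then just be the good one, which proves nothing.
    if cmp -s "$good" "$bad"; then
        echo "edit '$edit' left $good unchanged" >&2
        rm -f "$bad"
        return 2
    fi
    xmllint --noout --schema "$schema" "$bad" 2>/dev/null
    status=$?
    rm -f "$bad"
    [ "$status" -ne 0 ]
}

# Usage, reproducing the test above:
#   expect_invalid /lmi/src/lmi/a00/vs/rng/census.xsd sample2.cns \
#       '/PremiumTaxState/s/CT/FC/' && echo "rejected, as it should be"
```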

Test a <cell> element with a missing child element:

/opt/lmi/eraseme[0]$<sample2.cns >sample2bad.cns sed -e'/PremiumTaxState/d'
/opt/lmi/eraseme[0]$xmllint --noout --schema /lmi/src/lmi/a00/vs/rng/census.xsd sample2bad.cns
Element 'ProductName': This element is not expected. Expected is ( PremiumTaxState ).
[...repeats...]
sample2bad.cns fails to validate
/opt/lmi/eraseme[3]$
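The deletion test generalizes to every element in turn. The sketch below (the function name `check_required` is hypothetical) assumes, as in the samples above, that each leaf element sits on its own line as `<Name>value</Name>`. It deletes every line holding a given element name, once per name, and reports each deletion that xmllint still accepts.

```shell
# Hypothetical check: for each leaf element name found in the valid file $2,
# delete every line containing that element and report deletions that
# xmllint still accepts against schema $1.
check_required() {
    schema=$1 good=$2
    bad=$(mktemp) || return
    rc=0
    # Collect names from lines like "<PremiumTaxState>CT</PremiumTaxState>".
    names=$(sed -n 's/^[[:space:]]*<\([A-Za-z_][A-Za-z0-9_]*\)>.*<\/\1>[[:space:]]*$/\1/p' "$good" | sort -u)
    for name in $names; do
        sed -e "/<$name>/d" "$good" > "$bad"
        if xmllint --noout --schema "$schema" "$bad" 2>/dev/null; then
            echo "deleting <$name> still validates"
            rc=1
        fi
    done
    rm -f "$bad"
    return $rc
}

# Usage:
#   check_required /lmi/src/lmi/a00/vs/rng/census.xsd sample2.cns
```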

Multiple errors:

/opt/lmi/eraseme[0]$<sample2.cns >sample2bad.cns sed -e'/State/s/CT/FC/'
/opt/lmi/eraseme[0]$xmllint --noout --schema /lmi/src/lmi/a00/vs/rng/census.xsd sample2bad.cns
Element 'AgentState': [facet 'enumeration'] The value 'FC' is not an element ...
Element 'AgentState': 'FC' is not a valid value of the local atomic type.    ...
Element 'CorporationState': [facet 'enumeration'] The value 'FC' is not an e ...
Element 'CorporationState': 'FC' is not a valid value of the local atomic ty ...
Element 'PremiumTaxState': [facet 'enumeration'] The value 'FC' is not an el ...
Element 'PremiumTaxState': 'FC' is not a valid value of the local atomic typ ...
Element 'State': [facet 'enumeration'] The value 'FC' is not an element of t ...
Element 'State': 'FC' is not a valid value of the local atomic type.         ...
Element 'StateOfJurisdiction': [facet 'enumeration'] The value 'FC' is not a ...
Element 'StateOfJurisdiction': 'FC' is not a valid value of the local atomic ...
...
sample2bad.cns fails to validate
/opt/lmi/eraseme[3]$
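The sed address `/State/` selects every line that contains that substring. That is why this single edit produced errors for AgentState, CorporationState, PremiumTaxState, State, StateOfJurisdiction and more. A small sketch can list in advance which elements such an edit would touch. The function name `states_hit` is hypothetical, and one element per line is assumed.

```shell
# Hypothetical helper: list the element names that
#   sed -e '/State/s/CT/FC/'
# would modify in file $1, i.e. lines that match /State/ and contain "CT".
states_hit() {
    grep 'State' "$1" | grep 'CT' |
        sed -n 's/^[[:space:]]*<\([A-Za-z_][A-Za-z0-9_]*\)>.*$/\1/p' |
        sort -u
}

# Usage:
#   states_hit sample2.cns
```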
