From: Müller, Andre
Subject: Re: PSPP-BUG: [bug #40864] Implement machine-parseable data definition format
Date: Thu, 23 Jan 2014 18:04:54 +0000
Dear Ben, all,
As announced, I have now written a converter to DDI-2.5 (that is, DDI
Codebook) XML,
so I can now provide the spec for writing DDI-2.5 that validates.
An example file is attached.
The skeleton looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<codeBook ID="ZA2141_v1-1-0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="ddi:codebook:2_5
    http://www.ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/codebook.xsd"
    xmlns="ddi:codebook:2_5">
  <stdyDscr>
    <citation>
      <titlStmt>
        <titl/>
      </titlStmt>
    </citation>
  </stdyDscr>
  <dataDscr>
    <var ID="v1" name="v1">
      <labl>STUDY NUMBER</labl>
    </var>
    <var ID="v2" name="v2">
      <labl>EDITION NUMBER</labl>
      <catgry missing="N">
        <catValu>1</catValu>
        <labl>PRELIMINARY EDITION</labl>
      </catgry>
      <catgry missing="N">
        <catValu>2</catValu>
        <labl>1ST CODEBOOK EDITION - release as of May 2, 2007</labl>
      </catgry>
    </var>
  </dataDscr>
</codeBook>
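For illustration, here is a minimal Python sketch of how a writer could emit this skeleton with the standard library's xml.etree.ElementTree. The function name build_codebook and the shape of its input are hypothetical, and missing="Y" for missing categories is an assumption (the skeleton above only shows missing="N"). This is not the converter mentioned above, and the output still needs to be validated against the schema.

```python
import xml.etree.ElementTree as ET

DDI_NS = "ddi:codebook:2_5"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
SCHEMA_URL = ("http://www.ddialliance.org/Specification/"
              "DDI-Codebook/2.5/XMLSchema/codebook.xsd")


def build_codebook(codebook_id, variables):
    """Return a DDI 2.5 skeleton as a string (hypothetical helper).

    variables: list of (name, label, categories) tuples, where categories
    is a list of (value, label, is_missing) tuples.
    """
    # The namespace declarations are written as plain attributes, which is
    # enough for serialising; a parser reading the result sees the namespace.
    root = ET.Element("codeBook", {
        "ID": codebook_id,
        "xmlns:xsi": XSI_NS,
        "xsi:schemaLocation": DDI_NS + " " + SCHEMA_URL,
        "xmlns": DDI_NS,
    })

    # The study description must be present, even with an empty title.
    stdy = ET.SubElement(root, "stdyDscr")
    citation = ET.SubElement(stdy, "citation")
    titl_stmt = ET.SubElement(citation, "titlStmt")
    ET.SubElement(titl_stmt, "titl")

    # All variable metadata go into the data description.
    data = ET.SubElement(root, "dataDscr")
    for index, (name, label, categories) in enumerate(variables, start=1):
        # The ID is derived from the variable's index: v1, v2, ...
        var = ET.SubElement(data, "var", {"ID": "v%d" % index, "name": name})
        ET.SubElement(var, "labl").text = label
        for value, cat_label, is_missing in categories:
            # Assumption: "Y" marks a missing category, "N" a valid one.
            flag = "Y" if is_missing else "N"
            catgry = ET.SubElement(var, "catgry", {"missing": flag})
            ET.SubElement(catgry, "catValu").text = str(value)
            ET.SubElement(catgry, "labl").text = cat_label

    # ElementTree escapes special characters in text and attributes itself.
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Calling build_codebook("ZA2141_v1-1-0", [("v1", "STUDY NUMBER", [])]) would give a document with the same element structure as the skeleton, apart from whitespace.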
Notes:
- The <stdyDscr>... bit must be included as the <titl/> tag needs to be
present, even if empty.
- All metadata go to the <dataDscr> section.
- The var ID can legally be filled from the variable's index (as in v1, v2
above), effectively numbering the variables.
- Missing values are defined as discrete values in the optional <catgry> elements.
(The column's data entries have to be checked against the SPSS-style missing
range definition, because a value inside that range is not necessarily labeled.
An empty label field ought to be legal.)
- The following characters are disallowed outside CDATA sections and need to be
replaced:
> to &gt;
< to &lt;
& to &amp;
' to &apos;
" to &quot;
Hope that helps,
Andre Müller
Attachment: Example-DDI2_5.xml.gz