On Thu, 29 Jun 2006, Andy Adler wrote:
On Thu, 29 Jun 2006, Bill Denney wrote:
Andy Adler wrote:
I'm trying to write this. My idea is that XML like this
<a b="c" d="e"> text <f g="h"/> more text <f>data</f> </a>
Shouldn't it parse as something more like this:
v.a.ATTS.b = "c"
v.a.ATTS.d = "e"
v.a.CHILD{1} = " text "
v.a.CHILD{2}.f.ATTS.g = "h"
v.a.CHILD{3} = " more text "
v.a.CHILD{4}.f.CHILD{1} = "data"
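
For illustration only, here is a minimal Python sketch of the ordered
mapping listed above. It uses the standard xml.etree.ElementTree
module; the function name to_ordered and the use of dicts and lists in
place of Octave structs and cell arrays are assumptions made for this
sketch, not part of the proposal in the thread.

```python
import xml.etree.ElementTree as ET

def to_ordered(elem):
    """Map an element to {tag: {"ATTS": ..., "CHILD": [...]}}.

    CHILD keeps text runs and child elements in document order.
    """
    children = []
    if elem.text:                      # text before the first child element
        children.append(elem.text)
    for sub in elem:
        children.append(to_ordered(sub))
        if sub.tail:                   # text that follows this child element
            children.append(sub.tail)
    return {elem.tag: {"ATTS": dict(elem.attrib), "CHILD": children}}

xml_src = '<a b="c" d="e"> text <f g="h"/> more text <f>data</f> </a>'
v = to_ordered(ET.fromstring(xml_src))
```

Note that the whitespace after the last </f> shows up as a fifth CHILD
entry in this sketch, which the listing above leaves out.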
My concern is that this output makes writing software to parse the
XML output really frustrating - you need to loop through the CHILD
vectors to find what you're looking for. This would result in
people taking shortcuts that make the code fragile.
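
One possible way to contain the looping described here, sketched in
Python for illustration: wrap the search in a single helper so callers
never index CHILD by position. The helper name children_by_tag and the
dict/list layout are hypothetical; they mirror the ATTS/CHILD notation
used in this thread.

```python
def children_by_tag(node, tag):
    """Return the element children of `node` named `tag`, in document order."""
    return [c[tag] for c in node["CHILD"] if isinstance(c, dict) and tag in c]

# Hand-built equivalent of the v.a structure discussed in this thread.
a = {
    "ATTS": {"b": "c", "d": "e"},
    "CHILD": [
        " text ",
        {"f": {"ATTS": {"g": "h"}, "CHILD": []}},
        " more text ",
        {"f": {"ATTS": {}, "CHILD": ["data"]}},
    ],
}

fs = children_by_tag(a, "f")       # both <f> elements, text runs skipped
first_g = fs[0]["ATTS"]["g"]
second_text = fs[1]["CHILD"][0]
```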
But not doing it this way would give an incorrect representation of
nested structures. What about just parsing XHTML like
<p>abcd <i>efg</i> jklm</p>
which would turn into
v.p.TEXT = 'abcd jklm';
v.p.i.TEXT = 'efg';
which would not reverse correctly because all of these could turn into
the above:
<p>abcd<i>efg</i> jklm</p>
<p><i>efg</i>abcd jklm</p>
<p>abcd jklm<i>efg</i></p>
...
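
The collapse can be checked with a small Python sketch. The function
to_merged is a hypothetical stand-in for the lossy TEXT mapping; it
assumes whitespace is normalized, which is what 'abcd jklm' in the
example implies, and it ignores attributes and repeated tags.

```python
import xml.etree.ElementTree as ET

def to_merged(elem):
    """Lossy mapping: join all direct text into TEXT, key children by tag."""
    raw = (elem.text or "") + " ".join(sub.tail or "" for sub in elem)
    node = {"TEXT": " ".join(raw.split())}   # normalize whitespace
    for sub in elem:
        node[sub.tag] = to_merged(sub)       # a repeated tag would overwrite
    return node

docs = [
    "<p>abcd <i>efg</i> jklm</p>",
    "<p>abcd<i>efg</i> jklm</p>",
    "<p><i>efg</i>abcd jklm</p>",
    "<p>abcd jklm<i>efg</i></p>",
]
merged = [to_merged(ET.fromstring(d)) for d in docs]

# Four different documents, one merged structure: the position of <i>
# is gone, so the original cannot be regenerated from it.
all_same = all(m == merged[0] for m in merged)
```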
To me, it should be a reversible transformation. Also, without keeping
the order, you would lose the ability to do full XSLT interpretation
(what do you do about sibling commands?).
This is actually a big debate in the XML semantics community - the
fact that XML does not map easily to data structures.
I realize that it doesn't map easily, but it should map reversibly.