octave-maintainers

Re: XML tools for Octave


From: Andy Adler
Subject: Re: XML tools for Octave
Date: Thu, 29 Jun 2006 12:14:31 -0400 (EDT)

On Thu, 29 Jun 2006, Bill Denney wrote:

On Thu, 29 Jun 2006, Andy Adler wrote:

On Thu, 29 Jun 2006, Bill Denney wrote:

Andy Adler wrote:
I'm trying to write this. My idea is that XML like this
<a b="c" d="e"> text <f g="h"/> more text <f>data</f> </a>

Shouldn't it parse as something more like

v.a.ATTS.b = "c"
v.a.ATTS.d = "e"
v.a.CHILD{1} = " text "
v.a.CHILD{2}.f.ATTS.g = "h"
v.a.CHILD{3} = " more text "
v.a.CHILD{4}.f.CHILD{1} = "data"

My concern is that this output makes writing software to parse the XML output really frustrating: you need to loop through the CHILD vectors to find what you're looking for. This would result in people taking shortcuts that make the code fragile.

But not doing it this way would give an incorrect representation of nested structures. Consider parsing XHTML like
<p>abcd <i>efg</i> jklm</p>

would turn into

v.p.TEXT = 'abcd  jklm';
v.p.i.TEXT = 'efg';

which would not reverse correctly because all of these could turn into the above:
<p>abcd<i>efg</i>  jklm</p>
<p><i>efg</i>abcd  jklm</p>
<p>abcd  jklm<i>efg</i></p>
...
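The collapse described above can be checked with a rough sketch. It is written in Python with the standard xml.etree.ElementTree parser rather than Octave, and the helper name lossy and the dictionary layout are illustrative assumptions, not part of any proposed Octave tool. It merges all text of an element into one TEXT string and groups children by name, which is the mapping being criticized here.

```python
import xml.etree.ElementTree as ET

def lossy(elem):
    """Hypothetical lossy mapping: one merged TEXT string per element,
    child elements grouped by tag name, document order discarded."""
    out = {"TEXT": elem.text or ""}
    for child in elem:
        out.setdefault(child.tag, []).append(lossy(child))
        # Text following a child belongs to the parent; append it to the merged TEXT.
        out["TEXT"] += child.tail or ""
    return out

docs = [
    "<p>abcd <i>efg</i> jklm</p>",
    "<p><i>efg</i>abcd  jklm</p>",
    "<p>abcd  jklm<i>efg</i></p>",
]
results = [lossy(ET.fromstring(d)) for d in docs]

# All three documents produce the same structure, so the original
# position of <i> cannot be recovered from it.
print(results[0])
print(all(r == results[0] for r in results))
```

Since the three inputs are indistinguishable after the mapping, no writer working from that structure alone could regenerate the right one.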

To me, it should be a reversible transformation. Also, without keeping the order, you would lose the ability to do full XSLT interpretation (what do you do about sibling commands?).

This is actually a big debate in the XML semantics community - the fact that XML does not map easily to data structures.

I realize that it doesn't map easily, but it should map reversibly.

I honestly don't know how to address this. Your suggestion would basically mean that we expose the full DOM API. Matlab did this with a thin wrapper over Java's DOM.

However, this is a really bad idea. It means that all the effort of
XML parsing falls on the user, who will either make mistakes or take
shortcuts, resulting in really fragile code.

For example, a user will parse
     <data><item> 1 </item></data>

using
   v.data.CHILD{1}.item.TEXT{1}

But this will break when you have
     <data> <item> 1 </item></data>

or
     <data><metadata/><item> 1 </item></data>
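To make the breakage concrete, here is a rough Python sketch (again using xml.etree.ElementTree in place of Octave; the helper names ordered_children, first_child_is_item and item_by_name are made up for illustration). It builds an ordered child list in the spirit of CHILD{} and compares "take the first child" against a lookup by element name.

```python
import xml.etree.ElementTree as ET

def ordered_children(elem):
    """Text runs and child elements in document order (the CHILD{} idea)."""
    nodes = []
    if elem.text:
        nodes.append(elem.text)
    for child in elem:
        nodes.append(child)
        if child.tail:
            nodes.append(child.tail)
    return nodes

def first_child_is_item(xml):
    """Positional access: does CHILD{1} happen to be the <item> element?"""
    first = ordered_children(ET.fromstring(xml))[0]
    return (not isinstance(first, str)) and first.tag == "item"

def item_by_name(xml):
    """Name-based access: find the first <item> wherever it sits."""
    return ET.fromstring(xml).find("item").text

docs = [
    "<data><item> 1 </item></data>",
    "<data> <item> 1 </item></data>",
    "<data><metadata/><item> 1 </item></data>",
]
for d in docs:
    print(first_child_is_item(d), repr(item_by_name(d)))
```

Only the first document has the item in the first position, while the name-based lookup returns the same text for all three.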


So I don't think that pushing all the complexity to the user is right.
Somehow it should be easy to do easy things, but possible to do correct things.

How about:
 <a b="c" d="e"> text <f g="h"/> more text <f>data</f> </a>

 v.a.ATTS.b      : "c"
 v.a.ATTS.d      : "e"
 v.a.TEXT{1}     : " text "
 v.a.TEXT{2}     : " more text "
 v.a.TEXT{3}     : " "
 v.a.f{1}.ATTS.g : "h"
 v.a.f{2}.TEXT{1}: "data"

and some extra information in:

 v.a.NAMESPACE
 v.a.ORDEREDELEMS
 v.a.UTFNAMES
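A rough sketch of how such a structure might be built, in Python with xml.etree.ElementTree instead of Octave. Several things here are assumptions: the function name to_struct, keeping attributes under ATTS for every element, and the meaning given to ORDEREDELEMS (a list of (name, index) pairs recording document order). NAMESPACE and UTFNAMES are left out, indices are 0-based rather than Octave's 1-based, and an element actually named ATTS, TEXT or ORDEREDELEMS would collide with the reserved keys.

```python
import xml.etree.ElementTree as ET

def to_struct(elem):
    """Hypothetical hybrid mapping: text runs in TEXT, children grouped by
    tag name, and ORDEREDELEMS recording the original interleaving."""
    node = {"ATTS": dict(elem.attrib), "TEXT": [], "ORDEREDELEMS": []}

    def add_text(s):
        # Record a non-empty text run and its position in document order.
        if s:
            node["TEXT"].append(s)
            node["ORDEREDELEMS"].append(("TEXT", len(node["TEXT"]) - 1))

    add_text(elem.text)
    for child in elem:
        siblings = node.setdefault(child.tag, [])
        siblings.append(to_struct(child))
        node["ORDEREDELEMS"].append((child.tag, len(siblings) - 1))
        add_text(child.tail)
    return node

xml = '<a b="c" d="e"> text <f g="h"/> more text <f>data</f> </a>'
a = to_struct(ET.fromstring(xml))

print(a["ATTS"])            # attributes of <a>
print(a["TEXT"])            # text runs directly inside <a>
print(a["f"][0]["ATTS"])    # attributes of the first <f>
print(a["f"][1]["TEXT"])    # text of the second <f>
print(a["ORDEREDELEMS"])    # order needed to write the XML back out
```

Easy lookups go through the name-keyed fields, while a writer that needs the exact original layout can walk ORDEREDELEMS.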

This is starting to look like a big project ;-<


--
Andy Adler <address@hidden> 1(613)562-5800x6218


