octave-maintainers

Re: XML tools for Octave


From: Alois Schloegl
Subject: Re: XML tools for Octave
Date: Mon, 12 Jun 2006 10:55:03 +0200
User-agent: Mozilla Thunderbird 0.9 (Windows/20041103)

Andy,

thanks for your comments.

Andy Adler wrote:

On Sat, 10 Jun 2006, Schloegl Alois wrote:

This works very well. Although the OCT version is not as fast as the MEX version, it also works for a case where the MEX function crashes badly.

Andy, what do you think about the solution?
It does not recognize example 1) as invalid XML. How big of a problem is this?


Alois,

It depends on what you want to do with it.

I did a comparison of the various XML tools available for Matlab,
and wasn't impressed with any of them; this one has at least the advantage
of being simple and robust.

However, it has a few defects:

1. Accepts files that are not well-formed XML ( <b><i>tag</b></i> )

2. These files are identical in the infoset, but are
    parsed differently

     <XML> <tag/> </XML>             vs <XML> <tag></tag> </XML>
     <XML> &quot; </XML>             vs <XML> " </XML>
     <XML> <![CDATA[ text ]]> </XML> vs <XML> text </XML>

3. Doesn't handle XML comments (<!-- -->)

Comments are loaded and appear in the substructure  data  as strings.

4. Confused by processing instructions
     <?xml version="1.0" encoding="UTF-8"?>

What do you mean by "confused"? So far I've not observed a problem with this.
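The pairs listed under defect 2 can be checked against a conforming parser. The following is a minimal sketch in Python, using the standard library's xml.etree.ElementTree rather than any of the tools discussed in this thread. The helper name canonical is hypothetical, and surrounding whitespace is stripped before comparing, which is an assumption made here so that the CDATA pair lines up.

```python
import xml.etree.ElementTree as ET


def canonical(xml_text):
    """Reduce an XML string to a nested tuple that can be compared directly."""
    def walk(elem):
        return (
            elem.tag,
            tuple(sorted(elem.attrib.items())),
            (elem.text or "").strip(),   # assumption: ignore surrounding whitespace
            tuple(walk(child) for child in elem),
            (elem.tail or "").strip(),
        )
    return walk(ET.fromstring(xml_text))


# The three pairs quoted under defect 2.
pairs = [
    ('<XML> <tag/> </XML>', '<XML> <tag></tag> </XML>'),
    ('<XML> &quot; </XML>', '<XML> " </XML>'),
    ('<XML> <![CDATA[ text ]]> </XML>', '<XML> text </XML>'),
]

# One boolean per pair: True when both spellings parse to the same tree.
results = [canonical(left) == canonical(right) for left, right in pairs]
```

A parser that resolves entities and CDATA sections before building its tree should report each pair as equal.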


Are these a big issue for you? I would suspect that 3 and
4 are probably quite important.


No, not really. I do not care about comments. I am a little concerned, though, that some non-XML data could leave the parser in an undefined state.
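One way to guard against malformed input is to run a well-formedness check with a conforming parser before handing the file to a more lenient one. This is a sketch only, written in Python with the standard library's xml.etree.ElementTree; the function name is_well_formed is hypothetical and is not part of any tool mentioned in this thread.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Inputs quoted in this thread as being wrongly accepted.
overlapping = is_well_formed('<b><i>tag</b></i>')    # tags close in the wrong order
mismatched = is_well_formed('<c t="C">F S</cd>')     # end tag does not match start tag
```

Rejecting such input up front means the lenient parser only ever sees documents whose tags nest properly.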

Currently, the biggest issue for me is that the outcome needs some postprocessing.
Instead of
      X.data = 'aECG'
      X.sub = {anystruct}

I'd like to see
      X.aECG = {anystruct}
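The postprocessing step described above amounts to folding each node's name into a key. The sketch below shows the idea in Python with plain dictionaries standing in for Octave structs. The node layout is an assumption: each node is taken to be a dict with a 'data' entry holding the element name and a 'sub' entry holding a list of children or leaf values. The function name fold_names is hypothetical.

```python
def fold_names(node):
    """Turn {'data': name, 'sub': children} into {name: folded children}."""
    if not isinstance(node, dict):
        return node                      # leaf value, e.g. a string
    children = node.get("sub", [])
    if isinstance(children, list):
        folded = [fold_names(child) for child in children]
    else:
        folded = fold_names(children)    # a single child stored without a list
    return {node["data"]: folded}


# Assumed shape of the parser output for a small document.
X = {"data": "aECG", "sub": [{"data": "id", "sub": ["42"]}]}
folded_X = fold_names(X)
```

Repeated sibling names stay as separate list entries in this sketch; merging them under one key would need an extra pass.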


What features are you looking for in an XML parser?

Basically, I want to load large XML data sets, such as this one
http://hci.tugraz.at/schloegl/biosig/xml/Exa01.xml
and extract the information needed for the common interface for biomedical data formats in the "BioSig for Octave and Matlab" toolbox.

Moreover, reading OpenOffice Spreadsheet files (which are just zipped XML) would be a nice and useful feature.
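Because a spreadsheet file is a zip archive containing XML, reading it needs only an unzip step and a parser. The sketch below is in Python with the standard library, and it assumes the OpenDocument (.ods) layout: cell data lives in content.xml under the OpenDocument table and text namespaces. The function name read_ods_rows is hypothetical, and repeated-cell attributes, cell types and formulas are ignored.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs used by OpenDocument content.xml.
NS = {
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}


def read_ods_rows(source):
    """Return the cell text of every table row as a list of lists of strings.

    source is a file name or a binary file-like object holding the archive.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    rows = []
    for row in root.iterfind(".//table:table-row", NS):
        cells = []
        for cell in row.iterfind("table:table-cell", NS):
            paragraphs = cell.findall("text:p", NS)
            # A cell may hold several paragraphs; join them with newlines.
            cells.append("\n".join("".join(p.itertext()) for p in paragraphs))
        rows.append(cells)
    return rows
```

Rows from all sheets are returned in document order; separating them per sheet would mean iterating over the table:table elements first.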


--
Andy Adler <address@hidden> 1(613)562-5800x6218



I've included a link at the wiki site.


APPENDIX: My comments on XML tools:

1. XML4MAT
   http://bioinformatics.org/project/?group_id=172
   LICENCE: GPL V2

   *.m code throughout
   Uses regexps to parse XML.
   Fails for many simple examples (e.g. <tag att="<<"/>)

2. XML Toolbox for Matlab
   http://www.geodise.org/toolboxes/generic/xml_toolbox.htm
   LICENCE: Custom (BSD with advertising)

   Output format looks interesting
   Appears to only offer p-code downloads and to link
   to Java xml parsers


3. XML Tree
   http://www.artefact.tk/software/matlab/xml/
   LICENCE: GPL

   Extensive use of Matlab classes

   BUGS:
   Doesn't record attributes
   Misses mixed contents:
    Misses <b to end:  <i> adsf <b/> asbf </i>
    Misses asbf:       <i> adsf <b> </b> asbf </i>

   Confused by Processing Instructions
   Confused by Entities
   Confused by CDATA

   Crashes with namespaces

   OK:
      XML processing instructions
      XML comments
      XML entities

4. XML Tools
   http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=1742&objectType=FILE
   LICENCE: GPL

   BUGS:
     Parses non XML: <c t="C">F S</cd>
                     <b><i>tag</b></i>

     Infoset-equivalent files are parsed differently: <a></a> ~= <a/>

     Doesn't parse entities
     Doesn't manage comments
     Doesn't manage CDATA
     Confused by <?xml ?>

   OK:
    Multiple attributes






