emacs-devel

Re: Problems with xml-parse-string


From: Lars Magne Ingebrigtsen
Subject: Re: Problems with xml-parse-string
Date: Wed, 22 Sep 2010 18:12:54 +0200
User-agent: Gnus/5.110011 (No Gnus v0.11) Emacs/24.0.50 (gnu/linux)

Chong Yidong <address@hidden> writes:

> First let me clarify a technical detail.  In your new format,

[...]

> seems to assume that element names never start with the colon character.
> That is, there can never be an element named ":type".
>
> The XML spec (http://www.w3.org/TR/2008/REC-xml-20081126/) seems to
> indicate that element names are allowed to start with a colon; see the
> definition of NameStartChar in section 2.3.
>
> It looks like the new format would give ambiguous results in that case.

True.

Like I said, I only wanted it for the HTML case, and the XML case was
just an afterthought.  And in HTML, there can be no :tags.

Looking at the output from xml.el and xml.c on two RSS feeds, the format
isn't the biggest difference; the actual data is.

This is from the same RSS feed.  First the xml.el parser:

(pp xml (current-buffer))
((rdf:RDF
  ((xmlns:rdf . "http://www.w3.org/1999/02/22-rdf-syntax-ns#")
   (xmlns . "http://purl.org/rss/1.0/")
   (xmlns:taxo . "http://purl.org/rss/1.0/modules/taxonomy/")
   (xmlns:dc . "http://purl.org/dc/elements/1.1/")
   (xmlns:syn . "http://purl.org/rss/1.0/modules/syndication/")
   (xmlns:admin . "http://webns.net/mvcb/"))
  "\n  "
  (channel
   ((rdf:about . "http://blog.gmane.org/gmane.discuss"))
   "\n    "
   (title nil "gmane.discuss")
   "\n    "
   (link nil "http://blog.gmane.org/gmane.discuss")
   "\n    "
   (description nil
                (""))
   "\n    "
   (syn:updatePeriod nil "hourly")
   "\n    "
   (syn:updateFrequency nil "1")
   "\n    "
   (syn:updateBase nil "1901-01-01T00:00+00:00")
   "\n    "
   (items nil "\n      "
          (rdf:Seq nil "\n        "
                   (rdf:li
                    ((rdf:resource . "http://permalink.gmane.org/gmane.discuss/13574"))
                    (""))
                   "\n        "
                   (rdf:li

Then the same thing from the xml.c parser:
                   
(pp nxml (current-buffer))
(RDF
 (text . "\n  ")
 (channel
  (:about . "http://blog.gmane.org/gmane.discuss")
  (text . "\n    ")
  (title
   (text . "gmane.discuss"))
  (text . "\n    ")
  (link
   (text . "http://blog.gmane.org/gmane.discuss"))
  (text . "\n    ")
  (description)
  (text . "\n    ")
  (updatePeriod
   (text . "hourly"))
  (text . "\n    ")
  (updateFrequency
   (text . "1"))
  (text . "\n    ")
  (updateBase
   (text . "1901-01-01T00:00+00:00"))
  (text . "\n    ")
  (items
   (text . "\n      ")
   (Seq
    (text . "\n        ")
    (li
     (:resource . "http://permalink.gmane.org/gmane.discuss/13574"))
    (text . "\n        ")
    (li

So more work is needed to turn the xml.c parser into something that's
compatible with what xml.el users expect.

Anyway, back to the format thing -- if we disregard the :tag issue
(i.e., find a work-around), then it would be pretty trivial to write a
function to convert the output from libxml-parse-xml-region into what
the xml.el package returns.  (Not to mention the nxml.el package, which
does the same as the xml.el package?)  It'd still be faster than the
pure Elisp version, and Gnus can call libxml-parse-html-region (as
planned) to render HTML as quickly and conveniently as possible.
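
A rough sketch of what such a conversion function might look like, going
only by the two dumps above.  The name my-xmlc-to-xmlel is made up for
illustration and is not part of Emacs.  It assumes that every
(:name . "value") pair is an attribute (that is, it disregards the :tag
issue) and that every (text . "string") pair is character data.  It cannot
restore what the xml.c output has already dropped, such as the rdf: and
syn: prefixes and the xmlns attributes, and it returns (description nil)
for an empty element where xml.el gives (description nil ("")).

```elisp
;; Hypothetical helper, not part of Emacs: converts one node in the
;; xml.c format shown above into an xml.el-style (NAME ATTRS . CHILDREN).
(defun my-xmlc-to-xmlel (node)
  "Convert NODE from the xml.c format into an xml.el-style node."
  (let ((attrs nil)
        (children nil))
    (dolist (item (cdr node))
      (cond
       ;; (text . "string") becomes a bare string child.
       ((and (eq (car item) 'text) (stringp (cdr item)))
        (push (cdr item) children))
       ;; (:name . "value") becomes the attribute (name . "value").
       ;; Assumption: no element name starts with a colon.
       ((keywordp (car item))
        (push (cons (intern (substring (symbol-name (car item)) 1))
                    (cdr item))
              attrs))
       ;; Anything else is a child element; convert it recursively.
       (t
        (push (my-xmlc-to-xmlel item) children))))
    (append (list (car node) (nreverse attrs))
            (nreverse children))))
```

With that, (my-xmlc-to-xmlel '(title (text . "gmane.discuss"))) returns
(title nil "gmane.discuss").  Note that xml.el returns a list of top-level
nodes, so the result would still need wrapping in a list to match.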
    
-- 
(domestic pets only, the antidote for overdose, milk.)
  address@hidden * Lars Magne Ingebrigtsen



