Re: [Qexo-general] Re: Kava + Xerces

From:	Per Bothner
Subject:	Re: [Qexo-general] Re: Kava + Xerces
Date:	Tue, 24 Feb 2004 14:37:57 -0800
User-agent:	Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.6) Gecko/20040113

Joseph Coffland wrote:

But an implementation of "validation"


This feature is not as easy as it seems.  The easy way is inefficient.
That is to dump the XML as text and reparse it.  I think I can get
Xerces to validate your TreeList format directly by creating my own
implementation of XMLDocumentScanner.  I have not tested this idea
yet.  Revalidation is an often-requested feature, but it is not
implemented directly in Xerces.  If you build a DOM 2.0 or greater
tree it can be validated automatically, but DOM carries a lot of overhead.


Right.  The diagrams at http://xml.apache.org/xerces2-j/xni-design.html
suggest that validation "post-scanning" is supported, without being
very specific about how to do it.
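
As a point of comparison for the DOM route mentioned above, here is a minimal sketch of revalidating an in-memory tree without dumping it to text and reparsing. It uses the JAXP `javax.xml.validation` API shipped with current JDKs, not the Xerces XNI pipeline or a custom XMLDocumentScanner, and it does not touch TreeList. The class name `DomRevalidation` and the one-element schema are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomRevalidation {
    // Hypothetical schema: the root element <item> must contain an xs:integer.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='item' type='xs:integer'/>"
      + "</xs:schema>";

    // Parse XML text into a namespace-aware DOM (needed for schema validation).
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    static Schema loadSchema() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema itself is broken", e);
        }
    }

    // Validate the tree as it currently stands in memory; no serialization step.
    static boolean isValid(Document doc) {
        try {
            loadSchema().newValidator().validate(new DOMSource(doc));
            return true;
        } catch (SAXException e) {
            return false; // with no ErrorHandler set, validation errors are thrown
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<item>42</item>");
        System.out.println("before edit: " + isValid(doc));
        doc.getDocumentElement().setTextContent("not a number"); // modify in memory
        System.out.println("after edit:  " + isValid(doc));
    }
}
```

This only shows that a tree can be revalidated after modification; it says nothing about the DOM overhead concern raised above.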

the "scheme import feature",


What is this?


I meant "schema import feature".  I.e. having the Qexo compiler
read Schema definitions, and have Schema type information incorporated
into the compile-time (static) typing of XQuery expressions.  It's needed
for useful static typing, but it's obviously not a small task.

type annotation *and* general XQuery run-time type checking is
probably worth 5% of XQuery/XML-related licensing deals.  Completing
static typing in addition is definitely worth 5%, though presumably
you'd need a lot of my input/feedback.


I think XQParser needs a clean up first.


To me that is much lower on the priority list.

A > 3000 line Java class is pretty crazy IMO.


A 3000-line source file is not particularly large.  Many Gcc source
files are over 10000 lines - not that that's a desirable state.  But
I'm not sure that ten 350-line classes are better than one 3000-line
class.  The former use more run-time resources, though not by much.

At the least I think
the Lexer and Parser should be separated.


That may be a reasonable start.  I'd probably accept such a patch.
I might even do it myself when I'm in the right mood ...
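
To make the proposed split concrete, here is a toy sketch of the general shape: a lexer that only turns characters into tokens, and a parser that only consumes tokens. It is not derived from XQParser; the classes `Lexer` and `Parser` and the tiny arithmetic grammar are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Lexer: characters in, tokens out. Knows nothing about the grammar. */
class Lexer {
    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) {
                    i++;
                }
                tokens.add(src.substring(start, i));
            } else if (c == '+' || c == '*') {
                tokens.add(String.valueOf(c));
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return tokens;
    }
}

/** Parser: tokens in, value out. Knows nothing about characters. */
class Parser {
    private final List<String> tokens;
    private int pos = 0;

    Parser(List<String> tokens) {
        this.tokens = tokens;
    }

    // sum := product ('+' product)*
    long parseSum() {
        long value = parseProduct();
        while (pos < tokens.size() && tokens.get(pos).equals("+")) {
            pos++;
            value += parseProduct();
        }
        return value;
    }

    // product := number ('*' number)*
    private long parseProduct() {
        long value = parseNumber();
        while (pos < tokens.size() && tokens.get(pos).equals("*")) {
            pos++;
            value *= parseNumber();
        }
        return value;
    }

    private long parseNumber() {
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("unexpected end of input");
        }
        return Long.parseLong(tokens.get(pos++));
    }
}
```

Usage would look like `new Parser(Lexer.tokenize("1 + 2 * 3")).parseSum()`. The point of the split is that each class can be read and tested alone; whether that pays off for a parser of XQParser's size is the open question in this thread.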

Basically, we need to implement the Schema datatype system.  I have
looked around.


It seems like Xerces2-J has a representation of Schema types.  What
Qexo could do is to have some wrapper classes around the Xerces
classes that can plug in instead of gnu.kawa.xml.NodeType and
related classes.  But I'd have to look at this more closely.

There has been talk about making it possible to update documents in
XQuery.


I see 3 different things this can mean:
(1) Easier ways to modify a document and create a modified document.
XSLT is one approach.  This of course is purely functional and not
a real "update", but it can serve many of the needs of update.
(2) Modify a node tree (DOM) in place.  In Qexo, that would be
modifying a NodeTree.
(3) Modify an XML file or an XML dataset stored in a database, in place.
This presumably means not invalidating existing document/node IDs,
which may be difficult.  This goal is very much database-driven.
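
The difference between (1) and (2) can be sketched with the standard `org.w3c.dom` API rather than Qexo's NodeTree. The class `UpdateStyles` and its method names are hypothetical; the "remove" operation is just one example of an update.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class UpdateStyles {
    static DocumentBuilder builder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static Document parse(String xml) {
        try {
            return builder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    // Meaning (2): modify the existing tree in place.
    static void removeFirstInPlace(Document doc, String tag) {
        Node n = doc.getElementsByTagName(tag).item(0);
        if (n != null) {
            n.getParentNode().removeChild(n);
        }
    }

    // Meaning (1): build a modified document; the input is left untouched.
    static Document withoutFirst(Document doc, String tag) {
        Document copy = builder().newDocument();
        copy.appendChild(copy.importNode(doc.getDocumentElement(), true)); // deep copy
        removeFirstInPlace(copy, tag);
        return copy;
    }

    static int count(Document doc, String tag) {
        return doc.getElementsByTagName(tag).getLength();
    }

    public static void main(String[] args) {
        Document original = parse("<list><item/><item/></list>");
        Document modified = withoutFirst(original, "item");
        System.out.println(count(original, "item") + " items in original");
        System.out.println(count(modified, "item") + " items in modified copy");
        removeFirstInPlace(original, "item");
        System.out.println(count(original, "item") + " items in original after in-place removal");
    }
}
```

Meaning (3) has no equivalent here, since it involves persistent storage and stable node IDs rather than an in-memory tree.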

I don't want to wait until they figure out how to standardize this.  I'm
thinking of writing some functions to do this.  Have you heard of
XUpdate?
http://www.xmldb.org/xupdate/xupdate-wd.html
I wrote an implementation of this in Java a few years ago.
What I would like to do is translate XUpdate's syntax into XQuery
functions.
XUpdate has the following modification commands:

xupdate:insert-before
xupdate:insert-after
xupdate:append
xupdate:update
xupdate:remove
xupdate:rename

These would not be difficult to implement and in my experience they are sufficient. Maybe we can even influence the XQuery standard.


It's not that easy.  First of course you're changing the fact that
XQuery is a pure side-effect-free language.  Second you have to be
careful about node identity.  Remember that in:
  let $x := <a/> return (<b>{$x}</b>, $x)
the <a/> is *copied* - the <a/> child of <b> is not the same node
as $x.  If you're going to allow operations that modify a node you
have to be careful to define what nodes are modified and how.
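
The copy semantics can be imitated with the standard DOM API to show why this matters. This is only an analogy, assuming `cloneNode(true)` as a stand-in for what an XQuery element constructor does with enclosed nodes; the class `NodeCopyDemo` is made up, and Qexo's own node representation is not involved.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class NodeCopyDemo {
    /** Returns { $x, the copy of $x that ends up as the child of <b> }. */
    static Element[] build() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        Element x = doc.createElement("a");               // let $x := <a/>
        Element b = doc.createElement("b");
        Element child = (Element) x.cloneNode(true);      // <b>{$x}</b> copies $x
        b.appendChild(child);
        return new Element[] { x, child };
    }

    public static void main(String[] args) {
        Element[] pair = build();
        Element x = pair[0];
        Element child = pair[1];
        System.out.println("same node? " + x.isSameNode(child));
        x.setAttribute("touched", "yes");                 // an in-place "update" of $x
        System.out.println("copy sees the update? " + child.hasAttribute("touched"));
    }
}
```

An update applied to `$x` does not reach the copy under `<b>`, so any update operation has to say which of the two nodes it targets.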

Finally, that's just solving (2) - modifying nodes in-place.
That is a minor convenience, but doesn't really let you do anything
you can't do otherwise.  To get something really new you have to work
on (3), which brings in lots of messy issues with transactions, plus
it's hard to do it in a way that's not tied to a specific database.
If you define a standard XML<->relational mapping you can probably do
something with JDBC, but that's a real change in Qexo's current focus.

Obviously there is a lot of demand for "update" functionality in
XQuery, but this demand is I think primarily from people wanting
"XML databases", which is not I think where Qexo's current strengths
lie.  I don't want to discourage you, and I'd love to see experiments
with updates, but "production-strength" updating is a hard problem.
But maybe we can make something useful even if updates are not initially
"production-strength."
--
        --Per Bothner
address@hidden   http://per.bothner.com/
