[Gzz] 14th

gzz-dev

From:	B. Fallenstein
Subject:	[Gzz] 14th
Date:	Sun, 14 Jul 2002 22:11:33 +0200
User-agent:	Mozilla/5.0 (X11; U; Linux ppc; en-US; rv:0.9.9) Gecko/20020414 Debian/0.9.9-6

Wow-- three things became clear today.


1. Diffs

Thinking about something else, by chance I hit on the solution for our problem with generating diffs. The key is to go back to last summer's scheme of actually diffing two versions of a space, instead of merging the changes in an undolist together. Merging the undolist can be great, but it's incompatible with Ted's slice model (as we discussed before-- it's not *impossible* to implement the two together, but it takes the simplicity of either approach away).

The problems with diffing the current version against the last saved one were a) that it was too complex and b) that it was horribly slow.

Now, b) came from the terribly stupid way we saved vstreams. If we estimate that we have only a few thousand "regular" connections in most slices, I think we should do fine. And for a), I've found a cure.

Let's assume we are diffing some dimension and we have two maps, 'old' and 'new', that contain the posward connections along that dimension (i.e., each map holds the connections, where the key is the negward cell and the value is the posward cell of the connection). It doesn't matter whether the maps contain String ids or Cell objects-- except that they of course have to agree on what they use.


Now, here's the code to generate the diff:

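    // 'new' is a reserved word in Java, so the two maps are called
    // oldMap and newMap here.

    // Connections present in the new version but not in the old one.
    Set connects = new HashSet(newMap.entrySet());
    connects.removeAll(oldMap.entrySet());

    // Connections present in the old version but not in the new one.
    Set disconnects = new HashSet(oldMap.entrySet());
    disconnects.removeAll(newMap.entrySet());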

Easy, huh?
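
For anyone who wants to try this out, here is a minimal self-contained sketch of the same idea. The class name ConnDiff, the helper added() and the cell ids are made up for illustration, and it assumes the maps use String ids; it is not the actual Gzz code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class ConnDiff {
    // Entries of 'after' that are missing from 'before'.
    // An entry only counts as present if both key and value match.
    static Set<Map.Entry<String, String>> added(Map<String, String> before,
                                                Map<String, String> after) {
        Set<Map.Entry<String, String>> result = new HashSet<>(after.entrySet());
        result.removeAll(before.entrySet());
        return result;
    }

    public static void main(String[] args) {
        // negward cell id -> posward cell id along one dimension (made-up ids)
        Map<String, String> oldConns = new HashMap<>();
        oldConns.put("a", "b");
        oldConns.put("b", "c");

        Map<String, String> newConns = new HashMap<>();
        newConns.put("a", "b");   // unchanged connection
        newConns.put("b", "d");   // b now points to d instead of c
        newConns.put("d", "e");   // brand-new connection

        Set<Map.Entry<String, String>> connects = added(oldConns, newConns);
        Set<Map.Entry<String, String>> disconnects = added(newConns, oldConns);

        System.out.println("connects:    " + connects);
        System.out.println("disconnects: " + disconnects);
    }
}
```

In this example, connects holds b=d and d=e, and disconnects holds b=c. A changed connection shows up as one disconnect plus one connect, and the unchanged a=b appears in neither set.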



2. Extending and embedding (XML)

Today I was thinking about Gzz and XML. I've thought about that before, but today I convinced myself of an idea that's quite radical for our group: not only supporting an XML file format, but making all cell contents plain XML. This would mean cells would not contain xanalogical media content, but XML structures, most simply storable as strings.

This would allow the zz and xu-media parts of Gzz to be decoupled very nicely, interfacing only in a standards-based way. The xu-media module would represent xanalogical text as XML fragments, e.g.:


    <gzz:span block="XXX" start="12" length="17"/>
    <gzz:span block="XXX" start="183" length="8"/>

The client module would then make the zz module store the XML generated by the xu-media module, and it would use the xu-media module to expand the XML into actual text.
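
To illustrate the expansion step, here is a rough sketch. Everything in it is an assumption made for the example: the class name SpanExpander, the in-memory map standing in for the block store, and the shortcut of matching spans with a regular expression (a real xu-media module would presumably use a proper XML parser and handle namespaces).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpanExpander {
    // Matches only the exact empty-element form used in the examples:
    //   <gzz:span block="..." start="..." length="..."/>
    private static final Pattern SPAN = Pattern.compile(
        "<gzz:span block=\"([^\"]*)\" start=\"(\\d+)\" length=\"(\\d+)\"/>");

    // Replace every empty gzz:span with one that carries the referenced text.
    // 'blocks' maps a block id to its full text (a stand-in for the block store);
    // an unknown block id is not handled here and would throw.
    static String expand(String xml, Map<String, String> blocks) {
        Matcher m = SPAN.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String block = m.group(1);
            int start = Integer.parseInt(m.group(2));
            int length = Integer.parseInt(m.group(3));
            String text = blocks.get(block).substring(start, start + length);
            String filled = "<gzz:span block=\"" + block + "\" start=\"" + start
                + "\" length=\"" + length + "\">" + escape(text) + "</gzz:span>";
            m.appendReplacement(out, Matcher.quoteReplacement(filled));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Minimal escaping so that block text cannot break the XML.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        Map<String, String> blocks = new HashMap<>();
        // Made-up block content: offsets 12..18 hold "foo bar".
        blocks.put("XXX", "0123456789ABfoo bar");

        String cell = "<gzz:span block=\"XXX\" start=\"12\" length=\"4\"/>"
                    + "<gzz:span block=\"XXX\" start=\"16\" length=\"3\"/>";
        System.out.println(expand(cell, blocks));
    }
}
```

The zz side never sees any of this: it stores the unexpanded string, and only the client asks the xu-media side to fill in the text.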

But we wouldn't be limited to this kind of XML fragment in cells. The next step would be formatted text, for which we could simply use XHTML:


    <gzz:span block="XXX" start="12" length="4"/>
    <html:strong>
        <gzz:span block="XXX" start="16" length="3"/>
    </html:strong>
    <gzz:span block="XXX" start="19" length="1"/>

This would be expanded to:

    <gzz:span block="XXX" start="12" length="4">foo </gzz:span>
    <html:strong>
        <gzz:span block="XXX" start="16" length="3">bar</gzz:span>
    </html:strong>
    <gzz:span block="XXX" start="19" length="1">.</gzz:span>

Which, marking bold stuff like *this*, would be rendered as:

    foo *bar*.

(Of course, we can use something other than XHTML if we want to.) Writing the necessary XML transformation tools seems like a pretty easy task to me, and more so if we can use Jython, and it would make the xu-media module much more useful on its own (you could store web pages with it, and have some server-side script that annotates the pages with links and transclusions when serving them).

I like this because I'm terribly sick of making schemes for formatted text in Gzz, and here we could just take a standard and get the interoperability that comes with it (and all the code that already works with it) for free. Yet, by having the transformers in the xu-media module, we get XHTML with external markup.

(As a cell's content can be any XML, not just XHTML, this also allows an easy unified way for putting image and other non-textual spans in cells: just have e.g. <gzz:page-span block="XXX" page="17"/> in the cell.)

Storing just the XML on the zz side would make that module's job much easier: it wouldn't have to worry about the media model; in fact it could treat all content as plain strings (something that *all* known zz implementations so far are able to do), and still be used by our client with all the bravado of formatted xanalogical content.

For us, though, that would be only the first step; what I really want is representing the XML in a zz structure. This would make Gzz an XML browser, which is nice to show off Gzz's capabilities; but more importantly, it would allow us to make arbitrary zz connections to XML nodes, *enhancing XML with zz connectability*. Of course this becomes a problem when the XML is edited; how can we keep our connections? My answer is to simply put in attributes containing the cell id corresponding to a node, if any; of course then the editing program has to keep the attribute, which some might not do, but then it's their problem.
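
As a sketch of how the cell-id attribute could survive an edit by another program, here is a small example using the standard Java DOM API. The attribute name gzz-cell, the cell id cell-42 and the class name CellIdTagger are all hypothetical; the real attribute would probably live in a gzz namespace.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CellIdTagger {
    // Hypothetical attribute name; the real one would need to be agreed on.
    static final String ATTR = "gzz-cell";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    static String serialize(Document doc) {
        try {
            StringWriter out = new StringWriter();
            // A Transformer without a stylesheet performs an identity copy.
            TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Return the element carrying the given cell id, or null if it was lost.
    static Element findByCellId(Document doc, String cellId) {
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (cellId.equals(e.getAttribute(ATTR))) {
                return e;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Document doc = parse("<article><p>Intro</p><p>Formula here</p></article>");

        // Tag the second paragraph with the id of the cell that represents it.
        Element formula = (Element) doc.getElementsByTagName("p").item(1);
        formula.setAttribute(ATTR, "cell-42");

        // Simulate an external editor: add a paragraph in front, save, reload.
        Element extra = doc.createElement("p");
        extra.setTextContent("Added by another program");
        Element root = doc.getDocumentElement();
        root.insertBefore(extra, root.getFirstChild());
        Document reloaded = parse(serialize(doc));

        // The node moved, but the attribute still identifies it.
        System.out.println(findByCellId(reloaded, "cell-42").getTextContent());
    }
}
```

The lookup depends only on the attribute, not on the node's position, so it keeps working as long as the editing program preserves attributes it doesn't understand.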

(Of course this is only interesting as long as there are many programs for handling XML data, but pretty much none for handling zz data. Alas, it's going to stay that way for some time to come.)

ZZ-connecting XML data may be only mildly interesting when you consider XHTML, but the point here is that a cell can contain anything you can represent in XML-- for example, MathML formulas. Consider cloning a MathML formula into the XHTML structure representing an article you're writing, while having the rootclone in a ZZ structure where you keep your formulas. Of course we could create our own ZZ structure representing formulas, but this way you could readily view your article in a standard browser. (And at some point, we can still create our own ZZ structure and have a spacepart that shows it as MathML.)

I called this section "Extending and Embedding," because I believe that this can be part of making Gzz usable inside something else. The point is for Gzz to be able to interoperate with XML formats seamlessly-- instead of having converters from XML formats to Gzz structures and back, we could simply organize XML data in Gzz cells. If you have some data in an app of yours and you'd like to use ZZ to organize it, you should be able to-- and since most data nowadays can be serialized as XML, being able to put XML in cells would be a good bet here. Especially because formats other than XML would certainly not allow us to put in our cell ids, while XML will (if the app cooperates).

I also want us to provide an XML serialization, so that Gzz data can be put inside an XML document. Indeed, I believe that if the above proposal is accepted, it would make sense to have XML as our native format; given all the tool support, it should be quite trivial to write scripts that format it nicely (and that we can pipe the files into). On the other hand, we could of course use an alternative serialization of the same data ourselves, one that uses some more readable and/or less space-consuming format.

Bah, far too much rambling and not enough time to make it shorter. Anyways. The idea is recorded, and we can discuss it here or on IRC.

One last thing: It seems all very easy to implement to me. If it doesn't to you, it may be that the explanation is just confusing. Please comment, and I'll try to re-explain if I didn't make myself clear.




3. Mediaserver indexing

Okay, I'll make this one short. The connections between xu transclusions are implicit. The connection between a xu link and a document that overlaps with one of that link's endsets is also implicit. To resolve both links and transclusions, the lookup we need is "which blocks reference span X?". Then we can load those blocks, see whether they're links or documents, and take appropriate action.

Now, we know that once we go p2p, this will be the *difficult* lookup (because it's something current p2p systems don't do). However, not only on a local system but also on webpages it should be relatively doable.

Currently, in a mediaserver pool we have b_xxx files that store blocks and p_xxx files that store information about pointers. Now, we can simply have i_xxx files that list all blocks which reference spans in block xxx ('i' for 'index'). However, the blocks listed in the i_xxx files would be only blocks which are in that same pool.
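
Here is a minimal sketch of what building that index amounts to: inverting a "block references blocks" map. It works in memory only; the class name PoolIndex and the block ids are invented, and how the i_xxx files would actually be laid out on disk is left open.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class PoolIndex {
    // Input:  block id -> ids of the blocks whose spans it references.
    // Output: block id -> ids of the blocks in this pool that reference it
    //         (one map entry per would-be i_xxx file).
    static Map<String, Set<String>> buildIndex(Map<String, Set<String>> references) {
        Map<String, Set<String>> index = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : references.entrySet()) {
            for (String target : e.getValue()) {
                index.computeIfAbsent(target, k -> new TreeSet<>()).add(e.getKey());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // A document transcluding from two blocks, and a link into one of them.
        Map<String, Set<String>> references = new TreeMap<>();
        references.put("doc1", new TreeSet<>(Arrays.asList("blockA", "blockB")));
        references.put("link1", new TreeSet<>(Arrays.asList("blockA")));

        // prints {blockA=[doc1, link1], blockB=[doc1]}
        System.out.println(buildIndex(references));
    }
}
```

Note that the index is per block, not per span: after the lookup, the client still has to load the listed blocks and check which of them actually overlap span X.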

This works fine for local pools, and it also works for pools on a webserver, as long as all links and transclusions we want to follow are between documents on that same webserver. If that's not enough we could augment the above scheme with a facility for a webserver to tell us about other webservers. We could have j_xxx files (for 'jump'), which would list URLs of pools that have blocks with references to spans from block xxx-- then the client could go to these pools and look in their respective i_xxx files.
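
The client-side lookup with jumps could look roughly like this. It is a sketch under several assumptions: pools are modelled as in-memory objects instead of being fetched over HTTP, the names JumpLookup and Pool and the example.org URL are invented, and only one hop is followed.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class JumpLookup {
    // In-memory stand-in for a mediaserver pool.
    static class Pool {
        // contents of the i_xxx files: block id -> blocks in this pool referencing it
        final Map<String, Set<String>> index = new HashMap<>();
        // contents of the j_xxx files: block id -> URLs of other pools to ask
        final Map<String, Set<String>> jumps = new HashMap<>();
    }

    // Collect all known blocks that reference spans in 'blockId':
    // first from the home pool's index, then from every pool it points to.
    static Set<String> referencingBlocks(String blockId, Pool home,
                                         Map<String, Pool> poolsByUrl) {
        Set<String> none = Collections.emptySet();
        Set<String> found = new TreeSet<>(home.index.getOrDefault(blockId, none));
        for (String url : home.jumps.getOrDefault(blockId, none)) {
            Pool other = poolsByUrl.get(url);   // stands in for fetching the pool
            if (other != null) {
                found.addAll(other.index.getOrDefault(blockId, none));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Pool home = new Pool();
        home.index.put("blockA", Collections.singleton("doc1"));
        home.jumps.put("blockA", Collections.singleton("http://example.org/pool/"));

        Pool remote = new Pool();
        remote.index.put("blockA", Collections.singleton("doc9"));

        Map<String, Pool> poolsByUrl = new HashMap<>();
        poolsByUrl.put("http://example.org/pool/", remote);

        // prints [doc1, doc9]
        System.out.println(referencingBlocks("blockA", home, poolsByUrl));
    }
}
```

Whether a client should also follow the j_xxx files of the pools it jumps to is a separate question; this sketch deliberately stops after one hop.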

The data obtained that way wouldn't be comprehensive, but this scheme would already be Web+ -- i.e., be as good as the web and even better (you can point to arbitrary other places plus you have *some* implicit linking). If we could implement this, it would be a really good starting point for the real p2p search later.


- Benja
