gzz-dev
[Gzz] 14th


From: B. Fallenstein
Subject: [Gzz] 14th
Date: Sun, 14 Jul 2002 22:11:33 +0200
User-agent: Mozilla/5.0 (X11; U; Linux ppc; en-US; rv:0.9.9) Gecko/20020414 Debian/0.9.9-6

Wow-- three things became clear today.


1. Diffs

Thinking about something else, by chance I hit on the solution for our problem with generating diffs. The key is to go back to last summer's scheme of actually diffing two versions of a space, instead of merging the changes in an undolist together. Merging the undolist can be great, but it's incompatible with Ted's slice model (as we discussed before-- it's not *impossible* to implement the two together, but it takes the simplicity of either approach away).

The problems with diffing the current version against the last saved one were a) that it was too complex and b) that it was horribly slow.

Now, b) came from the terribly stupid way we saved vstreams. If we estimate that we have only a few thousand "regular" connections in most slices, I think we should do fine. And for a), I've found a cure.

Let's assume we are diffing some dimension and we have two maps, 'old' and 'new', that contain the posward connections along that dimension in the two versions (i.e., in each map, the key is the negward cell of a connection and the value is its posward cell). It doesn't matter whether the maps contain String ids or Cell objects-- except that they of course have to agree on what they use.

Now, here's the code to generate the diff:

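    // 'new' is a reserved word in Java, so the two maps are named
    // oldConns and newConns here (needs java.util.HashSet and java.util.Set).
    // Connections present in the new version but not in the old one:
    Set connects = new HashSet(newConns.entrySet());
    connects.removeAll(oldConns.entrySet());

    // Connections present in the old version but not in the new one:
    Set disconnects = new HashSet(oldConns.entrySet());
    disconnects.removeAll(newConns.entrySet());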

Easy, huh?
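
To make the set-difference idea concrete, here is a minimal self-contained sketch. The class name ConnectionDiff, the helper added() and the sample cell ids are made up for illustration; it assumes the maps hold String ids. Note that a connection whose posward end changes shows up twice: once as a disconnect of the old pair and once as a connect of the new pair.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class; not part of the Gzz code base.
class ConnectionDiff {
    /** Entries of 'to' that are missing from (or different in) 'from'. */
    static Set<Map.Entry<String, String>> added(Map<String, String> from,
                                                 Map<String, String> to) {
        Set<Map.Entry<String, String>> result = new HashSet<>(to.entrySet());
        result.removeAll(from.entrySet());
        return result;
    }

    public static void main(String[] args) {
        // negward cell id -> posward cell id, before and after an edit
        Map<String, String> oldConns = new HashMap<>();
        oldConns.put("a", "b");
        oldConns.put("b", "c");

        Map<String, String> newConns = new HashMap<>();
        newConns.put("a", "b");   // unchanged, appears in neither set
        newConns.put("b", "d");   // changed: disconnect b=c, connect b=d
        newConns.put("d", "e");   // brand-new connection

        System.out.println("connects:    " + added(oldConns, newConns));
        System.out.println("disconnects: " + added(newConns, oldConns));
    }
}
```

Both sets are built in time roughly linear in the number of connections, which fits the estimate of a few thousand connections per slice.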



2. Extending and embedding (XML)

Today I was thinking about Gzz and XML. I've thought about that before, but today I convinced myself of an idea that's quite radical for our group: not only supporting an XML file format, but making all cell contents plain XML. This would mean cells would not contain xanalogical media content, but XML structures, most simply storable as strings.

This would allow the zz and xu-media parts of Gzz to be decoupled very nicely, interfacing only in a standards-based way. The xu-media module would represent xanalogical text as XML fragments, e.g.:

    <gzz:span block="XXX" start="12" length="17"/>
    <gzz:span block="XXX" start="183" length="8"/>

The client module would then make the zz module store the XML generated by the xu-media module, and it would use the xu-media module to expand the XML into actual text.

But we wouldn't be limited to this kind of XML fragment in cells. The next step would be formatted text, for which we could simply use XHTML:

    <gzz:span block="XXX" start="12" length="4"/>
    <html:strong>
        <gzz:span block="XXX" start="16" length="3"/>
    </html:strong>
    <gzz:span block="XXX" start="19" length="1"/>

This would be expanded to:

    <gzz:span block="XXX" start="12" length="4">foo </gzz:span>
    <html:strong>
        <gzz:span block="XXX" start="16" length="3">bar</gzz:span>
    </html:strong>
    <gzz:span block="XXX" start="19" length="1">.</gzz:span>

Which, marking bold stuff like *this*, would be rendered as:

    foo *bar*.
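
As a rough sketch of how such a fragment could be turned into the rendered text, the following uses the JDK's built-in DOM parser. Everything here is an assumption made for illustration: the class name SpanExpander, the in-memory map standing in for the block store, the block content "some prefix foo bar.", and the final span starting at offset 19 (right after "bar"). The parser is left in its default non-namespace-aware mode, so no namespace URIs have to be invented for the gzz: and html: prefixes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical sketch of the xu-media side; not the actual Gzz API.
class SpanExpander {
    /** Renders a cell's XML fragment as text, marking html:strong like *this*. */
    static String render(String fragment, Map<String, String> blocks) {
        try {
            // Wrap the fragment so the parser sees a single root element.
            byte[] xml = ("<cell>" + fragment + "</cell>").getBytes(StandardCharsets.UTF_8);
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml)).getDocumentElement();
            StringBuilder out = new StringBuilder();
            walk(root, blocks, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse cell content", e);
        }
    }

    private static void walk(Node node, Map<String, String> blocks, StringBuilder out) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (!(child instanceof Element)) continue;   // skip indentation text
            Element e = (Element) child;
            if (e.getTagName().equals("gzz:span")) {
                // Look up the referenced block and copy out the span's characters.
                int start = Integer.parseInt(e.getAttribute("start"));
                int length = Integer.parseInt(e.getAttribute("length"));
                String block = blocks.get(e.getAttribute("block"));
                out.append(block, start, start + length);
            } else if (e.getTagName().equals("html:strong")) {
                out.append('*');
                walk(e, blocks, out);
                out.append('*');
            } else {
                walk(e, blocks, out);                    // unknown markup: just descend
            }
        }
    }

    public static void main(String[] args) {
        String fragment =
              "<gzz:span block=\"XXX\" start=\"12\" length=\"4\"/>"
            + "<html:strong><gzz:span block=\"XXX\" start=\"16\" length=\"3\"/></html:strong>"
            + "<gzz:span block=\"XXX\" start=\"19\" length=\"1\"/>";
        Map<String, String> blocks = Map.of("XXX", "some prefix foo bar.");
        System.out.println(render(fragment, blocks));    // prints: foo *bar*.
    }
}
```

The zz side never needs to see any of this: it only stores the fragment string, and the expansion happens entirely in the media module.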

(Of course, we can use something other than XHTML if we want to.) Writing the necessary XML transformation tools seems like a pretty easy task to me, even more so if we can use Jython. It would also make the xu-media module much more useful on its own: you could store web pages with it, and have a server-side script that annotates the pages with links and transclusions when serving them.

I like this because I'm terribly sick of making schemes for formatted text in Gzz, and here we could just take a standard and get the interoperability that comes with it (and all the code that already works with it) for free. Yet, by having the transformers in the xu-media module, we get XHTML with external markup.

(As a cell's content can be any XML, not just XHTML, this also allows an easy unified way for putting image and other non-textual spans in cells: just have e.g. <gzz:page-span block="XXX" page="17"/> in the cell.)

Storing just the XML on the zz side would make that module's job much easier: it wouldn't have to worry about the media model; in fact it could treat all content as plain strings (something that *all* known zz implementations so far are able to do), and still be used by our client with all the bravado of formatted xanalogical content.

For us, though, that would be only the first step; what I really want is representing the XML in a zz structure. This would make Gzz an XML browser, which is nice to show off Gzz's capabilities; but more importantly, it would allow making arbitrary zz connections to XML nodes, *enhancing XML with zz connectability*. Of course this becomes a problem when the XML is edited; how can we keep our connections? My answer is to simply put in attributes containing the cell id corresponding to a node, if any; of course then the editing program has to keep the attribute, which some might not do, but then it's their problem.

(Of course this is only interesting as long as there are many programs for handling XML data, but pretty much none for handling zz data. Alas, it's going to stay that way for some time to come.)

ZZ-connecting XML data may be only mildly interesting when you consider XHTML, but the point here is that a cell can contain anything you can represent in XML-- for example, MathML formulas. Consider cloning a MathML formula into the XHTML structure representing an article you're writing, while having the rootclone in a ZZ structure where you keep your formulas. Of course we could create our own ZZ structure representing formulas, but this way you could readily view your article in a standard browser. (And at some point, we can still create our own ZZ structure and have a spacepart that shows it as MathML.)

I called this section "Extending and Embedding," because I believe that this can be part of making Gzz usable inside something else. The point is for Gzz to be able to interoperate with XML formats seamlessly-- instead of having converters from XML formats to Gzz structures and back, we could simply organize XML data in Gzz cells. If you have some data in an app of yours and you'd like to use ZZ to organize it, you should be able to-- and since most data nowadays can be serialized as XML, being able to put XML in cells would be a good bet here. Especially because formats other than XML would certainly not allow us to put in our cell ids, while XML will (if the app cooperates).

I also want us to provide an XML serialization, so that Gzz data can be put inside an XML document. Indeed, I believe that if the above proposal is accepted, it would make sense to have XML as our native format; given all the tool support, it should be quite trivial to write scripts that format it nicely (and that we can pipe the files into). On the other hand, we could of course use an alternative serialization of the same data ourselves, one that uses some more readable and/or less space-consuming format.

Bah, far too much rambling and not enough time to make it shorter. Anyways. The idea is recorded, and we can discuss it here or on IRC.

One last thing: It all seems very easy to implement to me. If it doesn't to you, it may be that the explanation is just confusing. Please comment, and I'll try to re-explain if I didn't make myself clear.



3. Mediaserver indexing

Okay, I'll make this one short. The connections between xu transclusions are implicit. The connection between a xu link and a document that overlaps with one of that link's endsets is also implicit. To resolve both links and transclusions, the lookup we need is "which blocks reference span X?". Then we can load those blocks, see whether they're links or documents, and take appropriate action.

Now, we know that once we go p2p, this will be the *difficult* lookup (because it's something current p2p systems don't do). However, not only on a local system but also on webpages it should be relatively doable.

Currently, in a mediaserver pool we have b_xxx files that store blocks and p_xxx files that store information about pointers. Now, we can simply have i_xxx files that list all blocks which reference spans in block xxx ('i' for 'index'). However, the blocks listed in the i_xxx files would be only blocks which are in that same pool.

This works fine for local pools, and it also works for pools on a webserver, as long as all links and transclusions we want to follow are between documents on that same webserver. If that's not enough we could augment the above scheme with a facility for a webserver to tell us about other webservers. We could have j_xxx files (for 'jump'), which would list URLs of pools that have blocks with references to spans from block xxx-- then the client could go to these pools and look in their respective i_xxx files.
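
The lookup described above can be sketched in memory as follows. This is only an illustration under loud assumptions: the class PoolIndex, its fields and the sample block ids are hypothetical, the maps stand in for the i_xxx and j_xxx files, and direct object references stand in for the pool URLs a real j_xxx file would list. No file or network access is shown.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory model of a mediaserver pool; not the real storage layer.
class PoolIndex {
    /** Block id -> ids of the blocks whose spans that block references. */
    final Map<String, Set<String>> references = new HashMap<>();
    /** Stand-in for j_xxx: block id -> other pools known to reference it. */
    final Map<String, List<PoolIndex>> jumps = new HashMap<>();

    /** Stand-in for i_xxx: block id -> blocks in this pool that reference it. */
    Map<String, Set<String>> buildIndex() {
        Map<String, Set<String>> index = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : references.entrySet()) {
            for (String target : e.getValue()) {
                index.computeIfAbsent(target, k -> new TreeSet<>()).add(e.getKey());
            }
        }
        return index;
    }

    /** Referrers of 'target' in this pool plus the pools on its jump list (one hop). */
    Set<String> referrers(String target) {
        Set<String> found = new TreeSet<>(buildIndex().getOrDefault(target, Set.of()));
        for (PoolIndex other : jumps.getOrDefault(target, List.of())) {
            found.addAll(other.buildIndex().getOrDefault(target, Set.of()));
        }
        return found;
    }

    public static void main(String[] args) {
        PoolIndex local = new PoolIndex();
        local.references.put("doc1", Set.of("xxx"));
        local.references.put("link1", Set.of("xxx", "yyy"));

        PoolIndex remote = new PoolIndex();
        remote.references.put("doc2", Set.of("xxx"));
        local.jumps.put("xxx", List.of(remote));

        System.out.println(local.referrers("xxx")); // prints: [doc1, doc2, link1]
    }
}
```

The sketch follows jump lists only one hop deep, matching the scheme as described; the client would still have to load each returned block to see whether it is a link or a document.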

The data obtained that way wouldn't be comprehensive, but this scheme would already be Web+ -- i.e., be as good as the web and even better (you can point to arbitrary other places plus you have *some* implicit linking). If we could implement this, it would be a really good starting point for the real p2p search later.

- Benja



