gzz-dev
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Gzz] Re: the Storm article


From: Benja Fallenstein
Subject: Re: [Gzz] Re: the Storm article
Date: Fri, 07 Mar 2003 23:57:10 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021226 Debian/1.2.1-9

Eric Armstrong wrote:
Benja Fallenstein wrote:

Missing Ingredients
-------------------
* How are collisions handled?
  (Surely some small blocks must produce the
   same cryptographic hash as other small blocks,
   sometimes.)

a) It's extremely unlikely (AFAIK you'd need about 2^80 blocks in a
single lookup system to find a collision by chance).


Hmm. This makes me realize that I really don't understand what a
"cryptographic hash" is. Maybe a couple of words to the effect
that it's a non-colliding hash, couple with a reference (which is
probably already there) would suffice.

Cryptographic means that it's a one-way, collision-free hash function. One-way means that given a hash h, it is (computationally) infeasible to x so that h = H(x). *Weakly* collision-free means that given a message x, it is infeasible to find a message y so that H(x) = H(y). *Strongly* collision-free means that it is infeasible to find any messages x and y so that H(x) = H(y).

Of course you need to see the public p2p repository of Storm blocks as a (brute force) collision searcher for this definition to be relevant :-)

SHA-1, which we use, is regarded to be strongly collision-free.

My fave online ref is: http://www.rsasecurity.com/rsalabs/faq/2-1-6.html

Versions of docs are blocks and blocks are hashed by content. Does that
answer the question?

Probably. I'm just not sure if a doc is essentially a list of blocks,
or text that contains links to blocks.

Not sure what *you* mean, so I'll give it another try in my words :-)

A *document* is something identified by a pointer; it has current and obsoleted versions.

A *version of a document* is a Storm block. A block B is made a version of a document by a pointer block P targetting B. I.e., the pointer block says, "This block is a version of this document."

What the block B contains is format-specific. B could have the content type text/html; in that case it would be simply an HTML file. (Duh.)

If we want xanalogical storage, we need a special file format for that. Let's make one up for the purpose of this discussion:

  <xu-text>
    <span block="urn:x-storm:0100731AB968B4375F6B0BA430088805B47649A3CC"
          from="5" to="32"/>
    <span block="urn:x-storm:0100731AB968B4375F6B0BA430088805B47649A3CC"
          from="43" to="54"/>
    <span block="urn:x-storm:0100731AB968B4375F6B0BA430088805B47649A3CC"
          from="77" to="90"/>
  </xu-text>

Block 0100731AB968B4375F6B0BA430088805B47649A3CC is a plain text block. (The gibberish is an SHA-1 hash in hex.) Our document contains three ranges of text taken from this block; the first goes from the 5th to the 32th character in the block, and so on. On the screen, the three ranges will be shown consecutively-- the user won't be able to see where one ends and the other starts.

(In Ted Nelson's words, this is a transclusion.)

And I was curious as to how
the name of the link (the identifier -- a cryptographic hash in its
own right) figures into the hash of the doc. Not that it really matters.
I've just never seen hashing applied to a tree structure before, and
I'm mildly curious as to how it works. (Not deeply curious, though.)

It's not a big thing, but has one relevant property: You cannot have circular references. If block A refers to block B (through B's hash), we know that A was created after B, and B cannot refer to A (at least without the hash function being broken).

All this makes me think we should give "Xanadu" a section in "Related
Work," and then later explain how we explain xanalogical storage in
Storm, in a different way than Project Xanadu did.

That sounds absolutely right.

Good.

* "caching becomes trivial, because it is never necessary to
 to check for new versions of blocks". Hmm. This sounds like
 versioning isn't supported, which seems like a weakness.

I know that telling I reviewer "but we said this" is a no-no since if
you have to say so, you apparently didn't say it well enough :) but in
this case I must ask: The first paragraph of that section ends with,
"Mutable data structures are built on top of the immutable blocks (see
Section 6)." Any ideas on how to make explicit that we'll get to
versioning later on?

Aha. You know, having been educated in the last century (way deep in
the last century, at that) I had no idea that "mutable data structures"
had anything to do with versioning.

:-)

Well, immutable == cannot change, and mutable == can change. Blocks cannot change, they always contain exactly the same bytes. Documents *can* change, yet we implement the changeable documents in terms of blocks.

I think I'm missing the background
picture you're working against, so a little bit of the "back story"
would clue me in. Maybe something along these lines:
  * In a normal caching system, when a new version is created...
  * That means the caching system has to ...
  * But in Storm, ...

In a normal caching system, when a new version is created, the cached copy becomes invalid. However, the cache has no way to notice this except fetching a completely new version from the server.

That means that the caching system has to check for new versions in regular intervals, invalidating the cached copy. Even so, it can happen that the cache is not up-to-date.

But in Storm, since blocks cannot change, cached copies of blocks remain valid indefinitely.

The same is not true for the changeable documents. However, if you always want to access the newest version of a document, you can always resolve the *pointer* when accessing the document. If the pointer still points to the same version as last time, you needn't download that version again; you still have it in your cache.

I suspect that a short discussion which followed that pattern would
bring me up to speed. (I think the point is that links to all versions
are "direct" in Storm, so when a new version of a block is written,

Note: There aren't 'versions of blocks.' We can talk about versions of documents, and each such version is one block...

the document referring to it automatically points to that version. But
I'm not clear how that affects another document that might be pointing
to the same block. Apparently it does not reference the new version?

If the other document wants an updatable reference, it must refer to the *pointer*, not to the block. By dereferencing the pointer, the current version of the refered-to document can then be found.

You would refer to a block if you want to talk about one specific version of a document.

Or maybe the pointer block assures that it will, unless some specific
reference to a particular version is made.

Guess that's what you're saying here :)

* A block is hashed. Ok. And a doc contains pointers to blocks.
 Ok. But is a doc a block? How is it hashed? How do links
 contribute to the hash?

Each version of a doc is a block... Links: Depends on how you make them
(i.e., the format of the document): If they are inline, as in HTML, they
contribute to the hash. If they are external-- anybody can contribute
links by putting them in another block-- they do not.

Oooh. That's an interesting feature. I'm looking forward to hearing
more about that.
In both XLink and Xanadu, links can be both inside a document (which
gives them additional credibility... e.g. the user should be able to
select 'view only links contained in the document') or they can be external.

Hmmm. I'm familiar with XLink, but not with that feature. I guess I
need more "back story" on this topic.

Now, I'm not familiar with XLink :-) but if memory serves, 'an' XLink is an arc from one (set of) URI(s) to another, and this arc can be in either or neither of the linked documents. IIRC you can implement 'linkbases' using XLink which provide links externally to the documents they connect. The idea is that your browser connects to such a linkbase and retrieves from it links from/to the documents you access (as in the Microcosm/DLS refs from Related Work).

A pointer block is obsolete if it is on the 'obsoleted' list of any of
the other pointer blocks.

Oh. There's an obsoleted list. Maybe that was in the block-pointer
diagram, and I missed it.

Yup, and also in the main text: "In addition to the pointer and the target, pointer blocks contain a list of zero or more *obsoleted* pointer blocks."

The whole of section 6 (Versioning) felt like the hardest-to-explain part of the article, btw. This may be a symptom of it...

Excellent point! The example was really good, too. I've changed
my mind. The way you're doing it is right.

:-)

(If there is room for
the example to explain why, awesome!)

I edited a mention of that out as 'important-but-not-that-important' to meet the 10 page limit... :)

You mean storage media? Yes. Hey, I actually think we should make the
point here that when you copy an image to another document, or keep
differently edited versions of a movie, Storm stores the content only
once-- and can thus *save* disk space :-)

Wild! Really good point.

It was the original motivation for Storm, btw-- before anybody thought about using it for versioning or reliability :)

Most welcome, and thank you for sharing this great read with me.
I look forward to "Taking the World by Storm".
:_)

And I'm looking forward to that article! ;)

I'm currently working on the RDF editor that will form the 'core' of Fenfire (called Fenfire Loom), but hope to be able to separate Storm into its own project soon. Until then, if you want to take a closer look, you can find the interfaces/implementation at:

    http://cvs.gnu.org/cgi-bin/viewcvs.cgi/gzz/gzz/lava/gzz/storm/

The javadoc should be acceptable, but please do ask questions and feel free to point out ommissions.

The tests are at:

    http://cvs.gnu.org/cgi-bin/viewcvs.cgi/gzz/gzz/lava/test/gzz/storm/

Much of that is in Jython, so this may be a harder read for you.

- Benja





reply via email to

[Prev in Thread] Current Thread [Next in Thread]