From: Benja Fallenstein
Subject: Re: [Gzz] Re: the Storm article
Date: Fri, 07 Mar 2003 18:08:38 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021226 Debian/1.2.1-9

Alatalo Toni wrote:
On Thu, 6 Mar 2003, Eric Armstrong wrote:
First, I liked the article. A lot. I'm looking
forward to using Storm, and so is Eugene (only
took a 30 second presentation to get him
interested).

glad to hear, and thank you very much for the detailed comments!

Me too :)

I even started a review article "Taking the World
by Storm", for publication in some broad-interest
journal. (Not sure which one, though.)

Cool!

Next, specifics.

Thanks for the comments. I'll also answer questions here, so that you don't need to wait for the next version of the article to get them answered :)

Missing Ingredients
-------------------
These are things that need to be addressed in the
article, however briefly, but are not currently
mentioned:
 * How are collisions handled?
   (Surely some small blocks must produce the
    same cryptographic hash as other small blocks,
    sometimes.)

afaik it should not happen, but it is theoretically possible. so i guess
it's a good question :)

We assume that it doesn't happen.

a) It's extremely unlikely (AFAIK you'd need about 2^80 blocks in a single lookup system to find a collision by chance).

b) A supercomputer or distributed.net effort dedicated to finding a collision by brute force seems more likely to find one, because it would dedicate all its computation time to this, instead of only producing a few blocks per hour(?) per system.

But once a supercomputer or distributed.net effort able to find a collision by brute force becomes feasible, the hash function isn't secure any more anyway.

Whether such efforts are feasible yet is something I trust the experts to evaluate (i.e. the cryptographers, e.g. those designing these hash functions).
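
To make this concrete, here is a minimal sketch of how a block ID could be derived from content, using Java's standard MessageDigest. This is an illustration only, not actual Storm code; the class and method names are made up, and SHA-1 is my assumption (it matches the 2^80 birthday bound mentioned above, since SHA-1 is a 160-bit hash):

    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    /** Hypothetical sketch: a block's ID is the cryptographic hash of its bytes. */
    public class BlockId {
        /** Return the hex-encoded SHA-1 hash of the block's content. */
        public static String of(byte[] content) throws NoSuchAlgorithmException {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] hash = md.digest(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : hash)
                hex.append(String.format("%02x", b));
            return hex.toString();
        }
    }

Two blocks can only get the same ID if their contents collide under the hash function; that is what the 2^80 estimate refers to.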

 * How are docs hashed? I didn't see a discussion of that.

Versions of docs are blocks and blocks are hashed by content. Does that answer the question?

 * What is the project storage impact?
   (Maybe only "publish" material goes into the system,
    or maybe storage is cheap and growing cheaper so
    we don't really care, but it needs to be mentioned.)

Again not sure what you mean, sorry. How would it be different from classical file system storage? (Except that we need to store media files like images only once even when they're used in different documents on the same computer...)

If you refer to storing past versions, I understand. This is a general problem with versioned storage. We use the diff scheme to limit the storage needed there. Also we allow deleting past versions :-)

 * What language is it written in?
   (Or do I care? If it really is like a "file system",
    maybe I really don't?)

mostly Java; otherwise ex-gzz (now Fenfire) has also been written in
Python (Jython, for tests, demos and clients at least) and C++ (the OpenGL
graphics API) .. but all Storm code I've seen is Java.

Currently Storm is in Java. If others want to adopt it, we'd welcome the contribution of implementations in other languages, obviously.

 * If there really is a "file system" gui, that's still
   going to be different from a shell, because I won't
   be able to launch any existing editors, will I? They'll
   need to write new files, not rewrite old ones -- and
   they'll need to understand blocks and transclusions.

You could also implement something like CVS on top of Storm: 'check out' files into a normal tree, edit them there, 'commit' into a 'repository' built from Storm blocks. Other than that, yeah.

(I also wouldn't want to be limited to hierarchy...)
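
For illustration, a 'commit' in such a CVS-like scheme might look roughly like this; BlockStore and PointerIndex are hypothetical interfaces standing in for the repository machinery, not actual Storm classes:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;

    /**
     * Hypothetical sketch of a CVS-like 'commit' on top of Storm:
     * read the edited file, store it as a new immutable block, and
     * repoint the document's name at the new block's ID.
     */
    public class Commit {
        public static void commit(Path workingFile, String docName,
                                  BlockStore store, PointerIndex pointers)
                throws IOException {
            byte[] content = Files.readAllBytes(workingFile);
            String newBlockId = store.put(content);  // new immutable block, new hash-based ID
            pointers.update(docName, newBlockId);    // docName now resolves to the new version
        }
    }

    // Made-up interfaces standing in for the repository machinery:
    interface BlockStore { String put(byte[] content); }
    interface PointerIndex { void update(String name, String blockId); }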

 * Short description of "Structured overlay networks".
   What they do, what they accomplish. (paragraph or two)

they are a type of peer-to-peer network; overlay refers to how e.g.
Gnutella and Freenet are layered over the Internet.

Hermanni, do you have a good reference?

 * Short description of gzz and its relationship to Storm

this all must be updated to the current Fenfire status

Yep.

Sequenced Comments
------------------
Thoughts and questions that occurred to me as I read.

Abstract
* Very cool. Location-independent globally unique identifiers,
  append-and-delete only storage, and peer-to-peer networking.
  Very, very cool.

:-)

* The two major issues addressed are mentioned here: dangling
  (unresolved) links and keeping track of alternative versions.
  These deserve to be mentioned in the abstract.

Right...

Related Work
* It's not totally clear what the relationship of the related
  work is to the current project. Do the systems described
  represent old work you've moved beyond, old work that
  provided useful lessons (what lessons?), a foundation for
  the current work (what parts?), or predecessors or clients of
  the current work?

Good point. The hypertext part represents old work we've moved beyond by benefiting from p2p research: because those systems assume that location-independent identifiers cannot be resolved globally, they have to use complicated/limited schemes to guarantee backlink integrity.

The p2p part is what we build upon-- we'll use distributed hashtables to implement Storm block lookup on the public internet.

The p2p hypermedia part is similar work-- not really an alternative, or superseded by us, or anything, just somewhat similar ideas.

* Mention gzz here, and its relationship to Storm (i.e. gzz
  refactored to create Storm as an independent module.)

Ok.

Peer-to-Peer Systems
* Mentions a proposal for a common API usable by DHT systems,
  but it's not clear if you plan to build on that, or if it
  is a rival, or a predecessor.

We hope to build on it, when implementations become available for Java.
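
For a rough idea of the shape such a common API might take (a hypothetical sketch, not the actual proposed interface):

    /**
     * Hypothetical sketch of the general shape of a common DHT API:
     * a distributed hashtable maps keys (e.g. block IDs) to values
     * (block data, or the addresses of peers serving the block).
     * Not the actual proposed interface.
     */
    interface DistributedHashtable {
        /** Publish a value under a key (e.g. announce we serve this block). */
        void put(String key, byte[] value);

        /** Look up a key; null if no reachable peer stores it. */
        byte[] get(String key);
    }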

* Hmmm. Probabilistic access seems reasonable for "contact"
  scenarios (bunch of people together at a meeting), but not
  for "publishing" scenarios (publish document on the web).
  May be worth drawing the distinction here.

Yep.

Overview of Xanalogical Storage
* This threw me. A minute ago we were talking about blocks,
  now we're talking about characters. Needs a transition to
  make the relationship apparent. (Later, you talk about
  spans. Those may be precursors to blocks or they really are
  blocks. I'm not sure which. Need to anticipate that thought
  somehow, and tell how we're building up to it, if that's
  what's going on.)

* Yeah. There's the paragraphs on spans. That threw me, too.
  Suddenly I had gone from blocks to characters and now to
  spans, and I was pretty confused about how they related.

* "Our current implementation" has me wondering what we're
  talking about. At this point, I thought this was more "Related
  work", like "peer to peer systems". But now it seems it's
  all one system? Or was this a previous system, before you
  started working on Storm? (Need to make the relationships
  apparent.)

All this makes me think we should give "Xanadu" a section in "Related Work," and then later explain how we realize xanalogical storage in Storm, in a different way than Project Xanadu did.

Storm Block Storage
* Now we're back to blocks. Why did that last section exist,
  anyway? (make the relationship apparent)

:-)

* "caching becomes trivial, because it is never necessary to
  to check for new versions of blocks". Hmm. This sounds like
  versioning isn't supported, which seems like a weakness.

I know that telling a reviewer "but we said this" is a no-no, since if you have to say so, you apparently didn't say it well enough :) but in this case I must ask: The first paragraph of that section ends with, "Mutable data structures are built on top of the immutable blocks (see Section 6)." Any ideas on how to make explicit that we'll get to versioning later on?

(Talking about versioning first wouldn't work, since we can only explain our approach there after having explained block storage first...)

* Interesting. There is a need for "anonymous caching". That
  allows replication, while resolving the privacy concern.

Yep.

* A block is hashed. Ok. And a doc contains pointers to blocks.
  Ok. But is a doc a block? How is it hashed? How do links
  contribute to the hash?

Each version of a doc is a block... Links: Depends on how you make them (i.e., the format of the document): If they are inline, as in HTML, they contribute to the hash. If they are external-- anybody can contribute links by putting them in another block-- they do not.

In both XLink and Xanadu, links can either be inside a document (which gives them additional credibility... e.g. the user should be able to select 'view only links contained in the document') or they can be external.
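
A toy illustration of the difference, reusing the hypothetical BlockId sketch from above (the link formats shown are invented, not actual Storm or XLink syntax):

    import java.nio.charset.StandardCharsets;

    /** Toy illustration; the formats are made up. Reuses the BlockId sketch. */
    public class LinkExample {
        public static void main(String[] args) throws Exception {
            // Inline link: part of the document's own content, so it
            // contributes to the document block's hash-based ID.
            byte[] docWithInlineLink =
                "See <a href=\"storm:blockBBBB\">here</a>."
                    .getBytes(StandardCharsets.UTF_8);

            // External link: a separate block that refers to other blocks
            // by ID; adding one never changes the linked blocks' IDs.
            byte[] externalLink =
                "link from=storm:blockAAAA to=storm:blockBBBB"
                    .getBytes(StandardCharsets.UTF_8);

            System.out.println(BlockId.of(docWithInlineLink));
            System.out.println(BlockId.of(externalLink));
        }
    }

Editing the inline link would change the first block's ID; publishing the external link block leaves the linked blocks' IDs untouched.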

* Gzz is first mentioned here. It needs to be described earlier
  in the Xanalogical addressing section.

Probably we should move the xu section after the block storage section, actually... reducing the back-and-forth.

* "Storm was first developed for the Gzz application, a platform
  explicitly developed to overcome the limitations of traditional
  file-based applications" -- a *very* intriguing statement.
  When Gzz is introduced, this statement needs to be expanded to
  provide a short list of those limitations, and what Gzz did to
  solve them. (It has to be very short, of course -- no mean feat.)

Challenging. :-) But you're right.

* "UI conventions for listing, moving, and deleting blocks"
  I don't know. That sounds wrong to me. Blocks should be
  under the covers, and I should be dealing with docs. Ok,
  so I have an outline-browser (for example, ADM's) or a
  similar editor. Internally, blocks are moved around when I
  edit. But my access is always thru a "Doc" -- otherwise I'll
  be looking at blocks that are divorced from any context whatever.

You're right.

Application-Specific Reverse-Indexing
* This lost me pretty quickly. I wasn't sure what the purpose
  of this section was. I needed a use case or two to keep me
  oriented. Later, it becomes clear that this is
  a part of the versioning solution. Mention that fact here.
  If possible, also give one or more examples of the other
  indexing systems you created, to show what this section is
  for.

Maybe this section should switch places with the versioning one.

* keyword searching
  --it seemed to me that a keyword index would return every
    *version* of a block that contained the word, which would
    be a real weakness.
  --(maybe versioning needs to be described first, so you can
     discuss the indexing process in context, and mention the
     resolutions for such issues?)

Ok, another reason. :-)

BTW, my take would be that the indexing would indeed return every version of a document ('version of a block' doesn't exist since blocks are immutable :) ). The UI would then sort out which versions are 'current' and show only those. This would also allow searching in past versions, when desired.
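
As a sketch of what I mean (KeywordIndex and VersionTracker are made-up interfaces, not actual Storm code):

    import java.util.*;

    /**
     * Hypothetical sketch: the keyword index returns every immutable
     * block containing the word; the UI layer then keeps only blocks
     * that are the current version of some document.
     */
    public class Search {
        public static List<String> searchCurrent(String word,
                                                 KeywordIndex index,
                                                 VersionTracker versions) {
            List<String> current = new ArrayList<>();
            for (String blockId : index.blocksContaining(word))
                if (versions.isCurrentVersion(blockId))  // UI-level filtering
                    current.add(blockId);
            return current;
        }
    }

    // Made-up interfaces:
    interface KeywordIndex { Collection<String> blocksContaining(String word); }
    interface VersionTracker { boolean isCurrentVersion(String blockId); }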

Versioning
* Aha! I read the paper over several days, and so much water
  went under the dam that I had forgotten this was mentioned
  at the beginning of the paper.

* "if, on the other hand, two people collaborate..."
  VERY nice. Multiple "current version"s are allowed to exist.
  That's the only possible way to handle the situation.

* Note 6:
  It wasn't clear to me how it knows which pointer blocks are
  obsolete.

A pointer block is obsolete if it is on the 'obsoleted' list of any of the other pointer blocks.
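
In code, that rule might look like this; PointerBlock here is a made-up interface, not the actual Storm implementation:

    import java.util.*;

    /**
     * Hypothetical sketch of the rule above: a pointer block is current
     * unless some other pointer block lists it as obsoleted.
     */
    public class CurrentPointers {
        public static List<PointerBlock> current(Collection<PointerBlock> all) {
            Set<String> obsoleted = new HashSet<>();
            for (PointerBlock p : all)
                obsoleted.addAll(p.obsoletedIds());  // gather all 'obsoleted' entries
            List<PointerBlock> result = new ArrayList<>();
            for (PointerBlock p : all)
                if (!obsoleted.contains(p.id()))     // not on anyone's list => current
                    result.add(p);
            return result;
        }
    }

    // Made-up interface:
    interface PointerBlock {
        String id();
        Collection<String> obsoletedIds();
    }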

* Beautiful statement of points for further research
  (authenticating pointer blocks, UI for choosing alternative
   versions, suitability for web-like publishing). But the
   system looks strong enough to make me *want* to do such
   experimentation.

Great!

Diffs
* It wasn't clear if the most recent version was "intact" and
  previous versions were stored as diffs. I would hope so,
  in general. At least, if there was only one option, that's
  the one I'd want. Or can you do it either way?

You can do it either way (unfortunately, we do not have an implementation keeping the most recent version yet). You would store the same diffs in both cases, I think, keeping the most recent version only as a kind of cache.

Not keeping the most recent version has reliability benefits: If your chain of diffs is broken, you notice when you try to load the current version (and you've only lost the current version, since you haven't changed the version before that). If you keep the most recent version, there may be a problem with one diff that you don't notice, because you never look at the diffs, you just load the current version. Now imagine you save your data again: the old "intact" version is deleted, but creating the new "intact" version goes awry. You have now lost all data from the broken diff up to the broken full version.

Of course, our code will check whether it can reconstruct the full version before trusting a diff... but if the requirement for reliability is especially high, you may want to take the safer route of not storing an "intact" version.
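
A sketch of that reconstruct-and-verify step, reusing the hypothetical BlockId from above; Diff is a made-up interface:

    import java.security.NoSuchAlgorithmException;
    import java.util.List;

    /**
     * Hypothetical sketch: rebuild a version by applying a chain of
     * diffs, and verify the result against its expected hash-based ID
     * before trusting it. Reuses the BlockId sketch from above.
     */
    public class Reconstruct {
        public static byte[] reconstruct(byte[] base, List<Diff> diffChain,
                                         String expectedId)
                throws NoSuchAlgorithmException {
            byte[] result = base;
            for (Diff d : diffChain)
                result = d.apply(result);            // apply diffs in order
            if (!BlockId.of(result).equals(expectedId))
                throw new IllegalStateException("broken diff chain");
            return result;
        }
    }

    // Made-up interface:
    interface Diff { byte[] apply(byte[] base); }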

* Yes. This is the point of the article. Dangling links and
  version handling. Definitely belongs in the abstract.

Yep.

* Impact of immutable blocks on media use needs a mention
  here. (Maybe just hand-waving, but some mention of the
  fact that it's going to cost disk space, in return for
  improved ability to do xyz, is needed.)

You mean storage media? Yes. Hey, I actually think we should make the point here that when you copy an image to another document, or keep differently edited versions of a movie, Storm stores the content only once-- and can thus *save* disk space :-)

Conclusions
* Wild. A Zope based on Storm. Or an OHP.
  --what's an OHP, anyway? (needs a one-line definition)
  --come to think of it, I recognize Zope, but not everyone
    will. That needs a one-line explanation, as well.

Right.

* "structured overlay networks such as DHTs"
  --I need another paper describing these things, so I can
    find out what the heck they are and how they work!

We really need a good reference about this. Hermanni?

Bottom Line
-----------
An excellent read, and a most promising technology.
Thanks for sending it to me.

Thank you very much for your comments.

- Benja




