Re: [Gzz] Re: the Storm article


From: Eric Armstrong
Subject: Re: [Gzz] Re: the Storm article
Date: Fri, 07 Mar 2003 13:38:25 -0800

Thanks for the clarifications, Benja. I'll respond
to the "interesting bits" below (especially the parts
where I posed a poorly-phrased question!).

Benja Fallenstein wrote:
> 
> >>Missing Ingredients
> >>-------------------
> >>  * How are collisions handled?
> >>    (Surely some small blocks must produce the
> >>     same cryptographic hash as other small blocks,
> >>     sometimes.)
> >
> a) It's extremely unlikely (AFAIK you'd need about 2^80 blocks in a
> single lookup system to find a collision by chance).
> 
Hmm. This makes me realize that I really don't understand what a
"cryptographic hash" is. Maybe a couple of words to the effect
that it's a non-colliding hash, coupled with a reference (which is
probably already there) would suffice.
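
For my own benefit, here is a rough sketch of where a figure like 2^80
could come from. It assumes a 160-bit hash such as SHA-1 (the thread
does not name the function, so that is my guess) and uses the usual
birthday approximation. Collisions exist in principle; the point is
only that they are not expected to turn up by chance.

```python
import hashlib
import math

# Assumption: a 160-bit hash such as SHA-1; the thread does not name one.
digest = hashlib.sha1(b"a small block").digest()
bits = len(digest) * 8

# Birthday bound: with an n-bit hash, a chance collision only becomes
# likely after roughly 2^(n/2) distinct inputs.
blocks_for_likely_collision = 2 ** (bits // 2)


def collision_probability(num_blocks: int, hash_bits: int) -> float:
    """Approximate chance that any two of num_blocks share a hash."""
    # p ~= 1 - exp(-k^2 / 2^(n+1)), the standard birthday approximation.
    return -math.expm1(-(num_blocks ** 2) / 2 ** (hash_bits + 1))


# A trillion blocks is nowhere near enough to worry about.
p_trillion = collision_probability(10 ** 12, bits)
```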

> Versions of docs are blocks and blocks are hashed by content. Does that
> answer the question?
>
Probably. I'm just not sure if a doc is essentially a list of blocks,
or text that contains links to blocks. And I was curious as to how
the name of the link (the identifier -- a cryptographic hash in its
own right) figures into the hash of the doc. Not that it really matters.
I've just never seen hashing applied to a tree structure before, and
I'm mildly curious as to how it works. (Not deeply curious, though.)
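
Here is how I picture it, as a minimal sketch only. It assumes the first
of my two readings (a doc version is a block whose content lists the ids
of other blocks) and SHA-1 identifiers; the store and the helper names
are made up for illustration and are not Storm's.

```python
import hashlib

store = {}  # hypothetical in-memory block store: id -> bytes


def block_id(content: bytes) -> str:
    # Assumption: SHA-1 hex digest of the content is the identifier.
    return hashlib.sha1(content).hexdigest()


def put(content: bytes) -> str:
    bid = block_id(content)
    store[bid] = content
    return bid


# Leaf blocks hold plain content.
para1 = put(b"First paragraph.")
para2 = put(b"Second paragraph.")

# A document version is itself a block; its bytes are the child ids.
doc_v1 = put("\n".join([para1, para2]).encode())

# Revising a child gives it a new id, which changes the doc's bytes,
# which in turn gives the new doc version a new id.
para2b = put(b"Second paragraph, revised.")
doc_v2 = put("\n".join([para1, para2b]).encode())
```

So the identifier of a referenced block figures into the doc's hash
simply by being part of the doc's bytes.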
 
> >>  * What is the project storage impact?
> >>    (Maybe only "publish" material goes into the system,
> >>     or maybe storage is cheap and growing cheaper so
> >>     we don't really care, but it needs to be mentioned.)
> 
> Again not sure what you mean, sorry. 
>
Talk about your classic typo. I meant "system storage" impact.
I absolutely agree that a write-only system is the only way to
guarantee structural integrity. At the same time, I've always
been afraid of how much storage it might require, and how that
issue (if it is one) can be addressed.

> If you refer to storing past versions, I understand. This is a general
> problem with versioned storage. We use the diff scheme to limit the
> storage needed there. Also we allow deleting past versions :-)
> 
There you go. That's the answer I was looking for.
Diffing and deleting as an answer to storage-growth makes sense.

> > .. but all Storm code I've seen is Java.
>
Can't tell you how personally delighted I am to hear that. 
(And I do mean personally. It will make it easier to understand
and play with it.)

> You could also implement something like CVS on top of Storm: 'check out'
> files into a normal tree, edit them there, 'commit' into a 'repository'
> built from Storm blocks. Other than that, yeah.
>
Cool. Just mentioning that will probably lead to someone getting
excited enough to do it...
 
> All this makes me think we should give "Xanadu" a section in "Related
> Work," and then later explain how we implement xanalogical storage in
> Storm, in a different way than Project Xanadu did.
>
That sounds absolutely right.

> >> * "caching becomes trivial, because it is never necessary
> >>   to check for new versions of blocks". Hmm. This sounds like
> >>   versioning isn't supported, which seems like a weakness.
> 
> I know that telling a reviewer "but we said this" is a no-no since if
> you have to say so, you apparently didn't say it well enough :) but in
> this case I must ask: The first paragraph of that section ends with,
> "Mutable data structures are built on top of the immutable blocks (see
> Section 6)." Any ideas on how to make explicit that we'll get to
> versioning later on?
>
Aha. You know, having been educated in the last century (way deep in
the last century, at that) I had no idea that "mutable data structures"
had anything to do with versioning. I think I'm missing the background
picture you're working against, so a little bit of the "back story"
would clue me in. Maybe something along these lines:
  * In a normal caching system, when a new version is created...
  * That means the caching system has to ...
  * But in Storm, ...

I suspect that a short discussion which followed that pattern would
bring me up to speed. (I think the point is that links to all versions
are "direct" in Storm, so when a new version of a block is written,
the document referring to it automatically points to that version. But
I'm not clear how that affects another document that might be pointing
to the same block. Apparently it does not reference the new version?
Or maybe the pointer block assures that it will, unless some specific
reference to a particular version is made.)

(Perhaps by analyzing the delusions I am laboring under in the 
paragraph above, you will see ways to preventively address such
confusions!)
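
To check my own understanding of the "But in Storm, ..." step, here is
a small sketch. It assumes SHA-1 block ids, and it uses a plain
name-to-id dictionary as a stand-in for whatever the pointer blocks
really do; every name in it is hypothetical.

```python
import hashlib


def block_id(content: bytes) -> str:
    # Assumption: SHA-1 hex digest as the identifier.
    return hashlib.sha1(content).hexdigest()


class BlockCache:
    """Cache keyed by content hash. An id always names the same bytes,
    so a cached entry never has to be checked for a newer version."""

    def __init__(self, fetch):
        self._fetch = fetch  # hypothetical callable: id -> bytes
        self._blocks = {}
        self.fetch_count = 0

    def get(self, bid: str) -> bytes:
        if bid not in self._blocks:
            data = self._fetch(bid)
            self.fetch_count += 1
            if block_id(data) != bid:
                raise ValueError("content does not match its id")
            self._blocks[bid] = data
        return self._blocks[bid]


remote = {}  # stands in for the network


def publish(content: bytes) -> str:
    bid = block_id(content)
    remote[bid] = content
    return bid


cache = BlockCache(remote.__getitem__)

v1 = publish(b"draft one")
current = {"my-doc": v1}  # the only mutable piece: name -> version id
first_read = cache.get(current["my-doc"])

v2 = publish(b"draft two")  # a new version is a new block with a new id
current["my-doc"] = v2
second_read = cache.get(current["my-doc"])
old_read = cache.get(v1)  # served from the cache, no refetch
```

If I have this right, a document that refers to v1 by its id keeps
seeing v1, and only a reference that goes through the name follows
along to v2.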

Note:
  I know. All the "short discussions" are going to double the length
  of the paper and take a lot of time. But the most effective 
  educators and authors I've seen (Jeff Conklin, Martin Fowler) 
  do it automatically. (In my own writing, I try to give a short
  explanation of most every term, unless the readers couldn't 
  *possibly* be perusing the article without understanding it.)
  Anyway, I just wanted you to know that I appreciate how much
  effort it takes to address all these issues, and how gratified
  I am that you are seriously considering them.
 
> >> * A block is hashed. Ok. And a doc contains pointers to blocks.
> >>   Ok. But is a doc a block? How is it hashed? How do links
> >>   contribute to the hash?
> 
> Each version of a doc is a block... Links: Depends on how you make them
> (i.e., the format of the document): If they are inline, as in HTML, they
> contribute to the hash. If they are external-- anybody can contribute
> links by putting them in another block-- they do not.
> 
Oooh. That's an interesting feature. I'm looking forward to hearing
more about that. 
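
A quick sketch of the distinction as I understand it, assuming SHA-1
ids. The angle-bracket inline syntax and the "link from=... to=..."
format are invented here purely for illustration.

```python
import hashlib


def block_id(content: bytes) -> str:
    # Assumption: SHA-1 hex digest as the identifier.
    return hashlib.sha1(content).hexdigest()


target = block_id(b"Some other document.")

# Inline link: the target id is part of the document's own bytes,
# so it contributes to the document's hash.
plain = b"See the other document."
with_inline_link = b"See the other document <" + target.encode() + b">."
doc_plain_id = block_id(plain)
doc_inline_id = block_id(with_inline_link)

# External link: a separate block, possibly written by someone else,
# that names both endpoints. The document's bytes are untouched.
external_link = f"link from={doc_plain_id} to={target}".encode()
external_link_id = block_id(external_link)
doc_id_after_external_link = block_id(plain)
```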

> In both XLink and Xanadu, links can be both inside a document (which
> gives them additional credibility... e.g. the user should be able to
> select 'view only links contained in the document') or they can be external.
> 
Hmmm. I'm familiar with XLink, but not with that feature. I guess I
need more "back story" on this topic.

> >> * Gzz is first mentioned here. It needs to be described earlier
> >>   in the Xanalogical addressing section.
> 
> Probably we should move the xu section after the block storage section,
> actually... reducing the back-and-forth.
> 
Sounds right.

> >> * "Storm was first developed for the Gzz application, a platform
> >>   explicitly developed to overcome the limitations of traditional
> >>   file-based applications" -- a *very* intriguing statement.
> >>   When Gzz is introduced, this statement needs to be expanded to
> >>   provide a short list of those limitations, and what Gzz did to
> >>   solve them. (It has to be very short, of course -- no mean feat.)
> 
> Challenging. :-) But you're right.
>
You may wind up writing a book...
:_)
 
> >>Application-Specific Reverse-Indexing
> >> * This lost me pretty quickly. I wasn't sure what the purpose
> >>   of this section was. I needed a use case or two to keep me
> >>   oriented. Later, it becomes clear that this is
> >>   a part of the versioning solution. Mention that fact here.
> >>   If possible, also give one or more examples of the other
> >>   indexing systems you created, to show what this section is
> >>   for.
> 
> Maybe this section should switch places with the versioning one.
>
If that can be done, it would work for me.
 
> >> * keyword searching
> >>   --it seemed to me that a keyword index would return every
> >>     *version* of a block that contained the word, which would
> >>     be a real weakness.
> >>   --(maybe versioning needs to be described first, so you can
> >>      discuss the indexing process in context, and mention the
> >>      resolutions for such issues?)
> 
> Ok, another reason. :-)
> 
> BTW, my take would be that the indexing would indeed return every
> version of a document ('version of a block' doesn't exist since blocks
> are immutable :) ). The UI would then sort out which versions are
> 'current' and show only those. This would also allow searching in past
> versions, when desired.
> 
Ah.

> A pointer block is obsolete if it is on the 'obsoleted' list of any of
> the other pointer blocks.
>
Oh. There's an obsoleted list. Maybe that was in the block-pointer
diagram, and I missed it.
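
Restating the rule as a sketch, to be sure I follow it. The field names
are my own guesses at what a pointer block might carry; only the
"obsolete if on another block's obsoleted list" rule comes from your
description.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PointerBlock:
    block_id: str   # id of this pointer block (hypothetical field)
    target: str     # id of the document version it points to
    obsoletes: frozenset = field(default_factory=frozenset)


def current_pointers(blocks):
    """Return the pointer blocks that no other pointer block obsoletes."""
    obsoleted = set()
    for b in blocks:
        obsoleted |= b.obsoletes
    return [b for b in blocks if b.block_id not in obsoleted]


p1 = PointerBlock("p1", target="doc-v1")
p2 = PointerBlock("p2", target="doc-v2", obsoletes=frozenset({"p1"}))
p3 = PointerBlock("p3", target="doc-v3", obsoletes=frozenset({"p2"}))

latest = current_pointers([p1, p2, p3])

# Two people superseding p2 independently would leave two current blocks.
p3b = PointerBlock("p3b", target="doc-v3-alt", obsoletes=frozenset({"p2"}))
branched = current_pointers([p1, p2, p3, p3b])
```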
 
> >storing diffs vs. intact versions
> You can do it either way 
>
Cool. Good to know.

> Not keeping the most recent version has reliability benefits: (at most
> you'll lose the current version, not the whole block)
> 
Excellent point! The example was really good, too. I've changed
my mind. The way you're doing it is right. (If there is room for
the example to explain why, awesome!)

> You mean storage media? Yes. Hey, I actually think we should make the
> point here that when you copy an image to another document, or keep
> differently edited versions of a movie, Storm stores the content only
> once-- and can thus *save* disk space :-)
> 
Wild! Really good point.
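
The saving falls straight out of hashing by content, as far as I can
tell. A minimal sketch, assuming SHA-1 ids and a made-up in-memory
store:

```python
import hashlib


class BlockStore:
    """Hypothetical content-addressed store: identical bytes share an id."""

    def __init__(self):
        self._blocks = {}

    def put(self, content: bytes) -> str:
        bid = hashlib.sha1(content).hexdigest()
        # Storing the same bytes again does not add a second copy.
        self._blocks.setdefault(bid, content)
        return bid

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self._blocks.values())


store = BlockStore()
image = b"\x89PNG" + bytes(1000)  # stand-in for real image data

id_in_doc_a = store.put(image)  # image used in one document
id_in_doc_b = store.put(image)  # same image copied into another
```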

> >>Bottom Line
> >>-----------
> >>An excellent read, and a most promising technology.
> >>Thanks for sending it to me.
> 
> Thank you very much for your comments.
> 
Most welcome, and thank you for sharing this great read with me.
I look forward to "Taking the World by Storm".
:_)
