
Re: [Gzz] Raw pools?


From: Benja Fallenstein
Subject: Re: [Gzz] Raw pools?
Date: Sun, 10 Nov 2002 20:00:26 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1) Gecko/20020913 Debian/1.1-1

Tuomas Lukka wrote:

> On Sun, Nov 10, 2002 at 02:39:58PM +0100, Benja Fallenstein wrote:
>> Tuomas Lukka wrote:

>>> An idea, related to the "canonical blocks":
>>> Maybe we should label some pools "raw", i.e. no header, just a block
>>> of binary data. That way, we could be compatible with other
>>> content-based systems for externally obtained data.


>> That means we cannot move blocks from there to non-raw pools. Also,
>> we'd have to guess the type.

> Yes. But I think that's acceptable for the intended purpose:
> allowing us to use material that we can't distribute.
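
To make the difference concrete, here is a rough sketch of the two
identification schemes (present-day Python; the function names and the
exact header serialization -- a single CRLF-terminated Content-Type
line -- are illustrative assumptions, not anything fixed by the spec):

    import hashlib

    def raw_id(data: bytes) -> str:
        # "Raw pool" scheme: the id is the SHA-1 of the bytes alone.
        # Nothing records the media type; a reader has to guess it.
        return hashlib.sha1(data).hexdigest()

    def block_id(content_type: str, body: bytes) -> str:
        # Storm scheme: a MIME-style header travels with the body and
        # is covered by the hash, so the type can never be separated
        # from the data. (Header layout here is illustrative only.)
        header = ("Content-Type: %s\r\n\r\n" % content_type).encode("ascii")
        return hashlib.sha1(header + body).hexdigest()

The same body gets different ids under the two schemes, which is why
blocks could not simply move between raw and non-raw pools.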


One of the important principles of Storm, as I see it, is what I call the "persistency commitment" (which I should peg ;-) ):

*The specification guarantees that all blocks created according to it remain readable indefinitely-- in fifty years, or two hundred-- by any conforming implementation. This extends to all future versions of this specification, meaning that ALL future versions of the specification will be able to read ALL blocks created according to the rules in any previous version.*

Since all future implementations will have to support all 'features' we put in now, we have to be extremely careful: any garbage we put in now has to be carried along indefinitely. This is relevant in two ways:

- If we support "raw pools" now, all future versions of Storm will need to support raw pools; otherwise, the blocks could no longer be referenced. That would violate the commitment.
- Storm would cease to be a mapping from ids to blocks of binary data *with metadata, at least a Content-Type*. This change would not be reversible, since reverting it would itself violate the commitment.

I want Storm to be really simple at heart; provided that you can still run or reproduce the application-layer functionality above the Storm layer, it should be possible to keep a collection of blocks in *any* Storm pool implementation and revive it in fifty years or whenever. I expect pool implementations to be quite different in fifty years-- but I require them to represent, in some way, a binary block of bytes with a MIME header containing at least a content type. As long as they do that, I can copy the blocks holding the notes I have now without ever losing data. The implementations can be manifold-- for example, I could store the blocks in OceanStore, highly replicated to avoid lossage, or in compressed format on a CD I have at home-- and because all those implementations are required to support the same block format, I'm guaranteed that I can move blocks from one to another freely. Thus I'm able to archive my blocks even if the underlying implementations change drastically.
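
As a sketch of what that guarantee buys (hypothetical pool classes in
Python; the real implementations are of course nothing like this):

    import hashlib, os

    def check(bid, data):
        # Every conforming pool enforces the same invariant: a block's
        # id is the SHA-1 of its complete bytes, header plus body.
        if hashlib.sha1(data).hexdigest() != bid:
            raise ValueError("block does not match its id")

    class DictPool:
        # In-memory pool, e.g. a cache.
        def __init__(self):
            self.blocks = {}
        def put(self, bid, data):
            check(bid, data)
            self.blocks[bid] = data
        def get(self, bid):
            data = self.blocks[bid]
            check(bid, data)
            return data
        def ids(self):
            return list(self.blocks)

    class DirPool:
        # One file per block in a directory, e.g. on that CD.
        def __init__(self, path):
            self.path = path
        def put(self, bid, data):
            check(bid, data)
            with open(os.path.join(self.path, bid), "wb") as f:
                f.write(data)
        def get(self, bid):
            with open(os.path.join(self.path, bid), "rb") as f:
                data = f.read()
            check(bid, data)
            return data

    def archive(src, dst):
        # Moving a collection between implementations is a plain copy;
        # the ids stay valid because the block format stays the same.
        for bid in src.ids():
            dst.put(bid, src.get(bid))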

Thus, I don't agree that any extension to the Storm specification *can* be decided in the context of a single intended use (such as distributing references to copyrighted blocks). To create a special type of pool which does not allow blocks to be moved to other types of pools would either break the property I mentioned-- stuff in Storm should be freely movable between pool implementations-- or it would require every implementation (in a directory, on an HTTP server, in OceanStore, in Circle, ...) to have a 'raw' and a 'non-raw' flavor. So saying 'it's ok for this purpose' doesn't work.

[Disclaimer: Of course, it may well be that Storm won't survive even ten years. The point is to try and publish the results, so that somebody designing a protocol that does last that long can learn from it.]

> E.g. for xupdf, this would be vital for other people to be able
> to use the demo.


We have the canonical blocks (with just the Content-Type header); since you have to call a program to put something inside Storm anyway (unless you're going to calculate the SHA-1 hash yourself), I don't see the difference it would make at this point in time.
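
For instance, the whole of "calling a program" to put a file into Storm
amounts to something like this (sketch; storm_put is a made-up name,
and the canonical header layout is assumed as above):

    import hashlib

    def storm_put(content_type, body, pool):
        # A canonical block has exactly one header line, the
        # Content-Type, so the same data stored with the same type
        # always yields the same block and therefore the same id.
        header = "Content-Type: %s\r\n\r\n" % content_type
        block = header.encode("ascii") + body
        bid = hashlib.sha1(block).hexdigest()
        pool.put(bid, block)
        return bid

Given any pool object with a put() like the ones sketched earlier,
that's all the tool needs to do; two people who run it on the same file
with the same type get the same id.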

> How vehemently opposed are you?


As the above shows, very. ;-) I don't think the single purpose here is worth the change.

>> *snip my proposal*

>> (Note that the body chunks will *not* be Storm blocks, even though
>> they're referred to by an SHA-1 hash. It is therefore illegal to
>> refer to a body chunk through a Storm URI.)

> So these would, essentially, be blocks in raw pools?


No, since they would not be blocks. :-) But they would have the property you're searching for: they could be queried through a content distribution network that uses a plain SHA-1 hash for the identification of files.

The difference from your proposal above is that this one:

a) does not break the property that blocks can be moved freely from any pool to any pool (BUT it does require pool implementations to support both the old-style and the new-style id checking mechanisms);

b) does not break the assumption that all Storm blocks have a content type (so you neither need to guess it nor give hints in the context of a Storm URI, which I fear would happen otherwise); and

c) has applicability far beyond the special case you were talking about.

Let me put it like this: It has broad enough applicability that I'm thinking this *might* just be good enough to warrant the burden on future implementations.
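
Since the proposal itself is snipped above, the following sketch
necessarily guesses at its shape: an id that commits to the header and
the body chunk separately, so that checking it means parsing the block,
and so that the body chunk alone can be served by a network that only
knows plain SHA-1 hashes.

    import hashlib

    def split_block(data):
        # New-style checking has to parse the block: find the blank
        # line ending the MIME header, then hash the parts separately.
        sep = data.find(b"\r\n\r\n")
        if sep < 0:
            raise ValueError("not a block: no header/body separator")
        return data[:sep + 4], data[sep + 4:]

    def check_new_style(bid, data):
        # bid is assumed here to be a (header hash, body hash) pair.
        header, body = split_block(data)
        return bid == (hashlib.sha1(header).hexdigest(),
                       hashlib.sha1(body).hexdigest())

The second component is what a plain content-distribution network
would look up, while a Storm implementation verifies both halves--
hence the extra parsing cost mentioned below.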

>> On the downside, there would be more computation in id checking,
>> since we'd also need to parse the header.

> Very small price to pay.


Ok.

- Benja




