gzz-dev

Re: [Gzz] Raw pools?


From: Tuomas Lukka
Subject: Re: [Gzz] Raw pools?
Date: Mon, 11 Nov 2002 11:52:57 +0200
User-agent: Mutt/1.4i

On Sun, Nov 10, 2002 at 08:00:26PM +0100, Benja Fallenstein wrote:
> Tuomas Lukka wrote:
> 
> >On Sun, Nov 10, 2002 at 02:39:58PM +0100, Benja Fallenstein wrote:
> > 
> >
> >>Tuomas Lukka wrote:
> >>
> >>   
> >>
> >>>An idea, related to the "canonical blocks":
> >>>Maybe we should label some pools "raw", i.e. no header, just a block of
> >>>binary data. That way, we could be compatible with other content-based 
> >>>systems
> >>>for externally obtained data.
> >>>
> >>That means we cannot move blocks from there to non-raw pools. Also, we'd 
> >>have to guess the type.
> >
> >Yes. But I think that's acceptable for the intended purpose:
> >allowing us to use material that we can't distribute.
> >
> 
> As one of the important principles of Storm, I see what I call the 
> "persistency commitment" (which I should peg ;-) ):

We should probably PEG the basic ideology of Storm as a whole:

        - persistent blocks: operations
                - get a block's bits EXACTLY, or a "sorry, can't get them" answer
                - store a block, get an ID
        - pointers

        - ...
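
A minimal sketch of the two persistent-block operations listed above, in Python. The class and method names are made up for illustration (the real Storm implementation is different); the point is only the contract: a stored block's bits come back exactly, or not at all.

```python
import hashlib

class BlockStore:
    """Hypothetical sketch of Storm's persistent-block contract."""

    def __init__(self):
        self._blocks = {}  # id -> immutable bytes

    def store(self, data: bytes) -> str:
        """Store a block; get back a content-derived ID."""
        block_id = "storm:" + hashlib.sha1(data).hexdigest()
        self._blocks[block_id] = data
        return block_id

    def get(self, block_id: str) -> bytes:
        """Get a block's bits EXACTLY, or a "sorry, can't get them" answer."""
        try:
            return self._blocks[block_id]
        except KeyError:
            raise KeyError("sorry, can't get them: " + block_id)

store = BlockStore()
bid = store.store(b"hello, storm")
assert store.get(bid) == b"hello, storm"
```

Because the ID is derived from the bits themselves, a correct answer to `get` can always be verified by re-hashing; there is no notion of a block changing under the same ID.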

> Since all future implementations will have to support all 'features' we 
> put in now, we have to be extremely careful, because all garbage we put 
> in now has to be carried along indefinitely. This is relevant in two ways:
> - If we support "raw pools" now, all future versions of Storm will need 
> to support raw pools; otherwise, the blocks could not be referenced any 
> more. This would violate the commitment.

Ahh, I think I was not clear, once again.

I didn't mean that raw pools should be equal to normal storm pools.
The raw blocks would *NOT* have storm ids and would not be addressed in
that way. 

The point is that we need a level of indirection from the storm block
of a PDF file (header) (that should be redistributable) to the actual
file (that is not redistributable).
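
The indirection described above might look like this (field names and the JSON encoding are my own invention, not Storm's actual format): a small, redistributable Storm block carries the metadata plus the plain SHA-1 of the raw file, while the file itself stays outside the Storm namespace.

```python
import hashlib
import json

# Hypothetical content of a non-redistributable file.
pdf_bytes = b"%PDF-1.4 (externally obtained, not redistributable)"

# The raw file is identified by the plain SHA-1 of its bytes, so any
# content-based retrieval system that hashes bare files can serve it.
raw_hash = hashlib.sha1(pdf_bytes).hexdigest()

# The redistributable Storm block: just metadata plus the raw hash.
header_block = json.dumps({
    "content-type": "application/pdf",
    "raw-sha1": raw_hash,
}).encode()

# Only the header block gets a Storm ID; the PDF itself does not.
storm_id = "storm:" + hashlib.sha1(header_block).hexdigest()
```

Resolving the pointer is then a two-step lookup: fetch the header block by its Storm ID, then fetch the raw bytes by `raw-sha1` from whatever external system has them.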

> - Storm would cease to be a mapping from ids to a block of binary data 
> *with metadata, at least a Content-Type*. This change would not be 
> revertible, since that would violate the commitment.

I'm not proposing that!
> 
> [Disclaimer: Of course, it's not unlikely at all that Storm won't 
> survive even ten years. The point is to try and publish the results, so 
> that somebody designing a protocol that does last that long can learn 
> from it.]

It's much easier to create a new format than to stop using an old one.
If Storm ever catches on, there'll be running implementations 20 years from now
;)

> >E.g. for xupdf, this would be vital for other people to be able
> >to use the demo.
> >
> 
> We have the canonical blocks (with just the Content-Type header); since 
> you have to call a program to put something inside Storm anyway (unless 
> you're going to calculate the SHA-1 hash yourself), I don't see the 
> difference it would make at this point in time.

It *does* make a big difference: the SHA-1 is not the same as the one someone
just obtaining the file would calculate. That's a big issue because
most SHA-1 content-based retrieval systems will *NOT* have the
same Content-Type header.
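
The mismatch is easy to demonstrate: prepending a Content-Type header (as Storm's canonical blocks do) changes the SHA-1, so a system that hashes the bare file bytes will never produce the canonical block's ID. The header bytes below are only a plausible illustration of the general point, not Storm's exact block format.

```python
import hashlib

raw = b"some externally obtained file"

# A canonical block wraps the same bytes with a header.
canonical = b"Content-Type: application/octet-stream\r\n\r\n" + raw

plain_hash = hashlib.sha1(raw).hexdigest()
block_hash = hashlib.sha1(canonical).hexdigest()

# The two IDs differ, so a plain-SHA-1 retrieval system cannot find
# the file under the canonical block's hash, or vice versa.
assert plain_hash != block_hash
```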

> >How vehemently opposed are you?
> >
> 
> As the above shows, very. ;-) I don't think the single purpose here is 
> worth the change.

Actually, you're not that opposed, as your own proposal shows. You're assuming
that I want the raw blocks to be "real" storm blocks, i.e. first-class citizens.

This is not so. For my purposes, having them in a completely different
namespace is fine.

> *snip my proposal*
> 
> >>(Note that the body chunks will *not* be Storm blocks, even though 
> >>they're referred to by an SHA-1 hash. It is therefore illegal to refer to
> >>a body chunk through a Storm URI.)
> >>   
> >>
> >
> >So these would, essentially, be blocks in raw pools?
> >
> 
> No, since they would not be blocks. :-) But they would have the property 
> you're searching for: they could be queried through a content 
> distribution network that uses a plain SHA-1 hash for the identification 
> of files.

Exactly.

> Let me put it like this: It has broad enough applicability that I'm 
> thinking this *might* just be good enough to warrant the burden on 
> future implementations.

Great, let's PEG it.

        Tuomas




