Re: [Monotone-devel] Re: Support for binary files, scalability and Windows port

From:	Ori Berger
Subject:	Re: [Monotone-devel] Re: Support for binary files, scalability and Windows port
Date:	Mon, 19 Jan 2004 01:05:44 +0200
User-agent:	Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4; MultiZilla v1.5.0.2g) Gecko/20031007

graydon hoare wrote:

I don't know what you mean by "all sources for a file". could you be
more specific about where you mean to use a suffix array, and how it
helps with a particular bit of retrieval or storage? the way I see it, a
persistent suffix array makes the on-disk representation of a string
considerably larger, without improving the xdelta problem (which is:
find all common regions between these two strings)


What I meant by "all sources" is that there is no inherent a-priori
reason to encode a file with respect to one _given_ file. It is
often the case that the best reference is the previous version of
the same file. But when you merge two files into one, for example,
the "natural" delta is from both, rather than from a specific one.

If you have a persistent suffix tree (one tree for the _entire_
repository), when you want to put a new version in the database, you
just start walking the tree, and note what part of it you take from
what repository file. Renames, copies, splits, merges, etc. will all
be efficiently represented (in terms of "delta" storage) even
without any explicit "monotone rename" command.
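
A minimal sketch of that walk, in Python. It makes several assumptions
that are not part of the proposal itself: a plain dictionary of K-byte
substrings stands in for the persistent suffix tree, matching is
greedy, and the names build_index, encode and decode are invented for
illustration. It only shows that a new version can be written as
copies from several repository files at once.

```python
from collections import defaultdict

K = 4  # minimum match length; an arbitrary choice for this sketch


def build_index(repo):
    """Map every K-byte substring to the (name, offset) places it occurs.

    A stand-in for the repository-wide suffix tree: far less efficient,
    but it answers the same question ("where have I seen this before?").
    """
    index = defaultdict(list)
    for name, data in repo.items():
        for i in range(len(data) - K + 1):
            index[data[i:i + K]].append((name, i))
    return index


def encode(new, repo, index):
    """Greedily encode `new` as copies from any repository file, plus literals."""
    ops, pos, literal = [], 0, bytearray()
    while pos < len(new):
        best = None  # (length, name, offset) of the longest match found
        for name, off in index.get(new[pos:pos + K], ()):
            src, n = repo[name], K
            while (pos + n < len(new) and off + n < len(src)
                   and new[pos + n] == src[off + n]):
                n += 1
            if best is None or n > best[0]:
                best = (n, name, off)
        if best is None:
            literal.append(new[pos])  # nothing to copy from: keep the byte
            pos += 1
            continue
        if literal:
            ops.append(("literal", bytes(literal)))
            literal = bytearray()
        ops.append(("copy", best[1], best[2], best[0]))
        pos += best[0]
    if literal:
        ops.append(("literal", bytes(literal)))
    return ops


def decode(ops, repo):
    """Rebuild the version from its patchwork of copies and literals."""
    out = bytearray()
    for op in ops:
        if op[0] == "literal":
            out += op[1]
        else:
            _, name, off, n = op
            out += repo[name][off:off + n]
    return bytes(out)
```

Note that no ancestry is consulted anywhere: a merged file simply ends
up with copy operations that name both of its parents.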

Your elevator pitch includes "ancestry is just some interesting
metadata", and this is just taking it to the extreme ... A version
is stored as a patchwork of parts of all previous files, with or
without any ancestry knowledge.

I don't think that the theoretic possibilities justify any work at
this time. Nevertheless, the idea is worth documenting, I think.

Its implementation is orthogonal to the block structure below, but
the underlying philosophy is the same.

Xdelta could be more-or-less as useful, if you store the adler32 of
all reference blocks in an accessible location that you can consult
while scanning a new file.
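
Roughly, and only as a sketch: the block size, the function names and
the in-memory table below are assumptions made for illustration, not
xdelta's actual layout. For brevity the checksum is recomputed at every
offset; a real scanner would roll it forward one byte at a time.

```python
import zlib

BLOCK = 16  # block size; an arbitrary choice for this sketch


def index_blocks(references):
    """Record the adler32 of every aligned block of every reference file."""
    table = {}
    for name, data in references.items():
        for off in range(0, len(data) - BLOCK + 1, BLOCK):
            block = data[off:off + BLOCK]
            table.setdefault(zlib.adler32(block), []).append((name, off))
    return table


def find_matches(new, references, table):
    """Scan `new` and report (position, reference name, reference offset)."""
    matches, pos = [], 0
    while pos + BLOCK <= len(new):
        window = new[pos:pos + BLOCK]
        hit = None
        for name, off in table.get(zlib.adler32(window), ()):
            # adler32 is weak, so confirm the candidate byte for byte
            if references[name][off:off + BLOCK] == window:
                hit = (pos, name, off)
                break
        if hit:
            matches.append(hit)
            pos += BLOCK  # skip past the matched block
        else:
            pos += 1      # slide the window by one byte
    return matches
```

The table covers every reference file, so a single scan of the new
file can pick up blocks from any of them.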

    that technique is really nothing more than extending xdelta to work
    over block-structured files, which is (as I'll get to) advantageous
    since it permits Very Large Files (~2 ** 40 or so) and also makes
    network sync work.

What you're describing is, as you said, extended xdelta. What I was
describing is doing away with ancestry as a factor in storage and
identification. If breaking into blocks depends on context (e.g.,
ancestry), then patches that are applied in a different order (but
result in the same outcome) are likely to generate different block
structures. Personally, I much prefer stateless representations that
depend only on the data and not its history - YMMV.
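
One way to get such a stateless block structure, sketched under
assumptions of my own choosing: cut points are picked from the bytes
themselves (a checksum over a small trailing window), so the same
content always yields the same blocks no matter how it was produced.
The window size, the mask and the names split_blocks and block_ids are
illustrative, not anything monotone does.

```python
import hashlib
import zlib

WINDOW = 8   # bytes examined when deciding on a cut point (assumed)
MASK = 0x0F  # cut when the low 4 checksum bits are zero (assumed)


def split_blocks(data):
    """Split `data` at positions chosen purely from its own content."""
    blocks, start = [], 0
    for end in range(WINDOW, len(data)):
        # The decision looks only at the WINDOW bytes before `end`.
        if zlib.adler32(data[end - WINDOW:end]) & MASK == 0:
            blocks.append(data[start:end])
            start = end
    blocks.append(data[start:])  # whatever remains is the last block
    return blocks


def block_ids(data):
    """SHA1 of each block; depends on the data and nothing else."""
    return [hashlib.sha1(b).hexdigest() for b in split_blocks(data)]
```

Two people who arrive at the same file contents through different
patch orders get identical block lists, because no history enters the
computation.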

hmm, no, not changing the manifest format. the file remains identified
by its SHA1. if you don't like SHA1, substitute some other function
which makes an identifier given an input string, but if the filesystem
can maintain the notion of a "file" using inodes, so can manifests.

I didn't understand that one. I was suggesting (or at least, thought
I was suggesting) either (a) to include SHA1 for blocks inside the
same manifest, or alternatively, (b) keep another per-file manifest
that says how to construct it from its SHA1 blocks, in the same way
that a manifest can be used to construct an atomic revision from
files. Whoever receives an (a) or (b) manifest can then go hunting
for either the blocks or the files, in whatever way needed. (b)
would be completely transparent, and (a) would cause sha1sum to
correctly check the complete files, but report the file _parts_ as
missing.

Yep, it's not too bright; but I wasn't suggesting dropping SHA1 in
any way, and I couldn't understand the relation to inodes. Could you
please elaborate?
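
To make option (b) concrete, here is a small sketch. The dictionary
layout of the per-file manifest, the fixed block size and the names
store_file and load_file are all assumptions for illustration; nothing
here is monotone's actual manifest format.

```python
import hashlib

BLOCK = 4096  # block size; an arbitrary choice for this sketch


def sha1(data):
    return hashlib.sha1(data).hexdigest()


def store_file(data, block_store):
    """Store `data` as blocks and return a per-file manifest for it."""
    block_ids = []
    for off in range(0, len(data), BLOCK):
        block = data[off:off + BLOCK]
        block_id = sha1(block)
        block_store[block_id] = block  # identical blocks are stored once
        block_ids.append(block_id)
    # The file keeps its ordinary whole-file SHA1 as its identity.
    return {"file": sha1(data), "blocks": block_ids}


def load_file(manifest, block_store):
    """Reassemble a file from its blocks and check it against its SHA1."""
    data = b"".join(block_store[b] for b in manifest["blocks"])
    if sha1(data) != manifest["file"]:
        raise ValueError("reassembled file does not match its SHA1")
    return data
```

The revision manifest is untouched: it still names the file by its
SHA1, and the per-file manifest is just another object that says how
to build that file.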

as far as I'm concerned, I'd keep doing what I'm doing now, reuse the
block + delta storage system for *storing* manifests, too. they're data
too.

Hmm, I wasn't aware manifests were treated specially in storage. I
thought they were only special by their cert status.

hm, for the network I have a broader idea: use hash trees over the
entire space of SHA1 to synchronize my collection + your collection into
the union of both (on both ends). with some special accounting to manage
singletons and tombstones, and a good spread factor, it's very efficient
to synchronize hash trees, and I'd use the exact same scheme to sync the
collection of blocks, the collection of manifests, the collection of
files, the collection of keys, and the collection of certs.

Sounds good. As Nathaniel hinted in a recent post, hash tree
synchronization might not play well with your block-extended xdelta;
it needs to be stateless to play well, I think.
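
For reference, the comparison step of a hash tree sync can be sketched
as below. This is a local simulation under stated assumptions: both
collections are in-memory sets of hex SHA1 ids, the tree fans out by
one hex digit per level, and the singleton and tombstone accounting
mentioned above is left out entirely. In a real exchange each side
would compute its own subtree hashes and only the hashes would travel.

```python
import hashlib


def subtree_hash(ids, prefix):
    """One hash summarising every id in `ids` that starts with `prefix`."""
    h = hashlib.sha1()
    for item in sorted(x for x in ids if x.startswith(prefix)):
        h.update(item.encode())
    return h.hexdigest()


def diff(mine, theirs, prefix=""):
    """Return (ids only in mine, ids only in theirs) below `prefix`."""
    if subtree_hash(mine, prefix) == subtree_hash(theirs, prefix):
        return set(), set()  # identical subtrees: nothing to exchange
    a = {x for x in mine if x.startswith(prefix)}
    b = {x for x in theirs if x.startswith(prefix)}
    if len(a) + len(b) <= 4:
        return a - b, b - a  # small enough to compare ids directly
    only_mine, only_theirs = set(), set()
    for digit in "0123456789abcdef":
        x, y = diff(mine, theirs, prefix + digit)
        only_mine |= x
        only_theirs |= y
    return only_mine, only_theirs
```

Because the ids are content hashes, the tree depends only on which
objects each side holds, which is the stateless property I'm after.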

About NNTP, email, "dumb web" distribution - all you have to do is
record, for each block whether or not it was sent to a specific
destination.

no, I think I'd just remove these things altogether, let people sync
between databases directly as the primary mode of operation.

Ouch. One of the things I like so much about monotone is that it is
transport agnostic. Requiring a smart online server would kill that.
I also think it's an increased barrier for many people.

It should still be possible to build an NNTP/email/dumb-web gateway
though, even if you do move to an interactive hashtree-sync. It's
the same logic I described earlier, of "for each block not yet sent
to a gateway destination, send to gateway destination" and "add
received blocks to database".
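
In sketch form, with everything below assumed for illustration: the
database is a dictionary from SHA1 id to block, `sent` records per
destination which ids have already gone out, and `transmit` is a
placeholder for whatever the gateway does (post to NNTP, mail, write
to a web directory).

```python
import hashlib


def send_pending(database, sent, destination, transmit):
    """Send every block that this destination has not been sent yet."""
    already = sent.setdefault(destination, set())
    for block_id, block in database.items():
        if block_id not in already:
            transmit(destination, block_id, block)
            already.add(block_id)  # remember, so it is never resent


def receive(database, block_id, block):
    """Add a received block to the database after checking its id."""
    if hashlib.sha1(block).hexdigest() != block_id:
        raise ValueError("block does not match its SHA1")
    database.setdefault(block_id, block)
```

The only state the gateway needs is the `sent` record; it never has to
hold a conversation with the other side.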


Ori.
