

From: graydon hoare
Subject: [Monotone-devel] Re: Support for binary files, scalability and Windows port
Date: Mon, 19 Jan 2004 01:58:48 -0500
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6b) Gecko/20031205 Thunderbird/0.4

Nathaniel Smith wrote:
> On Sat, Jan 17, 2004 at 01:11:14AM -0500, graydon hoare wrote:
>
> > for a heavily edited file, it'll slowly get worse, but maybe you could
> > have a "defragment" routine which builds some fresh blocks (especially
> > if a bunch of blocks appear with refcount=1; might as well toss them)
>
> How does this interact with hash tree synchronization?  I.e., what
> happens if I do
>   $ monotone sync
>   $ monotone db defrag
>   $ monotone sync
> ?  Will this cause problems, since efficient hash tree synchronization
> seems to depend on both sides using the same blocks?

no, shouldn't cause any problems. I'm foreseeing doing a sync on about 5 different collections: blocks, files, manifests, certs, and keys. files would be addressed by SHA1 of the file, blocks by SHA1 of the block, etc. defragmenting a file would be a concern of the storage manager, but wouldn't change the file's SHA1. it might, in some cases, add a new block full of coalesced constants. but that's a *good* thing, because then they can be reused by other files' storage representations :)
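
to make the addressing point concrete, here's a rough sketch of content addressing in python (illustrative only, not monotone's code; put_file/defragment and the in-memory dict are made-up names): the key is the SHA1 of the content, so the storage manager is free to re-encode the stored bytes without the key ever changing.

  import hashlib, zlib

  store = {}   # SHA1 hex of *content* -> (encoding, stored bytes)

  def put_file(data: bytes) -> str:
      key = hashlib.sha1(data).hexdigest()   # sync key: hash of the content only
      store[key] = ("raw", data)             # initial, naive representation
      return key

  def defragment(key: str) -> None:
      # re-encode however the storage manager likes; the content (and hence
      # the key the hash tree syncs on) never changes
      kind, data = store[key]
      if kind == "raw":
          store[key] = ("zlib", zlib.compress(data))   # stand-in for block coalescing

  def get_file(key: str) -> bytes:
      kind, data = store[key]
      return zlib.decompress(data) if kind == "zlib" else data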

as background (another email asked "how this stuff works"): the idea with a hashtree (a.k.a. merkle tree) is that you and I both store our entire set of "objects" (say, blocks or files or whatever) in an N-ary tree where each leaf is positioned in the tree at some unique spot (in our case the tree represents the entire space of SHA1, each branch peeling off some number of bits, and the leaf's position at the bottom of the tree is determined by its content hash). each interior node's slots are the hashes of the subnodes beneath it. the hash algorithm used for interior nodes isn't necessarily related to the one used for the leaves, but we might as well use SHA1 again.
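
here's a rough sketch of that structure in python (illustrative only, not monotone's code; a real tree would peel off all 20 bytes of SHA1 rather than the 2 levels used here):

  import hashlib

  DEPTH = 2   # sketch only; the tree described above would go 20 levels deep

  def build_node(object_hashes, level=0):
      # one node of a 256-ary tree: each occupied slot (the next byte of the
      # object's hash) maps to a child node or, at the bottom, to the sorted
      # leaf hashes living under that slot -- so a leaf's position is fixed
      # entirely by its content hash
      slots = {}
      for h in object_hashes:                # h is e.g. hashlib.sha1(data).digest()
          slots.setdefault(h[level], []).append(h)
      if level + 1 == DEPTH:
          return {s: sorted(v) for s, v in slots.items()}
      return {s: build_node(v, level + 1) for s, v in slots.items()}

  def slot_hash(child):
      # hash summarising everything beneath one slot (SHA1 again, by choice)
      if isinstance(child, dict):            # interior node: hash its slots' hashes
          m = hashlib.sha1()
          for s in sorted(child):
              m.update(slot_hash(child[s]))
          return m.digest()
      return hashlib.sha1(b"".join(child)).digest()   # leaf bucket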

anyways, when you and I want to sync, I send you my root node, and you can see all the slots in it which are non-equal to *your* values for those slots: each of those names a 1/N-sized subtree which has "a difference" in it, somewhere. so for each one which is different, you send me *your* subtree node, and then I look at them and pick out the 1/(N^2)-sized subtrees which contain the differences. we bounce back and forth at worst log_N(X) times where X is the size of the space we're spanning (I'm thinking a 256-ary tree over SHA1: 20 levels, and each round trip covers two levels, so at worst 10 round trips) and exchange at worst O(D*K) bytes of overhead (D being the tree depth) locating the K different objects. then we just transmit the objects missing from each party's collection, having isolated them. there are some cheap hacks used to make the average case costs collapse to the effective depth given the tree's load, rather than its full depth (so, more likely say 3 or 4 round trips), but even in the worst case it's very efficient, and it's easy to pipeline.
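
continuing the sketch above (again illustrative, not the wire protocol: the real exchange interleaves these comparisons with round trips, whereas this just walks two in-memory trees built with build_node):

  def differences(mine, theirs):
      # return (only_mine, only_theirs): the leaf hashes each side is missing.
      # each level of recursion corresponds, at worst, to one exchange in the
      # real protocol; subtrees with equal slot hashes are skipped entirely.
      only_mine, only_theirs = set(), set()
      for slot in set(mine) | set(theirs):
          a, b = mine.get(slot), theirs.get(slot)
          if a is not None and b is not None and slot_hash(a) == slot_hash(b):
              continue                                   # identical subtree: nothing to do
          if isinstance(a, dict) or isinstance(b, dict): # interior: descend into the difference
              m, t = differences(a or {}, b or {})
              only_mine |= m
              only_theirs |= t
          else:                                          # leaf buckets: take the exact difference
              m, t = set(a or []), set(b or [])
              only_mine |= m - t
              only_theirs |= t - m
      return only_mine, only_theirs

  # e.g. for two nearly identical collections A and B of sha1 digests:
  #   only_in_A, only_in_B = differences(build_node(A), build_node(B))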

note that this is a very general algorithm for synchronizing two collections of arbitrary things. the "tree" here has nothing to do with a filesystem tree or the xdelta/suffix-tree storage representation for a file. those are orthogonal.

merkle trees over sets with not much in common are basically useless; they add overhead (the tree structure, round trips, and hashing) and you will wind up exchanging a full list of objects anyways. might as well just send a full list. where they are useful is when two parties have *nearly identical* large collections of objects, with just a few differences since the last synchronization. then you can narrow it down to the set of differences in much less traffic than a complete listing.
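
to put rough numbers on that (purely illustrative; the counts and sizes below are assumptions, not measurements of anything):

  objects   = 1_000_000
  hash_size = 20                        # bytes per SHA1
  node_size = 256 * hash_size           # one interior node's worth of slot hashes
  K, depth  = 10, 3                     # differing objects; effective depth at this load

  full_listing = objects * hash_size    # 20,000,000 bytes (~20 MB) to ship a flat list
  merkle_cost  = K * depth * node_size  # 153,600 bytes (~150 KB) of refinement traffic
  print(full_listing, merkle_cost)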

-graydon



