

From: graydon hoare
Subject: [Monotone-devel] Re: Support for binary files, scalability and Windows port
Date: Mon, 19 Jan 2004 01:58:48 -0500
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6b) Gecko/20031205 Thunderbird/0.4

Nathaniel Smith wrote:
> On Sat, Jan 17, 2004 at 01:11:14AM -0500, graydon hoare wrote:
>
> > for a heavily edited file, it'll slowly get worse, but maybe you could
> > have a "defragment" routine which builds some fresh blocks (especially
> > if a bunch of blocks appear with refcount=1; might as well toss them)
>
> How does this interact with hash tree synchronization?  I.e., what
> happens if I do
>   $ monotone sync
>   $ monotone db defrag
>   $ monotone sync
> ?  Will this cause problems, since efficient hash tree synchronization
> seems to depend on both sides using the same blocks?

no, shouldn't cause any problems. I'm foreseeing doing a sync on about 5 different collections: blocks, files, manifests, certs, and keys. files would be addressed by SHA1 of the file, blocks by SHA1 of the block, etc. defragmenting a file would be a concern of the storage manager, but wouldn't change the file's SHA1. it might, in some cases, add a new block full of coalesced constants. but that's a *good* thing, because then they can be reused by other files' storage representations :)
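
to make the addressing point concrete, here's a rough sketch of content addressing in python (illustrative only, not monotone's code; put_file/defragment and the in-memory dict are made-up names): the key is the SHA1 of the content, so the storage manager is free to re-encode the stored bytes without the key ever changing.

  import hashlib, zlib

  store = {}   # SHA1 hex of *content* -> (encoding, stored bytes)

  def put_file(data: bytes) -> str:
      key = hashlib.sha1(data).hexdigest()   # sync key: hash of the content only
      store[key] = ("raw", data)             # initial, naive representation
      return key

  def defragment(key: str) -> None:
      # re-encode however the storage manager likes; the content (and hence
      # the key the hash tree syncs on) never changes
      kind, data = store[key]
      if kind == "raw":
          store[key] = ("zlib", zlib.compress(data))   # stand-in for block coalescing

  def get_file(key: str) -> bytes:
      kind, data = store[key]
      return zlib.decompress(data) if kind == "zlib" else data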

as background (another email asked "how this stuff works"): the idea with a hashtree (a.k.a. merkle tree) is that you and I both store our entire set of "objects" (say, blocks or files or whatever) in an N-ary tree where each leaf is positioned in the tree at some unique spot (in our case the tree represents the entire space of SHA1, each branch peeling off some number of bits, and the leaf's position at the bottom of the tree is determined by its content hash). each interior node's slots are the hashes of the subnodes beneath it. the hash algorithm used for interior nodes isn't necessarily related to the one used for the leaves, but we might as well use SHA1 again.
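
here's a rough sketch of that structure in python (illustrative only, not monotone's code; a real tree would peel off all 20 bytes of SHA1 rather than the 2 levels used here):

  import hashlib

  DEPTH = 2   # sketch only; the tree described above would go 20 levels deep

  def build_node(object_hashes, level=0):
      # one node of a 256-ary tree: each occupied slot (the next byte of the
      # object's hash) maps to a child node or, at the bottom, to the sorted
      # leaf hashes living under that slot -- so a leaf's position is fixed
      # entirely by its content hash
      slots = {}
      for h in object_hashes:                # h is e.g. hashlib.sha1(data).digest()
          slots.setdefault(h[level], []).append(h)
      if level + 1 == DEPTH:
          return {s: sorted(v) for s, v in slots.items()}
      return {s: build_node(v, level + 1) for s, v in slots.items()}

  def slot_hash(child):
      # hash summarising everything beneath one slot (SHA1 again, by choice)
      if isinstance(child, dict):            # interior node: hash its slots' hashes
          m = hashlib.sha1()
          for s in sorted(child):
              m.update(slot_hash(child[s]))
          return m.digest()
      return hashlib.sha1(b"".join(child)).digest()   # leaf bucket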

anyways, when you and I want to sync, I send you my root node, and you can see all the slots in it which are non-equal to *your* values for those slots: each of those names a 1/N-sized subtree which has "a difference" in it, somewhere. so for each one which is different, you send me *your* subtree node, and then I look at them and pick out the 1/(N^2)-sized subtrees which contain the differences. we bounce back and forth at worst log_N(X) times where X is the size of the space we're spanning (I'm thinking a 256-ary tree over SHA1: 20 levels, and each round trip covers two levels, so at worst 10 round trips) and exchange at worst O(D*K) bytes of overhead (D being the tree depth) locating the K different objects. then we just transmit the objects missing from each party's collection, having isolated them. there are some cheap hacks used to make the average case costs collapse to the effective depth given the tree's load, rather than its full depth (so, more likely say 3 or 4 round trips), but even in the worst case it's very efficient, and it's easy to pipeline.
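
continuing the sketch above (again illustrative, not the wire protocol: the real exchange interleaves these comparisons with round trips, whereas this just walks two in-memory trees built with build_node):

  def differences(mine, theirs):
      # return (only_mine, only_theirs): the leaf hashes each side is missing.
      # each level of recursion corresponds, at worst, to one exchange in the
      # real protocol; subtrees with equal slot hashes are skipped entirely.
      only_mine, only_theirs = set(), set()
      for slot in set(mine) | set(theirs):
          a, b = mine.get(slot), theirs.get(slot)
          if a is not None and b is not None and slot_hash(a) == slot_hash(b):
              continue                                   # identical subtree: nothing to do
          if isinstance(a, dict) or isinstance(b, dict): # interior: descend into the difference
              m, t = differences(a or {}, b or {})
              only_mine |= m
              only_theirs |= t
          else:                                          # leaf buckets: take the exact difference
              m, t = set(a or []), set(b or [])
              only_mine |= m - t
              only_theirs |= t - m
      return only_mine, only_theirs

  # e.g. for two nearly identical collections A and B of sha1 digests:
  #   only_in_A, only_in_B = differences(build_node(A), build_node(B))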

note that this is a very general algorithm for synchronizing two collections of arbitrary things. the "tree" here has nothing to do with a filesystem tree or the xdelta/suffix-tree storage representation for a file. those are orthogonal.

merkle trees over sets with not much in common are basically useless; they add overhead (the tree structure, round trips, and hashing) and you will wind up exchanging a full list of objects anyways. might as well just send a full list. where they are useful is when two parties have *nearly identical* large collections of objects, with just a few differences since the last synchronization. then you can narrow it down to the set of differences in much less traffic than a complete listing.
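
to put rough numbers on that (purely illustrative; the counts and sizes below are assumptions, not measurements of anything):

  objects   = 1_000_000
  hash_size = 20                        # bytes per SHA1
  node_size = 256 * hash_size           # one interior node's worth of slot hashes
  K, depth  = 10, 3                     # differing objects; effective depth at this load

  full_listing = objects * hash_size    # 20,000,000 bytes (~20 MB) to ship a flat list
  merkle_cost  = K * depth * node_size  # 153,600 bytes (~150 KB) of refinement traffic
  print(full_listing, merkle_cost)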

-graydon



