From: Ori Berger
Subject: Re: [Monotone-devel] Re: Support for binary files, scalability and Windows port
Date: Sat, 17 Jan 2004 03:41:49 +0200
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4; MultiZilla v1.5.0.2g) Gecko/20031007

graydon hoare wrote:

> currently, files are stored, as you have probably guessed, as head
> versions + xdeltas. the xdeltas reach into the past. head versions are
> identified by SHA1, deltas are identified by (pre,post) pairs of SHA1s
> -- the versions on either side of the delta. xdelta is itself based
> somewhat on hashing; common blocks are found by indexing all the blocks
> in the first file under their adler32 code, and rolling an adler32
> window of the same block size over the second file one byte at a time.
> then it writes a copy/insert delta stream. when you make a change the
> forwards xdelta is queued to send to the network, and the reverse
> xdelta is stored in the database, moving the head version to your newly
> committed version.
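(For concreteness, a minimal sketch of that copy/insert idea in Python - not monotone's actual code; the block size and all the names are made up. Blocks of the old file are indexed under their adler32, and a window over the new file emits copy ops for confirmed matches and insert ops otherwise. For brevity the window checksum is recomputed at each position instead of being rolled incrementally, which a real implementation would do.)

import zlib

BLOCK = 64   # illustrative block size only

def make_delta(old, new):
    """Return a list of ('copy', offset, length) / ('insert', bytes) ops."""
    # Index every BLOCK-sized block of the old file under its adler32.
    index = {}
    for off in range(0, len(old) - BLOCK + 1, BLOCK):
        index.setdefault(zlib.adler32(old[off:off + BLOCK]), []).append(off)

    ops, pending, i = [], bytearray(), 0
    while i + BLOCK <= len(new):
        window = new[i:i + BLOCK]
        match = None
        for off in index.get(zlib.adler32(window), []):
            if old[off:off + BLOCK] == window:   # confirm; adler32 can collide
                match = off
                break
        if match is None:
            pending.append(new[i])               # no match: queue a literal byte
            i += 1
        else:
            if pending:
                ops.append(('insert', bytes(pending)))
                pending = bytearray()
            ops.append(('copy', match, BLOCK))
            i += BLOCK
    pending.extend(new[i:])
    if pending:
        ops.append(('insert', bytes(pending)))
    return ops

def apply_delta(old, ops):
    out = bytearray()
    for op in ops:
        if op[0] == 'copy':
            out.extend(old[op[1]:op[1] + op[2]])
        else:
            out.extend(op[1])
    return bytes(out)

(apply_delta(old, make_delta(old, new)) == new for any pair of byte strings; the delta is small whenever the new file contains blocks of the old one, at any offset.)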

> this storage management system need not be the case. another feasible
> form would involve removing the file/delta distinction and storing all
> files as hash trees, where each hash code identifies an existing data
> fragment (up to a fixed block size) *or* a subtree.
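(And a rough sketch of what such hash-tree storage could look like - the fixed block size, the fanout, and the plain dict standing in for the database are all assumptions of mine:)

import hashlib

BLOCK = 8192    # illustrative fixed block size
FANOUT = 32     # child hashes per interior node; also just an example

def _put(store, payload):
    key = hashlib.sha1(payload).hexdigest()
    store[key] = payload          # content-addressed: identical payloads dedupe
    return key

def store_tree(store, data):
    """Store data as a hash tree; return the root hash."""
    # Leaves ('L'): raw fragments of up to BLOCK bytes.
    keys = [_put(store, b'L' + data[i:i + BLOCK])
            for i in range(0, len(data), BLOCK)] or [_put(store, b'L')]
    # Interior nodes ('T'): lists of child hashes, until a single root remains.
    while len(keys) > 1:
        keys = [_put(store, b'T' + '\n'.join(keys[i:i + FANOUT]).encode())
                for i in range(0, len(keys), FANOUT)]
    return keys[0]

def load_tree(store, key):
    payload = store[key]
    if payload[:1] == b'L':       # leaf: raw fragment
        return payload[1:]
    return b''.join(load_tree(store, k)
                    for k in payload[1:].decode().split('\n'))

(load_tree(store, store_tree(store, data)) == data, and two files that share a fragment - or a whole subtree - store it only once.)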

I had been thinking about this as well; in a message from September 16th, I mentioned I'd describe my ideas once I had them better organized, but I never found the time to get down to all the details. I'll try to join the brain dump instead.

First, it's possible to use a suffix tree to locate repeating parts. It can be implemented to run much, much faster than rsync's algorithm, and it is finely grained - an implementation and description can be found at <http://www.cl.cam.ac.uk/~cpk25/libstree/>. That implementation doesn't use persistent storage, but if it did, it could locate _all_ sources for a file more or less immediately. (The naive storage requirement is O(n) for a text of length n, but I suspect the constant factor might be prohibitive in practice.)

Suffix trees may also allow a "find likely ancestors" function to be implemented, with which proper attribution, and detection of file splits and merges, could be done - but I digress.

> I'd be just as happy to scrap the entire existing networking system and
> move to a synchronization-based one, custom protocol or otherwise, if I
> could work out a really efficient abstraction. synchronization is
> self-stabilizing, which I'm very attracted to. maybe hash trees are it.

Ok, here goes my braindump:

Find a way to partition files into blocks, in such a way that:
1. Block lengths are reasonable (definitions later)
2. The block boundaries are likely to be unchanged by small modifications - i.e. an insertion of one byte somewhere should have a small probability of changing one boundary, a smaller probability of changing two boundaries, ..., and a vanishingly small probability of changing all boundaries.

Now, given a file, break it into blocks; store each block independently, under its own hash; and store a manifest that says how to rebuild the file from its blocks. And that's it.

If all files were text files of 200 lines or so, applying this would work well by just taking each line as a block, with boundaries detected at newline (\n) characters. Inserting a character into a line would not affect any other line. The manifest would be a list of line identifiers. When a line is changed, we'd usually get a new manifest and exactly _one_ new line added to the database (all other lines, i.e. blocks, would be reused).
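(A tiny illustration of that special case - the names are mine - storing a text file line by line and keeping a manifest of line hashes; editing one line adds exactly one new block:)

import hashlib

def store_lines(store, text):
    """Store each line as a block keyed by its SHA1; return the manifest."""
    manifest = []
    for line in text.splitlines(keepends=True):
        h = hashlib.sha1(line.encode()).hexdigest()
        store[h] = line              # identical lines are stored only once
        manifest.append(h)
    return manifest

def rebuild(store, manifest):
    return ''.join(store[h] for h in manifest)

store = {}
v1 = store_lines(store, "alpha\nbeta\ngamma\n")
before = len(store)
v2 = store_lines(store, "alpha\nbeta (edited)\ngamma\n")
assert len(store) == before + 1      # only the edited line is new
assert rebuild(store, v2) == "alpha\nbeta (edited)\ngamma\n"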

To apply this to general files, we can use the following (a sketch in
code follows the list):
1. Block lengths are between 1/100 and 1/400 of the file length,
   but never shorter than (e.g.) 80 bytes, and never longer than
   (e.g.) 16 megabytes.
2. The block boundaries are determined by computing some running
   checksum that depends only on (e.g.) the last 80 characters, and
   declaring that a block has ended when it meets the length
   conditions set above, and also when the running checksum has some
   property that, probabilistically, appears ~200 times along the
   file. For example, let
   k = int(log_base_2(file length / 200));
   you could then require the running adler32 (or crc32 or whatever)
   of the most recent 80 bytes to have its k least significant bits
   all zero.
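Here is that chunker as a sketch (Python; the 80-byte window, the 1/100-1/400 length bounds, the ~200-block target and adler32 are the example parameters above - the function names and everything else are my own):

import hashlib

WINDOW = 80          # the running checksum depends only on the last 80 bytes
ADLER_MOD = 65521    # modulus used by adler32

def chunk(data):
    """Split data into blocks at content-defined boundaries (a sketch)."""
    n = len(data)
    if n == 0:
        return []
    # k = int(log2(n / 200)): the boundary property should fire ~200 times.
    k = max((max(n // 200, 1)).bit_length() - 1, 0)
    mask = (1 << k) - 1
    min_len = max(80, n // 400)                    # length bounds from point 1
    max_len = min(16 * 2**20, max(n // 100, min_len))

    blocks, start = [], 0
    a, b = 1, 0                                    # adler32 state of the window
    for i in range(n):
        byte = data[i]
        if i < WINDOW:                             # window not yet full: just append
            a = (a + byte) % ADLER_MOD
            b = (b + a) % ADLER_MOD
        else:                                      # roll oldest byte out, new byte in
            old = data[i - WINDOW]
            a = (a - old + byte) % ADLER_MOD
            b = (b - WINDOW * old + a - 1) % ADLER_MOD
        length = i - start + 1
        at_boundary = (((b << 16) | a) & mask) == 0   # k least significant bits zero
        if length >= max_len or (length >= min_len and at_boundary):
            blocks.append(data[start:i + 1])
            start = i + 1
    if start < n:
        blocks.append(data[start:])                # whatever is left is the last block
    return blocks

def block_ids(data):
    """SHA1 identifiers of the blocks, i.e. the body of a manifest entry list."""
    return [hashlib.sha1(blk).hexdigest() for blk in chunk(data)]

(Because a boundary depends only on the preceding 80 bytes plus the length bounds, inserting a byte into a large file typically disturbs only the one or two blocks around the edit; every other block keeps its SHA1.)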

Assuming the running checksum is reasonably "random", small changes shouldn't affect more than one or two blocks.

The blocks could be identified by their SHA1, and the manifest could either be file-specific, or the standard manifest could be extended to include a part specification, e.g.

deadface dir1/entire
deadbeef dir1/entire*1*200
31415926 dir1/entire*2*200
.
.
.
b1e2e3f4 dir1/entire*200*200
f5a6c7e8 dir1/file2

(I put only 8 chars for the sha1, but ... you get the idea).

sha1sum would still be able to check the entire file; it will complain about the missing part-files, naturally.
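(To make that concrete, a sketch of rebuilding a file from such a manifest and checking it against the whole-file entry - the layout follows the example above; everything else, including the dict used as a block store, is assumption:)

import hashlib

def rebuild_file(manifest_lines, blocks, path):
    """Rebuild `path` from its part entries and verify the whole-file SHA1."""
    whole, parts = None, {}
    for line in manifest_lines:
        sha1, name = line.split(None, 1)
        if name == path:
            whole = sha1                            # the plain "dir1/entire" entry
        elif name.startswith(path + '*'):
            part_no = int(name.rsplit('*', 2)[1])   # "<path>*<part>*<total>"
            parts[part_no] = sha1
    data = b''.join(blocks[parts[i]] for i in sorted(parts))
    assert hashlib.sha1(data).hexdigest() == whole, "missing or corrupt block"
    return data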

This also interoperates nicely with the networking side - to fetch a version from a depot, you fetch the manifest, and then fetch all the SHA1 blocks listed in the manifest that you don't already have. As simple as that.

There's no need to have a staging queue for the depot. I suspect it will cost a little in latency compared to today's perfectly synchronized depot, as the client has to fetch the manifest, parse it, decide what it needs to fetch, and only then start transferring the actual data - but the block fetches themselves can be pipelined, so I don't think it will be crucial.
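(A sketch of that client-side pull - `fetch(sha1) -> bytes` stands in for whatever transport is used, and `have` for the local database; only the manifest layout mirrors the example above:)

import hashlib

def pull_version(fetch, manifest_text, have):
    """Fetch every part block named in a manifest that we don't already have."""
    wanted = []
    for line in manifest_text.splitlines():
        sha1, name = line.split(None, 1)
        # Part entries only; whole-file entries are rebuilt from their parts.
        if '*' in name and sha1 not in have and sha1 not in wanted:
            wanted.append(sha1)
    for sha1 in wanted:                    # these requests can be pipelined
        data = fetch(sha1)
        assert hashlib.sha1(data).hexdigest() == sha1   # blocks are self-verifying
        have[sha1] = data
    return wanted                          # what actually went over the wire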

About NNTP, email, and "dumb web" distribution - all you have to do is record, for each block, whether or not it was sent to a specific destination. If you want to push a version somewhere, you just push all the blocks that have not yet been pushed to that location. Need to resync? Reset the "already sent" marks.

You can push to a dumb ftp or web server just as well - each block is placed in a file whose name is the sha1 of the block, and so is the manifest. The client fetches the manifest, and then starts fetching (through http or ftp) all the required blocks. No need for a smart depot (although one could make things more efficient, by serving directly from the database or accepting directly into the database - the client doesn't even have to be aware).
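(And the push bookkeeping - `send(sha1, data)` stands in for scp/ftp/NNTP/whatever, `sent` is the per-destination "already sent" record; names are mine:)

def push(send, blocks, needed, sent):
    """Push every block of one version that this destination hasn't seen yet.

    blocks maps sha1 -> data; needed lists the sha1s of the version's
    manifest and parts; sent is the per-destination "already sent" set.
    """
    for sha1 in needed:
        if sha1 not in sent:
            send(sha1, blocks[sha1])    # e.g. drop a file named <sha1> on a dumb server
            sent.add(sha1)

# Need to resync?  Reset the marks:
# sent.clear()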

Problems I could think of, none too serious:

1. An adversary might construct files that don't break well, and change them between versions in such a way that a complete copy is stored, even though there aren't many changes.

That's not really a concern, I think - the adversary could just as well have replaced the entire file and foiled any kind of delta compression.

2. Files with large repeats and/or large spans of zeros would probably reduce the block boundary decision algorithm to a constant-length cutter, or something like that.

Not really a problem, but perhaps could be handled better as a special case.

3. "phase change" effect when file size crosses a 2^k size boundary.

In this case, the boundary decision algorithm uses a different value of k on either side of the crossing, so the two versions are much less likely to have blocks in common - resulting in another copy of the file being stored even though we could have reused many of the blocks.

It's not a _huge_ problem - this will only happen once for each 2^k crossing, because if we cross back, we'll have the old shorter-block version to match against, and if we cross forward again, we'll have the old longer-block version to match against.

Further, this could be handled as a special case. The block boundary decision algorithm only matters for storage efficiency, not integrity - if it were replaced by no blocking at all, or by breaking into exactly 8K blocks, everything would still work as expected (albeit with much less efficient storage).


Pro:

* Older versions just as easy to access
* Easier to build "dumb" servers, simpler queuing rules, etc.
* Compression across all versions of all files (not just
  all versions of the one file).
* Easy to purge any specific version (using reference counts; see the
  sketch after this list)
* Easy to overcome database/filesystem/memory limits, by proper
  choice of block sizes.
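(The reference-counted purge mentioned above, as a sketch - all names are mine; `versions` maps a version id to the block SHA1s its manifest references:)

def add_version(version_id, block_ids, versions, refcount):
    versions[version_id] = list(block_ids)
    for sha1 in block_ids:
        refcount[sha1] = refcount.get(sha1, 0) + 1

def purge_version(version_id, versions, blocks, refcount):
    """Drop one version; free any block no surviving version still references."""
    for sha1 in versions.pop(version_id):
        refcount[sha1] -= 1
        if refcount[sha1] == 0:         # no remaining manifest references it
            del refcount[sha1]
            del blocks[sha1]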

Con:

* Requires more storage and/or bandwidth. A one-byte change in a
  100MB file would cost ~500KB with this scheme, and ~100B with the
  current delta scheme. Network bandwidth is similarly affected. The
  cost depends on how many blocks are touched, rather than on the
  size of the changes themselves: a sequential 400KB change that is
  entirely within one block will cost 400KB as a delta and ~500KB as
  blocks, while a one-byte change in every one of the 200 blocks
  will cost ~2KB as a delta and the entire 100MB as blocks.
* The latest version, unless properly cached, will take longer to
  construct (we need to pull all the blocks, which will be scattered
  around the database). And proper caching costs space....

The boundary determination scheme is not unlike the Rabin-Karp search algorithm, or some of what rsync does, BTW.

End of braindump.

I was hoping to experiment and write this up more clearly, but I don't think I'll get to do that soon enough; hence the braindump.





