[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Monotone-devel] Re: database schema questions
From: |
Derek Scherger |
Subject: |
[Monotone-devel] Re: database schema questions |
Date: |
Wed, 21 Apr 2004 19:33:55 -0600 |
User-agent: |
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040225 |
graydon hoare wrote:
this is a very important and rarely-stated aspect of monotone's design:
file and manifest storage is *unrelated* to history. history is a
description of what happened, in terms of certificates which point to
SHA1 values. the method of constructing the data for a given SHA1 value
is *completely independent* of that history.
ahh... I had missed that
yes, but that would consume more bytes than the current form. in some
cases quite a few more bytes. it is important to note that we store
manifests as *gzipped* text. this means we get free statistical data
compression. look at a large manifest (say gcc: about 900k). most of the
bytes on most of the lines of that manifest are identical path prefixes:
I was expecting that the file names might be encoded and possibly compressed as well, but
"compressing" single names would probably make them longer.
I'm curious, how does monotone perform on something this large? I've been wondering when I
do a monotone diff on a large tree how bad computing lots of sha1 id's would be. I assume
that diff/status must recompute the hash of every file and compare them with the value in
the manifest.
now, we could possibly achieve the same (or even better) compression
result "explicitly" at the storage level by storing directories as first
class citizens, and putting files as leaves in directories, and chaining
through the directory table to get to a file.. but that adds a lot of
complexity. I didn't want to bother.
No, the way you have it now has some definite appeal over handling directories explicitly.
It's simple and it works!
- merging is more difficult than SQL union in all but the most trivial
I did expect that this was over simplifying things!
cases, so you wind up having to construct a memory representation
anyways. diff_patch.cc::merge3 is one of the largest and most
complex functions in monotone (300 lines), and that's the simplified
You're doing pretty well if that's your worst example of complexity I think!
- my representation of "a file" at the database level is really not
compatible with the notion people have of "a history line". storage
delta chains can be broken, or mis-ordered, or whatever. they don't
This might take a bit of time to sink in.... ;)
if you want to conduct an experiment and try rewriting part of the
schema in terms of explicit directories and manifests-as-tuples, I won't
stop you. it's possible it would be a space win and maybe simplify some
algorithms. I found the current arrangement beneficial though.
I was just wondering why you hadn't gone with what looked like the obvious way to store
manifests. It doesn't surprise me that it didn't work out that way.
Thanks again for all the info!
--
Cheers,
Derek