Re: [Monotone-devel] partial pull #2 - gaps instead of a single horizon
From: Markus Schiltknecht
Subject: Re: [Monotone-devel] partial pull #2 - gaps instead of a single horizon
Date: Fri, 01 Jun 2007 18:36:28 +0200
User-agent: Icedove 1.5.0.10 (X11/20070329)
Hi,
Christian Ohler wrote:
> This looks like two separate issues to me:
> (1) The total history size of a project in monotone grows without bound.
> (2) The time it takes for a new developer to get a local workspace of a
> project is too high with monotone.
>
> As far as I can tell, problem (1) on its own isn't affecting anyone
> right now -- even though there are a handful of projects in existence
> that would run into it should they ever convert their history to
> monotone. Problem (1) does imply problem (2) in theory, but the real
> reason typical projects have problem (2) right now is unrelated to
> problem (1). The reason is that mtn pull is too CPU-intensive and/or
> not doing proper pipelining.
Agreed, as long as you are talking about relatively young repositories.
But there are repositories with a very large history-size to
checkout-size ratio, well beyond the factor of 3 that Nathaniel
claims is the average. For example, my PostgreSQL repository (the
monotone database) is about 250 MB, while a tar.gz of a fresh checkout
is only 14 MB.
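To put a number on that example (a back-of-the-envelope check using the
figures from this mail, not something computed in the thread):

```python
# History-size vs. checkout-size ratio for the PostgreSQL example above
# (sizes in MB, taken from this mail).
history_mb = 250   # monotone database
checkout_mb = 14   # tar.gz of a fresh checkout
ratio = history_mb / checkout_mb
print(round(ratio))  # roughly 18, far above the claimed average of 3
```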
And things will get worse as soon as people really start using tools
like monotone. Think about merge_into_dir, for example: with it you can
easily drag in a complete foreign repository, possibly hundreds of
megabytes, just to be able to propagate. But that is most probably
exactly the feature you want, and why you chose to have monotone
track that import from the foreign repository in the first place.
Basically, what I'm saying is that the average history vs. checkout size
ratio is probably that low because the tools for tracking history were
lacking. I bet this ratio will grow as soon as people learn about
the benefits of properly tracking history.
> In fact, what the Pidgin project is doing (download compressed mtn
> database snapshots over HTTP) is a solution to (2) that doesn't solve
> (1). Too bad mtn isn't smart enough to offer similar efficiency for
> this particular case. It's a special case, but it's the case that matters.
Why do you want a solution which solves only one of the problems?
Partial pull would solve both (1) and (2), no?
> A complete pull of Pidgin's current database transfers 120 MB. Is this
> the size of history that we want to give up on and recommend partial
> pull for? That doesn't seem very satisfactory.
Huh? Why not? Having to download 10 MB versus 120 MB still makes a
difference of a few minutes on the average internet connection.
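As a rough illustration (the bandwidth figure is my assumption; the mail
only says "a few minutes on the average internet connection"):

```python
# Rough extra transfer time for a full pull vs. a partial pull,
# assuming a ~0.5 MB/s connection (an assumed figure, not from the thread).
full_mb, partial_mb = 120, 10
rate_mb_per_s = 0.5
extra_seconds = (full_mb - partial_mb) / rate_mb_per_s
print(extra_seconds / 60)  # ~3.7 minutes saved at this rate
```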
> It's nowhere near the
> several gigabytes of history that Nathaniel is calling an unreasonable
> size. It should be within the range that mtn pull can deal with.
> Partial pull would just be a workaround for mtn's inefficient pull
> mechanism.
No, it would solve issue (1), too.
> Maybe it's just a matter of optimizing the roster manipulation code. Or
> maybe there's a way to avoid or defer some of the work that the code is
> currently doing during pull. Maybe there's a way to short-circuit the
> expensive roster manipulation and just copy node ids from the server
> (with some simple adjustments) if the local database does not contain
> any revisions connected to the subgraph being pulled?
I'm all for these optimizations. Please go ahead and optimize netsync;
that would be very nice.
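For what it's worth, the short-circuit idea quoted above could look
roughly like this (a hypothetical sketch with invented names, not
monotone's actual netsync code):

```python
# Sketch of the proposed fast path: if no local revision is connected to
# the subgraph being pulled, copy the server's rosters/node ids verbatim
# instead of rebuilding them through the expensive roster manipulation.

def pull_rosters(server_revs, local_revs, ancestors):
    """server_revs: dict rev id -> roster data as sent by the server
    local_revs:  set of rev ids already in the local database
    ancestors:   dict rev id -> set of ancestor rev ids (server's view)
    """
    merged = {}
    for rev, roster in server_revs.items():
        if local_revs.isdisjoint(ancestors.get(rev, set())):
            # Fast path: nothing local touches this revision's ancestry,
            # so the server's node ids can be taken over as-is.
            merged[rev] = roster
        else:
            # Slow path: stands in for the real, CPU-intensive roster
            # reconstruction that mtn pull currently always performs.
            merged[rev] = rebuild_roster(rev, roster, local_revs)
    return merged

def rebuild_roster(rev, roster, local_revs):
    # Placeholder for monotone's actual roster rebuild logic.
    return roster
```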
[ Please note that all of this has nothing to do with the debate about
a single-horizon vs. gaps implementation of partial pull. ]
Regards
Markus