
Re: [Monotone-devel] partial pull #2 - gaps instead of a single horizon


From: Markus Schiltknecht
Subject: Re: [Monotone-devel] partial pull #2 - gaps instead of a single horizon
Date: Fri, 01 Jun 2007 18:36:28 +0200
User-agent: Icedove 1.5.0.10 (X11/20070329)

Hi,

Christian Ohler wrote:
> This looks like two separate issues to me:
>
> (1) The total history size of a project in monotone grows without bound.
> (2) The time it takes for a new developer to get a local workspace of a project is too high with monotone.
>
> As far as I can tell, problem (1) on its own isn't affecting anyone right now -- even though there are a handful of projects in existence that would run into it should they ever convert their history to monotone. Problem (1) does imply problem (2) in theory, but the real reason typical projects have problem (2) right now is unrelated to problem (1). The reason is that mtn pull is too CPU-intensive and/or not doing proper pipelining.

Agreed, as long as you are talking about relatively young repositories.

But there are repositories with quite a large history-size to checkout-size ratio, way beyond the factor of 3 that Nathaniel claims is the average. For example, my PostgreSQL repository (the monotone database) is about 250 MB, while a tar.gz of a fresh checkout is only 14 MB.
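To put a number on that claim, here is the arithmetic for the PostgreSQL example (a quick sketch; the sizes are the ones quoted above, rounded):

```python
# History-to-checkout size ratio for the PostgreSQL example.
# Sizes are the ones quoted in the mail (approximate).
history_mb = 250   # monotone database holding the full PostgreSQL history
checkout_mb = 14   # tar.gz of a fresh checkout

ratio = history_mb / checkout_mb
print(f"history/checkout ratio: {ratio:.1f}x")
```

That is roughly an 18x ratio, so around six times the factor-of-3 average.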

And things will get worse as soon as people really start using tools like monotone. Consider merge_into_dir, for example: with it you can easily drag in a complete foreign repository, possibly hundreds of megabytes, just to be able to propagate. But that is most probably exactly the feature you want, and why you chose to let monotone track that import from a foreign repository in the first place.

Basically, what I'm stating is that the average history vs. checkout size ratio is probably that low because the tools for tracking history were lacking. I bet that this ratio will grow as soon as people learn about the benefits of properly tracking history.

> In fact, what the Pidgin project is doing (download compressed mtn database snapshots over HTTP) is a solution to (2) that doesn't solve (1). Too bad mtn isn't smart enough to offer similar efficiency for this particular case. It's a special case, but it's the case that matters.

Why do you want a solution that solves only one problem? Partial pull would solve both (1) and (2), no?

> A complete pull of Pidgin's current database transfers 120 MB. Is this the size of history that we want to give up on and recommend partial pull for? That doesn't seem very satisfactory.

Huh? Why not? Having to download 10 MB instead of 120 MB still makes a difference of a few minutes on an average internet connection.
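A back-of-the-envelope estimate of that difference (the 2 Mbit/s line speed is an assumption for a typical 2007-era connection, not a figure from the mail):

```python
# Rough download times for a full vs. a partial pull of Pidgin's database.
line_speed_mbit = 2.0                       # assumed "average" connection speed
bytes_per_sec = line_speed_mbit * 1e6 / 8   # = 250 KB/s

def minutes(size_mb: float) -> float:
    """Time in minutes to transfer size_mb megabytes at the assumed speed."""
    return size_mb * 1e6 / bytes_per_sec / 60

full, partial = minutes(120), minutes(10)
print(f"full pull: {full:.1f} min, partial pull: {partial:.1f} min")
```

At that speed the full pull takes about 8 minutes versus well under a minute for the partial one, which is the "few minutes" difference in question; transfer time scales linearly with database size either way.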

> It's nowhere near the several gigabytes of history that Nathaniel is calling an unreasonable size. It should be within the range that mtn pull can deal with. Partial pull would just be a workaround for mtn's inefficient pull mechanism.

No, it would solve issue (1), too.

> Maybe it's just a matter of optimizing the roster manipulation code. Or maybe there's a way to avoid or defer some of the work that the code is currently doing during pull. Maybe there's a way to short-circuit the expensive roster manipulation and just copy node ids from the server (with some simple adjustments) if the local database does not contain any revisions connected to the subgraph being pulled?

I'm all for these optimizations. Please go ahead and optimize netsync; that would be very nice.

[ Please note that none of this has anything to do with the debate about the single-horizon vs. gaps implementation of partial pull. ]

Regards

Markus




