monotone-devel

Re: [Monotone-devel] RFC: CVS sync design


From: Christof Petig
Subject: Re: [Monotone-devel] RFC: CVS sync design
Date: Wed, 05 Jan 2005 15:40:30 +0100
User-agent: Mozilla/5.0 (X11; U; Linux ppc; de-AT; rv:1.7.3) Gecko/20041007 Debian/1.7.3-5

Nathaniel Smith wrote:
Actually, let me throw another thought out here too: Is it really useful to commit whole chains of history to CVS? It would be significantly simpler from our point of view to just take a single revision, double-check that it's a descendant of the last sync'ed revision, and then just commit that snapshot to the CVS repo, completely ignoring intermediate versions.
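
The "double-check that it's a descendant" step could look roughly like the sketch below. It assumes the revision graph is available as a plain mapping from revision ID to parent IDs; the name `is_descendant` and the example graph are made up for illustration and are not monotone's API.

```python
from collections import deque

def is_descendant(parents, candidate, ancestor):
    """Return True if `ancestor` is reachable from `candidate` via parent edges.

    `parents` maps a revision ID to a list of its parent IDs (an assumed
    representation). A revision counts as its own descendant here.
    """
    seen = set()
    queue = deque([candidate])
    while queue:
        rev = queue.popleft()
        if rev == ancestor:
            return True
        if rev in seen:
            continue
        seen.add(rev)
        queue.extend(parents.get(rev, ()))
    return False

# Hypothetical graph: A -> B -> D on the main line, A -> C as a side branch.
parents = {"B": ["A"], "C": ["A"], "D": ["B"]}
last_synced = "B"
print(is_descendant(parents, "D", last_synced))  # True: D may be committed
print(is_descendant(parents, "C", last_synced))  # False: C needs a merge first
```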

If you think of CVS as still being the main repository for a project, you
might want to preserve as much information in it as possible [e.g.
intermediate tree states, and especially commit logs].

If somebody else committed while you were working on the (monotone-based)
fork, you have to merge the trees again before committing. Since CVS has no
notion of side branches, your merged version will _immediately_ follow
the last commit on the main trunk (even in monotone). So unless you are
the only one making changes to the whole tree while working offline in
your monotone database, your changes actually have to land as a single
revision in CVS.

That's by design.

[Committing/Preserving side branches as CVS branches might be a separate
option (you'd need to specify the monotone head (by revision ID) and the
CVS branch name)]

The Monotone revision graph is going to be a superset of the stuff in
 CVS anyway, so I don't feel too bad about collapsing some edges.

Also, this is much simpler to implement, completely sidestepping like
the last 3 paragraphs of my last email. (Also some things I didn't mention... like what do we do with discontinuous branches? I.e., branches that have 'gaps', like A -> B -> C where A and C are in some
 branch but B isn't... should B be pushed to the CVS server?)

Actually, being able to commit offline while still showing my commit history
to co-workers who use only CVS is one of my motivations for writing
this beast.

Also, it might actually be required for some cases. E.g., a typical use case for CVS-synching functionality might be that if I'm working on a project whose upstream uses CVS, then I want to use Monotone locally to work out my changes, and then when I'm done push them to the CVS repo. But Monotone and CVS have very different criteria for when a change should be committed; in Monotone it's perfectly common to commit a change just locally, as a checkpoint while working. In CVS, though, commits always affect everyone globally, so you have
many projects where the policy is that if you commit a broken
revision, you are a Bad Bad Person.  Even if you then immediately
commit a fixed revision.  So I will be violating upstream policy if I
simply push my full Monotone graph into CVS!

Causing Monotone to collapse edges even when not necessary (see above)
might be a later option, but it is not my initial motivation. Perhaps some
trickery will do it anyway: commit one last change (e.g. changelog,
NEWS) to the base revision, then merge the two heads.

In a different mail N.S. wrote:
Cool, I think this would be really useful for a lot of people.

:-D

Do not store state information in both trees so that syncing with
several CVS servers is possible.


I don't understand the connection between these statements.  We could
certainly store in Monotone one cert that said "this revision corresponds to a checkin in CVS repo Foo, whose state was ...", and another that said "this revision etc. in CVS repo Bar, whose state was ...". Storing state seems like it could significantly reduce complexity, be very useful for later spelunking (cf. how subversion's
 CVS->SVN code stores things like CVS version numbers as metadata,
just in case they're useful later), and I don't see any immediate drawbacks...

I will code a remote CVS import first and then try to do an update (cvs_pull). If it turns out that this is too difficult or inefficient unless you store marker certs in your monotone db, then I will give in. [Actually you have to sync initially without this information anyway, so the code has to be there.]

And to be honest I have some projects which reside on multiple CVS servers (one is the master of course). Easing this pain seems to be trivial with a CVS<->monotone<->CVS setup.

Preserve changelog and timestamp of every change.


And author?  Or does CVS not let you set that?  (Monotone will let
you set the author field on commits to arbitrary strings, so _that's_
no problem.)

I can preserve the author in monotone. I can't tell CVS to not take the local user name for any checkins.

syntax: monotone pull [--branch foo]
cvs://localhost/usr/local/cvsroot module[:branch]
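
To make the proposed argument shape concrete, here is a small sketch of how the two arguments might be taken apart. The function name `parse_cvs_target` and the returned dictionary layout are assumptions for illustration; only the `cvs://host/root` and `module[:branch]` forms come from the proposal above.

```python
from urllib.parse import urlsplit

def parse_cvs_target(url, module_spec):
    """Split 'cvs://host/root' and 'module[:branch]' into their parts."""
    parts = urlsplit(url)
    if parts.scheme != "cvs":
        raise ValueError(f"not a cvs:// URL: {url!r}")
    # Everything after the first ':' is the branch; no ':' means no branch.
    module, _, branch = module_spec.partition(":")
    return {
        "host": parts.hostname,
        "root": parts.path,
        "module": module,
        "branch": branch or None,
    }

target = parse_cvs_target("cvs://localhost/usr/local/cvsroot", "module:stable")
print(target["root"], target["module"], target["branch"])
```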


I would strongly prefer that this functionality not overload the meanings of push/pull/sync. Synchronizing with a CVS repository is a significantly different process than synchronizing with another Monotone repository. Maybe cvs_pull/cvs_push or something?

Should we mimic CVS by introducing a -d switch?
Unless someone proposes a really convincing alternative I will stay with my syntax (I still think that its semantics are similar enough to the original sync, and I already introduced a URL (ssh://...) on the ssh branch).

Explicit is better than implicit. I think we should just make the user specify the desired correspondence between Monotone and CVS branches; the two systems are different enough that there's really no
 good way to guess.  (I'd even be fine with requiring them to type
HEAD when they wanted the head branch, rather than defaulting.)

Agreed.

This seems like the time when keeping some state would be really handy. What about having a cert that says "this revision corresponds
 to the following files in CVS repository ___: file1  1.3 file2
1.8 ..." (I guess a problem here is what namespace to use for CVS
repositories. I guess I don't have any useful intuition here, since I
can't even think of a situation where one would want to synchronize
with two different CVS repos...)
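
One possible shape for such a cert's payload is sketched below. The text layout (a `repository` header line followed by one `path revision` line per file) and both function names are assumptions; the thread only fixes the idea of recording per-file CVS revisions for a named repository. Naming the repository inside the payload is one way to let several CVS repositories attach their own cert to the same revision.

```python
def encode_cvs_state(repo, file_revisions):
    """Serialize a {path: cvs_revision} mapping for one CVS repository."""
    lines = [f"repository {repo}"]
    for path in sorted(file_revisions):
        lines.append(f"{path} {file_revisions[path]}")
    return "\n".join(lines)

def decode_cvs_state(text):
    """Inverse of encode_cvs_state; returns (repo, {path: cvs_revision})."""
    header, *entries = text.splitlines()
    repo = header.split(" ", 1)[1]
    revisions = {}
    for entry in entries:
        # Split on the last space so paths containing spaces survive.
        path, rev = entry.rsplit(" ", 1)
        revisions[path] = rev
    return repo, revisions

payload = encode_cvs_state("cvs://localhost/usr/local/cvsroot",
                           {"file1": "1.3", "file2": "1.8"})
print(payload)
```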

Then the pull operation becomes:
1) traverse up from the branch's heads until we find such a cert. (If we
   don't find such a cert, then we start from the beginning.)
2) having found such a cert, we simply request deltas forward from each
   file revision mentioned in the cert up to the revisions at the current
   tip of the CVS branch.
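
The two pull steps above can be sketched as follows. This is only an illustration under assumed data structures: `parents` maps a revision to its parents, `cvs_certs` maps a revision to the `{path: cvs_revision}` state recorded in its cert, and `tip` is the per-file state currently on the CVS branch. None of these names are monotone's.

```python
from collections import deque

def find_sync_point(parents, heads, cvs_certs):
    """Step 1: walk up from the heads until a revision carrying a cert is found."""
    seen = set()
    queue = deque(heads)
    while queue:
        rev = queue.popleft()
        if rev in seen:
            continue
        seen.add(rev)
        if rev in cvs_certs:
            return rev
        queue.extend(parents.get(rev, ()))
    return None  # no cert anywhere: pull has to start from the beginning

def deltas_to_request(synced, tip):
    """Step 2: per file, the (from, to) CVS revisions still to be fetched.

    Files that are new on the CVS side get `from` = None. Files that were
    removed on the CVS side are ignored in this sketch.
    """
    return {path: (synced.get(path), tip_rev)
            for path, tip_rev in tip.items()
            if synced.get(path) != tip_rev}

parents = {"B": ["A"], "C": ["B"]}
cvs_certs = {"A": {"file1": "1.3", "file2": "1.8"}}
base = find_sync_point(parents, ["C"], cvs_certs)
tip = {"file1": "1.5", "file2": "1.8", "file3": "1.1"}
print(base, deltas_to_request(cvs_certs[base], tip))
```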

The initial pull would still be different (unless we require the monotone branch to be empty) and more similar to my proposal. (see above).

The push command will be an alias to sync because to check into a
CVS repository you need to have an up to date copy of it. [As we
surely all know ;-)]


I'd rather not have a 'sync', and instead have 'push' fail if commits
have occurred since the last 'pull'. This
- matches the normal CVS semantics for update/commit
- is much less surprising than having 'push' actually do a 'pull'
And 'sync' isn't useful anyway, because when you do a 'pull' and then
immediately do a 'push', at least one of them will always be a no-op.
(If the 'pull' is a no-op, the 'push' will succeed; if the 'pull'
actually pulls a new revision, then there will be nothing for 'push' to
do, because that revision will have no children to be pushed.)

Agreed, but a sync might issue the necessary pull of recent changes since the data is needed anyway to determine the last revision in CVS for the push. (See my motivation about being able to push the whole history). So writing a push without pulling first is more complex (unless you save the state of the CVS tree, of course).

To clarify this, I think, and suggest something _slightly_ different,
here's my version of push:
- find the latest revision that corresponds to a cvs-manifest
- check to see whether that cvs-manifest is the tip of the branch we wish
  to sync with; if not, error out, telling the user to perform a pull and
  do some merging
- now pick a child of that revision, commit it to the CVS server, and
  recurse
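
A rough sketch of that push procedure, written as a loop rather than a recursion. Everything here is assumed for illustration: the lookup of the latest revision with a cvs-manifest is taken as already done (`synced_rev`, `synced_state`), `children` maps a revision to its children, `pick_child` is whatever selection strategy is chosen, and `commit` stands in for the actual CVS checkin.

```python
class OutOfDateError(Exception):
    """Raised when CVS has commits that were never pulled."""

def push(synced_rev, synced_state, cvs_tip_state, children, pick_child, commit):
    # Refuse to push if the last synced manifest is no longer the CVS tip.
    if synced_state != cvs_tip_state:
        raise OutOfDateError("CVS has moved on: pull and merge first")
    pushed = []
    rev = synced_rev
    while True:
        child = pick_child(children.get(rev, []))
        if child is None:
            return pushed  # nothing left to commit
        commit(child)      # placeholder for the real CVS commit
        pushed.append(child)
        rev = child

# Hypothetical linear history A -> B -> C, with A already in CVS.
children = {"A": ["B"], "B": ["C"]}
first_or_none = lambda kids: kids[0] if kids else None
committed = []
result = push("A", {"f": "1.1"}, {"f": "1.1"}, children, first_or_none, committed.append)
print(result)  # ['B', 'C']
```

With `cvs_tip_state` different from `synced_state`, the same call raises `OutOfDateError` before committing anything, which is the "push fails if commits have occurred since the last pull" behaviour discussed earlier.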

The only tricky part is choosing the children to commit; this is the
old 'pick a distinguished linear subbranch' problem. Some strategies:
a) pick randomly
b) let the user choose the revision to end up with, and pick a random
   path to get there
c) recurse only so long as there is a linear path to follow, and then
   stop when we reach the first fork
d) check ahead to see whether there are any forks, and if there are,
   abort early and tell the user to specify explicitly which revision
   they want to push to the server (this is similar to monotone's
   'update' command). There must be a unique (linear) path from the CVS
   tip to that revision.
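
Strategy (c) can be sketched as below: follow the graph only while each revision has exactly one child, and report the candidates at the point where it stops. The `children` mapping and the name `linear_prefix` are assumptions, and the sketch ignores merge revisions with an unpushed second parent.

```python
def linear_prefix(children, start):
    """Follow single-child edges from `start`; stop at the first fork or leaf.

    Returns (path, choices): the revisions that can be pushed without any
    decision, and the children of the last one (empty at a leaf, two or
    more at a fork, where the user would have to pick).
    """
    path = []
    rev = start
    while len(children.get(rev, [])) == 1:
        rev = children[rev][0]
        path.append(rev)
    return path, children.get(rev, [])

# Hypothetical history: A -> B -> C, then C forks into D and E.
children = {"A": ["B"], "B": ["C"], "C": ["D", "E"]}
path, choices = linear_prefix(children, "A")
print(path, choices)  # ['B', 'C'] ['D', 'E']
```

Strategy (d) would use the same walk but refuse to push anything when `choices` is non-empty.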

I tend to choose (c) (see above), i.e. the user has to specify which path to take when it is ambiguous, rather than which revision should become the head.

It seems like some desirable properties are:
1) the user doesn't have to do n push's to send n revisions to the
   server. (So you want to push whole chunks of the graph at once, at
   least sometimes.)
2) you want to be able to specify which revision ends up as the CVS tip
3) you want to be able to specify exactly which revisions are committed
   (i.e. both which revision ends up as the CVS tip, and which path is
   taken to get there)

I think in practice (b) is best. The only advantage of (c)/(d) over (b)
is that they force you to specify the exact intermediate revisions to
commit, i.e. they prioritize (3) over (1). In most cases, though, most
people won't care exactly which revisions are committed, so long as you
end up with a branch tip that has all the changes in it. I.e., (1) is
more important than (3). So (b) is better than (c)/(d).

(b) "Picking a random path from A to B" gives you less control and is more difficult to implement than (c). So I will start with (c).

This still leaves the question of, if there's more than one head and the user doesn't specify which one to end up with, do we abort and force the user to pick one, or do we pick one randomly?

I tend to push while possible and then tell the user to specify which path to take on the command line (using a 'heads'-like display). Iterate that and you push as much history into CVS as possible without creating side branches. (This also avoids the problem of specifying multiple side branches to walk.)

   Christof


