Re: [Monotone-devel] RFC: CVS sync design


From: Nathaniel Smith
Subject: Re: [Monotone-devel] RFC: CVS sync design
Date: Tue, 11 Jan 2005 00:08:28 -0800
User-agent: Mutt/1.5.6+20040907i

On Wed, Jan 05, 2005 at 03:40:30PM +0100, Christof Petig wrote:
> Nathaniel Smith schrieb:
> >The Monotone revision graph is going to be a superset of the stuff in
> > CVS anyway, so I don't feel too bad about collapsing some edges.
> >
> >Also, this is much simpler to implement, completely sidestepping like
> >the last 3 paragraphs of my last email.  (Also some things I didn't 
> >mention... like what do we do with discontinuous branches?  I.e., 
> >branches that have 'gaps', like A -> B -> C where A and C are in some
> > branch but B isn't... should B be pushed to the CVS server?)
> 
> Actually being able to commit offline while showing my commit history to
> co-workers who still use only CVS is one of my motivations for writing
> this beast.

Yeah, having thought about this a bit more, I agree, we should keep
history.  The really compelling use case for this feature seems to be
"partial migration" -- it lets individuals switch to Monotone for
their own use and get benefits, without having to convince their whole
project to switch or incurring any risk.  This actually seems like it
could be a killer feature for a VCS tool.

> In a different mail N.S. wrote:
> >I don't understand the connection between these statements.  We could
> > certainly store in Monotone one cert that said "this revision 
> >corresponds to a checkin in CVS repo Foo, whose state was ...", and 
> >another that said "this revision etc. in CVS repo Bar, whose state
> >was ...".  Storing state seems like it could significantly reduce 
> >complexity, be very useful for later spelunking (cf. how subversion's
> > CVS->SVN code stores things like CVS version numbers as metadata,
> >just in case they're useful later), and I don't see any immediate 
> >drawbacks...
> 
> I will code a remote CVS import first and then try to do an update
> (cvs_pull). If it turns out that this is too difficult or inefficient
> unless you store marker certs in your monotone db then I will give in.

Uh... okay.  I don't really understand why you prefer a less-robust,
harder-to-code, more-code-to-be-buggy solution even if it could be
made to more-or-less work ;-), but I won't stop you :-).

I _will_ happily throw nasty edge cases at you, though ;-).
  - I make a change in Monotone, then decide it was a bad idea, and
    undo that change.  I then push to the CVS repo; however, because
    my tree now is the same as it was before I made and then unmade my
    change, nothing is pushed, and that history is lost.
  - The exact same thing happens with "Monotone" and "CVS" switched.
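
A rough sketch of the first edge case, in Python purely for illustration.
Everything here is hypothetical: trees are plain {path: contents} dicts,
history is a linear list, and the "stateless" variant is assumed to work by
finding the latest local revision whose tree matches the CVS tip.  The
marker variant assumes we recorded which revision was last synced.

```python
import hashlib

def tree_id(tree):
    """Hash a {path: contents} mapping into a stable identifier."""
    h = hashlib.sha1()
    for path in sorted(tree):
        h.update(path.encode() + b"\0" + tree[path].encode() + b"\0")
    return h.hexdigest()

def revisions_to_push_by_matching(cvs_tip_tree, local_history):
    """Stateless guess: push everything after the latest revision whose
    tree equals the CVS tip (assumed matching rule, not a real API)."""
    target = tree_id(cvs_tip_tree)
    last_match = None
    for i, tree in enumerate(local_history):
        if tree_id(tree) == target:
            last_match = i
    if last_match is None:
        return list(range(len(local_history)))
    return list(range(last_match + 1, len(local_history)))

def revisions_to_push_by_marker(synced_index, local_history):
    """With stored state: push everything after the recorded sync point."""
    return list(range(synced_index + 1, len(local_history)))
```

With history [base, changed, base] and a CVS tip equal to base, the matching
variant returns an empty list (the change and its revert are both skipped),
while the marker variant with sync point 0 returns [1, 2].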

I'll also ask if you can implement your version in less than quadratic
time; don't we have to climb all over both the CVS history and the
Monotone revision graph, searching for a match?  And isn't looking at
each revision in the CVS history really expensive, like, involving
network round-trips and the like?

> [Actually you have to sync initially without this information anyway, so 
> the code has to be there]

I don't understand this statement at all.  (See below.)

> And to be honest I have some projects which reside on multiple CVS 
> servers (one is the master of course). Easing this pain seems to be 
> trivial with a CVS<->monotone<->CVS setup.

Fair enough.  It seems trivial enough to do this in my design by just
naming each CVS repo, anyway.

> >>>syntax: monotone pull [--branch foo]
> >>>cvs://localhost/usr/local/cvsroot module[:branch]
> >
> >I would strongly prefer that this functionality not overload the 
> >meanings of push/pull/sync.  Synchronizing with a CVS repository is a
> >significantly different process than synchronizing with another 
> >Monotone repository.  Maybe cvs_pull/cvs_push or something?
> 
> Should we mimic CVS by introducing a -d switch?
> Unless someone proposes a really convincing alternative I will stay with
> my syntax (I still think that its semantics are similar enough to the
> original sync and I already introduced a URL (ssh://...) on the ssh
> branch).

Well, let's not make a bike-shed issue out of this.  How about, for
now we just focus on getting the functionality right, and stick the
pieces under their own commands since that's simplest.  And then once
that's working we can think about the best UI, whether we want to hook
into some sort of URL-parsing code, etc.  Make sense?

> >This seems like the time when keeping some state would be really 
> >handy.  What about having a cert that says "this revision corresponds
> > to the following files in CVS repository ___: file1  1.3 file2
> >1.8 ..." (I guess a problem here is what namespace to use for CVS
> >repositories. I guess I don't have any useful intuition here, since I
> >can't even think of a situation where one would want to synchronize
> >with two different CVS repos...)
> >
> >Then the pull operation becomes:
> >  1) traverse up from the branch's heads until we find such a cert.
> >     (If we don't find such a cert, then we start from the beginning.)
> >  2) having found such a cert, we simply request deltas forward from
> >     each revision mentioned in the cert until the revisions in the
> >     current tip of the branch.
> 
> The initial pull would still be different (unless we require the 
> monotone branch to be empty) and more similar to my proposal. (see above).

I still don't understand this statement at all.

I think the right thing to do on an initial pull is just to always act
as if the monotone branch is empty.  There's no problem in Monotone
with having a branch without a unique root.  I guess you're imagining
some sort of setup where, if the user tries to suck a CVS branch into
some Monotone branch that already has revisions in it, we should look
at those revisions and somehow guess one of them to hang the CVS
branch's initial checkin off of?  This just doesn't make much sense to
me.  The initial version in the CVS branch has no parent version;
we're trying to mirror the CVS branch into Monotone; so we should
create a revision with no parent revision.  Is there another way to do
this that makes sense at all?
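
To make the pull I have in mind concrete, here is a minimal sketch in
Python.  The names (find_sync_point, plan_pull) and the data shapes are
made up for illustration: `parents` maps a revision id to its parent ids,
and `cvs_certs` maps a revision id to the {file: CVS version} mapping such
a cert would carry.  When no cert is found, the plan attaches to nothing,
i.e. the first imported revision gets no parent.

```python
from collections import deque

def find_sync_point(heads, parents, cvs_certs):
    """Walk up from the heads, breadth-first, and return (revision, cert)
    for the nearest ancestor carrying a CVS cert, or None if there is none."""
    seen = set(heads)
    queue = deque(heads)
    while queue:
        rev = queue.popleft()
        if rev in cvs_certs:
            return rev, cvs_certs[rev]
        for parent in parents.get(rev, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return None

def plan_pull(heads, parents, cvs_certs):
    """Decide where new CVS history attaches and which per-file versions
    to request deltas forward from."""
    found = find_sync_point(heads, parents, cvs_certs)
    if found is None:
        # Initial pull: behave as if the branch were empty.
        return {"attach_to": None, "from_versions": {}}
    rev, file_versions = found
    return {"attach_to": rev, "from_versions": dict(file_versions)}
```

Note that the initial pull and every later pull go through the same code
path here; the only difference is whether a cert turns up.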

> >I'd rather not have a 'sync', and instead have 'push' fail if commits
> >have occurred since the last 'pull'.  This
> >  - matches the normal CVS semantics for update/commit
> >  - is much less surprising than having 'push' actually do a 'pull'
> >And 'sync' isn't useful anyway, because when you do a 'pull' and then
> >immediately do a 'push', at least one of them will always be a no-op.
> >(If the 'pull' is a no-op, the 'push' will succeed; if the 'pull'
> >actually pulls a new revision, then there will be nothing for 'push'
> >to do, because that revision will have no children to be pushed.)
> 
> Agreed, but a sync might issue the necessary pull of recent changes 
> since the data is needed anyway to determine the last revision in CVS 
> for the push. (See my motivation about being able to push the whole 
> history). So writing a push without pulling first is more complex 
> (unless you save the state of the CVS tree, of course).

Not entirely sure I understand this.  But I _think_ there are two
points to make here:
  - the only reason push would need to do a _real_ pull is if we don't
    store state, and we have to pull down the latest tree version in
    order to walk the revision graph and guess which revision it might
    correspond to (or walk the entire revision graph, kind of a bad
    idea if we're talking about something like gcc).  If we do store
    state, then push just has to do an up-to-date check, exactly like
    'cvs commit' does.
  - Obviously "push" needs to do sanity-checking of some sort; my
    comment is really about the UI.  I'm saying "push" should never
    modify the Monotone database (except maybe to add a cert recording
    that a push happened), and "sync" shouldn't exist.  If "push" has
    to pull things over the network, that's fine, it just shouldn't
    write them to the database.
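
The up-to-date check in the first point could look roughly like the
following Python sketch.  It assumes (hypothetically) that the stored cert
gives us a {file: CVS version} mapping and that we can ask the server for
the same mapping at the branch tip; the function only compares and raises,
and never writes anything.

```python
class OutOfDate(Exception):
    """Raised when the CVS branch has moved since the last recorded sync."""

def check_up_to_date(recorded_versions, server_versions):
    """Compare the per-file versions recorded at the last sync against
    what the server reports now; raise if any file differs."""
    all_paths = set(recorded_versions) | set(server_versions)
    stale = sorted(
        path for path in all_paths
        if recorded_versions.get(path) != server_versions.get(path)
    )
    if stale:
        raise OutOfDate(
            "pull and merge first; changed on server: " + ", ".join(stale)
        )
```

A file added or removed on the server shows up as a mismatch too, since
it is missing from one of the two mappings.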

> >To clarify this, I think, and suggest something _slightly_ different,
> >here's my version of push:
> >  - find the latest revision that corresponds to a cvs-manifest
> >  - check to see whether that cvs-manifest is the tip of the branch
> >    we wish to sync with; if not, error out, telling the user to
> >    perform a pull and do some merging
> >  - now pick a child of that revision, commit it to the CVS server,
> >    and recurse
> >
> >The only tricky part is choosing the children to commit; this is the
> >old 'pick a distinguished linear subbranch' problem.  Some strategies:
> >  a) pick randomly
> >  b) let the user choose the revision to end up with, and pick a
> >     random path to get there
> >  c) recurse only so long as there is a linear path to follow, and
> >     then stop when we reach the first fork
> >  d) check ahead to see whether there are any forks, and if there
> >     are, abort early and tell the user to specify explicitly which
> >     revision they want to push to the server (this is similar to
> >     monotone's 'update' command).  There must be a unique (linear)
> >     path from the CVS tip to that revision.
> 
> I tend to choose c) (see above), i.e. the user has to specify which path
> to take when it is ambiguous, rather than the HEAD to end up at.

I don't like (c) much, actually; it seems inconsistent with Monotone's
esthetics.  In general I prefer my operations to either run to
completion, or error out with no changes having been made, asking for
whatever information is needed to run to completion.  So if I tell
Monotone to push everything up to the head, it shouldn't decide to
just push part of the history.
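
Here is what "run to completion or change nothing" might look like, as a
Python sketch of the check-ahead variant.  The names and the `children`
mapping (revision id to list of child ids) are invented for the example,
and `commit` stands in for whatever actually sends one revision to CVS.

```python
class AmbiguousPath(Exception):
    """Raised when the graph forks and the user has not picked a target."""

def linear_chain(start, children):
    """Return the revisions after `start`, in order, or raise on the first
    fork.  Nothing is committed here; this only inspects the graph."""
    chain = []
    rev = start
    while children.get(rev):
        kids = children[rev]
        if len(kids) > 1:
            raise AmbiguousPath(
                f"{rev} has {len(kids)} children; name the revision to push to"
            )
        rev = kids[0]
        chain.append(rev)
    return chain

def push_all(start, children, commit):
    """Validate the whole plan first, then commit each revision in order."""
    chain = linear_chain(start, children)  # may raise before any commit
    for rev in chain:
        commit(rev)
    return chain
```

The point is only the ordering: the fork check covers the whole chain
before the first commit, so an error leaves the CVS side untouched.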

> >It seems like some desirable properties are:
> >  1) the user doesn't have to do n pushes to send n revisions to the
> >     server.  (So you want to push whole chunks of the graph at once,
> >     at least sometimes.)
> >  2) you want to be able to specify which revision ends up as the CVS
> >     tip
> >  3) you want to be able to specify exactly which revisions are
> >     committed (i.e. both which revision ends up as the CVS tip, and
> >     which path is taken to get there)
> >
> >I think in practice (b) is best.  The only advantage of (c)/(d) over 
> >(b) is that they force you to specify the exact intermediate
> >revisions to commit, i.e. they prioritize (3) over (1). In most
> >cases, though, most people won't care exactly which revisions are
> >committed, so long as you end up with a branch tip that has all the
> >changes in it.  I.e., (1) is more important than (3).  So (b) is
> >better than (c)/(d).
> 
> (b) "Picking a random path from A to B" gives you less control and is
> more difficult to implement than (c). So I will start with (c).

Huh?

(b) doesn't actually give you any less control; it just lets you
choose how much control to exert.  If I tell it to go from A to B and
there are multiple paths, then presumably I don't care.  If I really
care that it takes the path through C, then I just run two commands,
the first pushing A -> C, the second pushing C -> B.  This is exactly
as much work for the user as (c).  (Note in particular, that (b) and
(c) _only_ differ when there are multiple branches that could be
taken; if the branch is linear, then all the methods produce the same
results.)
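
For what it's worth, (b) is only a few lines.  A Python sketch, with
hypothetical names and the same invented `children` mapping (revision id
to list of child ids); "random" here just means "whichever path the search
happens to find first".

```python
def any_path(start, goal, children):
    """Depth-first search for some path from `start` to `goal`.  Returns
    the revisions after `start` up to and including `goal`, or None if
    `goal` is not a descendant of `start`."""
    stack = [(start, [])]
    seen = {start}
    while stack:
        rev, path = stack.pop()
        if rev == goal:
            return path
        for kid in children.get(rev, ()):
            if kid not in seen:
                seen.add(kid)
                stack.append((kid, path + [kid]))
    return None
```

If the graph is A -> C -> B and A -> D -> B, then any_path("A", "B", ...)
returns one of the two routes, and a user who cares about going through C
asks for any_path("A", "C", ...) followed by any_path("C", "B", ...).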

Also, I don't see how (b) is any more work than (c) at all.  If
anything it's less, because there's less error checking to worry
about?

Hope this discussion is helping,
-- Nathaniel

-- 
"Of course, the entire effort is to put oneself
 Outside the ordinary range
 Of what are called statistics."
  -- Stephen Spender



