Re: [Monotone-devel] git fast-export


From: Felipe Contreras
Subject: Re: [Monotone-devel] git fast-export
Date: Mon, 5 Jan 2009 23:22:00 +0200

On Mon, Jan 5, 2009 at 8:09 AM, Derek Scherger <address@hidden> wrote:
> I've spent a bit of holiday hacking time working on a git_export command for
> monotone, more as an experiment than anything else. I've committed the
> result to net.venge.monotone.fast-export for people to have a look at.
> There's probably not much preventing this from landing on mainline, other
> than some documentation and possibly tests. Although I'm not really sure how
> we would want to go about testing it beyond what I've already done. The fun
> part about a command like this is that I expect most users of it would have
> some expectation of being their own testers in terms of verifying their
> conversions and such.

Great! I'm already trying it with Pidgin.

> This successfully (I think) converts the entire monotone database with 276
> branches (more or less what you get when you pull '*' from monotone.ca) to a
> git repository. Here are some details on the conversion:
>
> exported monotone database
> - 174MB in size
> - 276 branches
> - 127 tags (with one duplicate name, monotone-viz-1.0.1-1)
> - export time 83m42.134s (on a 2.0GHz pentium-m laptop)
> - export file size 2.9GB
> - 15245 revisions exported
>
> imported git repository
> - 719MB in size (before being repacked)
> - import time 23m15.463s
> - repack -adf time 3m14.385s
> - packed repository size 60MB
> - 277 branches (the extra one is "master")

Why an extra "master" branch? There's no need for that branch.

> - 126 tags (missing the duplicate above)
>
> Three exported branch names, "net.prjek:tester",
> "net.prjet:tester/drop-for-propagate" and "prjek.net:tester", were changed
> (with sed) during the import process because git does not allow colons (and
> various other characters) in branch/ref names. I simply changed ":" and "/"
> in these names to "."; although the "/" should have worked, it did cause an
> error of some sort.
>
> The conversion was verified by checking out each of the 276 branches and 126
> tags from both git and mtn and comparing the resulting workspaces. The
> script I used to do this verification was a bit dumb and failed to check out
> a few revisions so these weren't compared. Using only the branch name failed
> in some cases because there were multiple heads and using only a tag name
> failed in some cases because the tagged revisions had no branch certs. All
> of the branches and tags that did check out were identical according to diff
> -qr so I'm reasonably confident that the new exporter basically works.
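
For illustration, the workspace comparison described in the quoted text could be sketched roughly as below. This is not the script Derek used; it is a minimal Python stand-in for `diff -qr` over two already checked-out directories. The function name `tree_differences` and the choice to skip the `_MTN` and `.git` bookkeeping directories are assumptions made for this example.

```python
import filecmp
import os

# Assumption: each tool's bookkeeping directory is excluded from the comparison.
IGNORED = ["_MTN", ".git"]


def tree_differences(left, right):
    """Return sorted relative paths that differ between two directory trees."""
    diffs = []

    def walk(rel):
        l_dir = os.path.join(left, rel)
        r_dir = os.path.join(right, rel)
        cmp = filecmp.dircmp(l_dir, r_dir, ignore=IGNORED)
        # Entries present on one side only, or of mismatched type.
        for name in cmp.left_only + cmp.right_only + cmp.common_funny:
            diffs.append(os.path.join(rel, name))
        # Compare contents byte by byte instead of trusting stat signatures.
        _, mismatch, errors = filecmp.cmpfiles(
            l_dir, r_dir, cmp.common_files, shallow=False
        )
        for name in mismatch + errors:
            diffs.append(os.path.join(rel, name))
        for name in cmp.common_dirs:
            walk(os.path.join(rel, name))

    walk("")
    return sorted(diffs)
```

An empty result for a given branch or tag corresponds to `diff -qr` printing nothing. Checking out each branch or tag from mtn and git beforehand is left to the caller.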

I have a ruby script (mtn2git) that I'm pretty confident generates an
exact clone; the problem is that it's *very* slow.

I could probably compare the output of mtn2git with your tool but it
would probably take more than one entire day to generate the repo.

> I suspect that the various other git fast-import conversion scripts that
> exist for monotone are probably slower and less robust than this
> implementation (unless they work similarly from rosters) which uses the
> monotone internals to do the work. I spent a bit of time initially trying to
> export revisions using the revision data structures but this didn't work
> very well. Git only deals with files and trying to order a mix of renames of
> directories and files from monotone correctly from revisions was difficult.
> Ultimately I didn't use the revision data structures at all but built up a
> similar files-only based revision representation by comparing rosters. Much
> like what is done for make_cset, but ignoring directories and producing only
> file deletions, renames and additions. This works much better and correctly
> handles pivot_root and a few other odd things that proved difficult when
> working with revisions.
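
To make the roster-comparison idea quoted above concrete, here is a minimal Python sketch. It is not monotone's actual code: the real rosters are C++ structures with more detail, and the model here (a dict from a hypothetical node id to a `(path, content_id)` pair, files only) and the function name `files_only_changes` are assumptions for illustration. A changed file content is reported as an "add", on the assumption that the exporter would simply re-emit the file.

```python
def files_only_changes(old_roster, new_roster):
    """Diff two files-only rosters into deletions, renames and additions.

    Each roster maps a node id to (path, content_id). Directories are not
    represented, so a directory rename shows up as one rename per file.
    """
    deletes, renames, adds = [], [], []

    # Files whose node no longer exists were deleted.
    for node, (path, _content) in old_roster.items():
        if node not in new_roster:
            deletes.append(("delete", path))

    for node, (path, content) in new_roster.items():
        if node not in old_roster:
            adds.append(("add", path, content))
            continue
        old_path, old_content = old_roster[node]
        if old_path != path:
            renames.append(("rename", old_path, path))
        if old_content != content:
            # Changed content is re-emitted at the (possibly new) path.
            adds.append(("add", path, content))

    # Assumed ordering: deletions first, then renames, then additions.
    return sorted(deletes) + sorted(renames) + sorted(adds)


old = {1: ("src/a.c", "h1"), 2: ("src/b.c", "h2"), 3: ("README", "h3")}
new = {1: ("lib/a.c", "h1"), 2: ("lib/b.c", "h2x"), 4: ("NEWS", "h4")}
changes = files_only_changes(old, new)
```

In this example the rename of `src` to `lib` never appears as a directory operation; it falls out as two file renames, which is the property that makes the files-only view easier to map onto git. The sketch does not handle renames that swap two paths, which would need more careful ordering.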

Working with the roster is extremely slow. Right now your tool is
taking about 6 seconds per commit, which is too slow.

I agree that working with revisions is very error-prone, but it's the
only decent approach if you want something fast.

I think the best way to do this would be with revisions, and careful
comparisons with other more robust approaches, until all the issues
are tracked down.

Cheers.

-- 
Felipe Contreras