
[Gnu-arch-users] Why we might use subversion instead of arch.


From: Pierce T. Wetter III
Subject: [Gnu-arch-users] Why we might use subversion instead of arch.
Date: Fri, 20 Feb 2004 10:45:39 -0700


Note: The purpose of this email is not to rile you guys up. It's for me to document my findings about arch, so that either you can correct my errors, or you can improve arch, since I really liked the distributed nature of it.

 Background:

Like many people we use CVS for our version control, because as someone said once, "CVS sucks, but it sucks less than anything else". However, Subversion is reaching 1.0 status, so I decided it was worth checking out some alternatives. Pretty much, that came down to two choices, svn and tla.

 Our setup:

We have a lot of distributed employees, and also employees who telecommute. The worst case is me in Flagstaff, AZ on dialup talking to Raleigh, NC. Our current CVS repository is about 300MB with all the history.

 Our work process:

There are two main cases: I (Pierce) tend to make lots of small incremental changes, because I do the UI. Mike tends to make lots of large changes, since he works on the backend servers and needs to change both our object model, and the servers in one pass. I'll call myself "incremental_guy".

So for me, checkout, edit, update, checkin works great. So I actually am perfectly happy with CVS.

For Mike, he wants to branch, move everyone else's HEAD changes into his code, then check back in. What he does now is just have several checkouts running in parallel all the time, which is actually similar to arch. We'll call Mike "batch_guy".

 Why arch would be cool over subversion:

Since there's no concept of a "central" repository, at best a "blessed" repository, we could do stuff like the following:

Every time we code freeze for deployment, we copy "blessed" to "deployment".

When developers have changes, they merge them into the "deployment" repository if they're bug fixes for the deployment, along with "blessed" so that there is a local copy. This is really the same thing as a deployment branch, but conceptually it seems easier, and it would avoid problems we have where fixes don't quite make it into the deployment build.

If two engineers need to work together, like if "batch_guy" needs to work with "incremental_guy", no one else has to be involved, they can just merge their changes together.

Since we have a system of servers, clients, etc. most developers end up having several machines they have to keep in sync. With arch, your local test server could check out from your personal repository.

 How we would have to set up:

Well, first, every developer would end up needing to have a network accessible "master" archive. Since arch doesn't have any concept of a server process, that means setting up a WebDAV server with multiple subfolders:

   /archrepositories/incremental_guy
   /archrepositories/batch_guy
   /archrepositories/blessed
   /archrepositories/development

Most developers would use the /archrepositories/development repository as "truth". You'd only need your "personal" archive if you needed to work with someone else independently of that archive.

 Now for the bad stuff:

Ok, so I tried experimenting with arch. The first thing I did was check out something from a public arch repository. I got quite a shock. Evidently, every arch repository stores the "base code", then follows that with a series of forward patches. This is quite different from most other version control systems, which store the head version as "truth" and then keep reverse patches going backwards. The net effect of this is that checking out the latest version required downloading the base and every patch in between, not just the latest code.
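
A rough way to see why this matters on a slow link: the Python sketch below compares the bytes transferred for a fresh checkout of the newest revision under the two layouts just described. The function name, the sizes, and the assumption that the newest tree is about as large as the base are all made up for illustration. This is not a measurement of tla or of any real archive, and it ignores compression and any caching.

```python
def bytes_to_get_head(base_size, patch_sizes, store="forward"):
    """Estimate bytes transferred to obtain the newest revision.

    forward: the archive holds a base tree plus forward patches, so a
             fresh checkout needs the base and every patch after it.
    reverse: the archive holds the newest tree plus reverse patches, so
             a fresh checkout needs only the newest tree.
    """
    if store == "forward":
        return base_size + sum(patch_sizes)
    if store == "reverse":
        # Simplifying assumption: the newest tree is about base-sized.
        return base_size
    raise ValueError(f"unknown store: {store}")


# Hypothetical numbers: a 5 MB base tree and 200 patches of 20 KB each.
base = 5_000_000
patches = [20_000] * 200

forward_cost = bytes_to_get_head(base, patches, "forward")
reverse_cost = bytes_to_get_head(base, patches, "reverse")
print(forward_cost, reverse_cost)
```

Under these invented numbers the forward layout costs nearly twice as much for a first checkout, and the gap grows with every commit made after the base.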

That was quite a shock. For projects with lots of small changes, it probably is inconsequential, but for me, on a dialup, it would really suck. Now I read some stuff on the wiki about how you can make all that faster by making a new archive (which moves the base), but I shouldn't have to change my work process to make the version control system efficient.

The next thing I noticed was that while CVS and Subversion let you structure your projects and sub projects via the filesystem, arch really tries to grab the whole filesystem as one unit. You can override this a bit, but it involves setting up some config files, and those config files are kind of poorly documented (based on the fact that I couldn't make heads or tails of the explanation). This makes a lot of sense for open source projects focused on a single executable, but makes much less sense for us. I suspect most people deal with this by just having lots of arch repositories:

   /archrepositories/blessed/tool
   /archrepositories/blessed/library
   /archrepositories/blessed/application

 But that would be a nightmare for us.

The next thing I found was that it was SLOW. tla is kind of brute force, and all that diff-ing, tar-ing, and compressing can take quite a while.

So at this point, while the distributed repository stuff was cool, I had to conclude that arch works best for working on open-source development where you don't submit code so much as you submit patch files, and you need to merge patches from multiple places. From that point of view, arch is great. From ours, ugh.

 How I would improve arch:

Fundamentally, I think that arch should store HEAD, with reverse patches, rather than START with forward patches.

The rsync protocol would make more sense than WebDAV or ftp.

Improve the documentation. Especially needed is a section explaining the basic arch concepts, so that you don't have to pick up everything by osmosis.

While tla is ok as a low-level tool, I've observed that everyone keeps trying to replace it with a driving script. That's a good instinct. For one thing, I think that:

  user--archive--task

  is harder to read than:

 tla make-archive --id address@hidden  --name archive

 tla archive-setup --project hello-world --branch mainline --version 0.1

It would be a trivial change to tla to support passing archive names as individual parameters, but I think it would flatten the learning curve of arch. Especially since I think that if you break up the names, you can realize that it would be pretty easy to come up with standard defaults for most of these, such that you only have to type:

 tla archive-setup --project hello-world

 because branch defaults to "mainline", and version defaults to 1.0.

Or perhaps the project name could even be taken from the current working directory, so all you would need is:

 tla archive-setup

 Similarly:

  tla get --project hello-world  hello-world-Alice

Would try to get hello-world--mainline--HEAD, where HEAD is calculated such that 1.50 is known to be later than 1.49.
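
The defaulting and the HEAD calculation could look something like the following Python sketch. It is only an illustration of the proposal, not anything tla does today: `resolve`, `version_key`, and the `known_versions` list are hypothetical names, and the defaults ("mainline", "1.0") are the ones suggested above. The one real subtlety is that versions have to be compared as numbers, since a plain string comparison would rank 1.9 above 1.50.

```python
def version_key(version):
    """Turn "1.50" into (1, 50) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve(project, branch="mainline", version=None, known_versions=()):
    """Build a project--branch--version name, filling in defaults.

    With no explicit version, pick the highest known one (HEAD);
    if no versions are known yet, fall back to "1.0".
    """
    if version is None:
        version = max(known_versions, key=version_key, default="1.0")
    return f"{project}--{branch}--{version}"


print(resolve("hello-world"))
print(resolve("hello-world", known_versions=["1.9", "1.49", "1.50"]))
```

The first call prints hello-world--mainline--1.0 and the second prints hello-world--mainline--1.50.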

  Anyways, basically, I'm trying to make the following two points:

blah--blah--blah may be convenient to type, but it's hard to understand, especially because depending on the context, sometimes the first position is the user id, sometimes it's the project, etc. It would make a lot of sense to make the components explicit (and update the tutorial), because it would flatten the learning curve. Tla could still accept the blah--blah--blah format as a shortcut.
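
As a sketch of what "explicit components, with the dashed form as a shortcut" might mean, the Python below round-trips between the two. The field names (category, branch, version) are my assumption about how one three-part name is read, and `VersionName`, `parse_shorthand`, and `to_shorthand` are hypothetical helpers, not part of tla.

```python
from typing import NamedTuple


class VersionName(NamedTuple):
    """One reading of a three-part dashed name (field names assumed)."""
    category: str
    branch: str
    version: str


def parse_shorthand(name):
    """Split "category--branch--version" into labeled fields."""
    parts = name.split("--")
    if len(parts) != 3:
        raise ValueError(f"expected three '--'-separated parts, got {name!r}")
    return VersionName(*parts)


def to_shorthand(name):
    """Join labeled fields back into the dashed shortcut form."""
    return "--".join(name)


name = parse_shorthand("hello-world--mainline--0.1")
print(name.category, name.branch, name.version)
print(to_shorthand(name))
```

Once the fields are labeled, the position of a value stops carrying meaning, which is the part that makes the dashed form hard to learn.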

tla has some naming conventions in practice, but none of them are defaults in the code. By installing those naming conventions as defaults, you can also flatten the learning curve. You can also support additional features for those defaults. For instance, one of the annoying things for me about learning tla was that it's made of lots of low-level operations, so I have to translate my high-level "what I'm doing" into a whole set of tla commands. Something like:

 tla branchstart --task "fix_for_bug" --master master_repository
      --- starts a branch off of a remote repository, with branch name fix_for_bug, version 1.0
 tla branchupdate
      --- grabs HEAD changes from master
 tla commit --local
      --- commits changes to branch locally
 tla commit
      --- uploads changes to remote master
 tla branchdone
      --- merges changes back to mainline in the remote master

Would be much easier to understand. In fact, in general, I'd like to see all the low-level commands in tla supplanted by high-level commands based on the use cases.

 Something I'd also like to see that I implied above:

   --local commits to the local repository.

   --remote commits to both the local and remote repository.

While tla doesn't currently have any concept of a "master" repository, I think it makes sense that the high-level commands would support this concept that you have local archives you can commit to all the time, with a remote archive you commit to less often.
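
A toy Python model of that behavior, just to pin down what I mean. Everything here is hypothetical (the `Workspace` class, its lists standing in for archives); it assumes a single workspace whose remote history is always a prefix of its local history, and it says nothing about how tla would actually move patches around.

```python
class Workspace:
    """Toy model: a local archive plus a remote "master" archive."""

    def __init__(self):
        self.local = []   # revisions committed to the local archive
        self.remote = []  # revisions published to the remote archive

    def commit(self, message, remote=False):
        """Always commit locally; with remote=True also publish."""
        self.local.append(message)
        if remote:
            # Publish, in order, every local revision the remote lacks.
            self.remote.extend(self.local[len(self.remote):])


ws = Workspace()
ws.commit("start fix")             # like --local
ws.commit("more work")             # like --local
ws.commit("done", remote=True)     # like --remote
print(ws.local)
print(ws.remote)
```

After the last call both lists hold all three revisions, so frequent local commits cost nothing on the network until you choose to publish.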


Comments appreciated. I'm getting this list in digest mode so if your comment is "urgent" email me directly.

pierce




