
[Gnu-arch-users] arch lkml


From: Andrea Arcangeli
Subject: [Gnu-arch-users] arch lkml
Date: Thu, 25 Sep 2003 21:21:12 +0200
User-agent: Mutt/1.4.1i

Hello everyone,

[ changing mailing list to arch-users, while I don't think this is
  totally offtopic for l-k since kernel developers are affected somehow
  by these matters, I feel we now have a better home for these
  discussions ]

On Thu, Sep 25, 2003 at 11:15:33AM -0600, Eric W. Biederman wrote:
> poorly and is not distributed.  SVN is not distributed.  ARCH is
> barely distributed and architecturally it makes distributed merging
> hard.  [..]

I learned to use arch yesterday. I didn't try the
webdav/ssl/http/ftp/mirroring options, but I covered all the basics, and
I fail to see how another product could provide more powerful
distributed _merging_ functionality (NOTE: I'm only talking about the
distributed merging here, not performance or anything else unrelated to
the merging).

Actually it seems I can use it for my tree too, once I modify it
to be able to tag into a plain source tree forked with hardlinks
(something I certainly couldn't do with b*tkeeper), add a way to change
the patchsets internally, and add a way to extract all the patches,
ordered and with meaningful names, for Marcelo and other trees not based
on arch. Once that works, people can hook into my tree as well and add
more patchsets that I can merge into mine (they will also need to have
the same original hooking/tag tree of course; it isn't cached in any
repository but has to be on the local machine, which saves an incredible
amount of network, disk space and disk bandwidth).

The tag (I would call it a hook rather than a tag, but anyway) is a very
powerful concept.

My proposal to change arch so it can hook into an arbitrary tree in
the filesystem (instead of a base-0 tarball) could radically change the
way we do distributed development in terms of resource savings (I bet
it would be an order of magnitude better than bk too; I mean, I have
lots of tarballs open here anyway, since various trees start on top of
official tar.gz packages, not bkcvs, so I could save gigabytes of
space and gain dcache efficiency with that feature, even across the
non-arch usages that are the most common for me at the moment). The
hardlinks will make a huge difference, and the way arch is designed it
will naturally take the best advantage of them as soon as we can hook
into an unpacked tar.gz.

Of course everybody is then required to keep a local copy of the
"hooked" kernel somewhere, but that's fine: people will download the
tar.gz once and unpack it once, with no duplication of the unchanged
code. Then they can start to checkout on top of it and merge with each
other, etc.
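
The hardlink trick above is easy to sketch with plain GNU coreutils (the paths are made up for illustration; this is not an arch command, just the mechanism arch could exploit):

```shell
# Fork a pristine unpacked tarball with hardlinks: the copy is nearly
# free until a file is actually modified. Paths are hypothetical.
set -e
work=$(mktemp -d)
mkdir -p "$work/linux-2.6.0/kernel"
echo 'int main(void){return 0;}' > "$work/linux-2.6.0/kernel/init.c"

# "Hook" a working tree onto the pristine one; cp -l makes hardlinks,
# so unchanged files share disk blocks and dcache entries.
cp -al "$work/linux-2.6.0" "$work/my-tree"

stat -c %h "$work/my-tree/kernel/init.c"   # link count is 2: one inode, two names
```

All a tool needs on top of this is to break the link on write (file-level copy-on-write) before modifying a file.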

arch in fact looks more like an 'archive for patches' than anything
else, and it has nearly nothing to do with the cvs concept of revision
control. It definitely lives up to the promise on its homepage. So the
concept is very attractive, and it does exactly what I was suggesting
the first time I heard of b*tkeeper: that we needed a distributed patch
management system more than a revision control system. What we care
about is that the changesets are readable, extractable patches; we don't
only care about merging blindly with each other without rejects. So I
believe we should add this concept to arch (i.e. the ability to modify a
changeset so that, if extracted, it is an orthogonal patch; reordering
between different changesets would be nice too, and both operations can
reject after reordering).
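
A minimal sketch of that idea, keeping each changeset as an orthogonal, meaningfully named patch (the layout and file names are invented for illustration, not arch's actual on-disk format):

```shell
set -e
d=$(mktemp -d); cd "$d"
mkdir base
seq -f 'line %g' 5 > base/file

# Two stacked changesets, each kept as a snapshot...
cp -r base s1; sed -i '2s/.*/line 2 fix-oops/'    s1/file
cp -r s1   s2; sed -i '4s/.*/line 4 add-feature/' s2/file

# ...and extracted as ordered, readable patches for non-arch trees.
diff -u base/file s1/file > 01-fix-oops.patch    || true  # diff exits 1 on differences
diff -u s1/file   s2/file > 02-add-feature.patch || true
ls *.patch
```

Replaying the numbered patches in order onto the base tree reproduces the final tree, and each patch stays readable on its own.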

Revision control is then reduced to keeping a log of the history of
which patches have been merged, when, and why.

But in terms of automated "merging" design between two separate
distributed trees a la b*tkeeper, once you run into rejects the problem
just cannot be solved better than 'arch' already solves it, as far as I
can tell. I don't see how b*tkeeper can do better, simply because I
can't imagine anything better being possible during merging (I can't use
bk myself, so I can't know if it does things differently). And not even
b*tkeeper can know that the merging went right the way a human can;
there's no way it can understand the semantics of the code during
merging (formally defined as star-merging in the arch specifications).
If one product does better than the other in the same development
simulation, it could simply be that its heuristics are less strict (i.e.
similar to diff -u0 vs diff -u vs diff -u10)
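
The context width really is the knob: a zero-context patch applies happily where a wide-context one collides with unrelated drift nearby. A small demonstration with plain diff/patch (file names invented):

```shell
set -e
d=$(mktemp -d); cd "$d"
seq -f 'line %g' 20 > a
sed '10s/.*/line 10 CHANGED/' a > b
diff -U0  a b > p0  || true   # diff exits 1 when files differ
diff -U10 a b > p10 || true

# A tree that drifted near (but not at) the changed line:
sed '18s/.*/line 18 DRIFT/' a > c0
cp c0 c10

patch c0  < p0  && echo "u0: applied"     # no context: merges
patch c10 < p10 || echo "u10: rejected"   # wide context: collides with the drift
```

Neither outcome is more "correct" than the other; the loose heuristic merely says "yes" in more cases, with no idea whether the result is semantically right.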

At a certain point even b*tkeeper has to use a purely statistical
approach to saying "yes, this merging can be done and there are no
collisions". Saying "no" is easy; saying "yes" is impossible for
software that doesn't understand the kernel code the way a human can
today. In the arch case the heuristic used to say "yes" is the one in
'patch'; for b*tkeeper I can't know.

I may very well be missing something, so any hint from anybody who knows
how bk is better than arch at distributed merging is welcome, so we can
add it to arch too and close the gap once and for all.

I guess I will defer fixing cvs to add the domain name to the
logs; at the moment I see it as low priority compared to exporting l-k
with arch through bkcvs and extending arch with the features outlined
above, which would be dramatically useful in kernel development, where
patches can stay floating and maintained for years until they get merged
into mainline (just see some of the patches in 2.4.23pre4 for example,
or some in the -mm tree). And btw, I doubt b*tkeeper has those features;
if it does, they were never advertised in any email I read.

Then there are quite a few issues with the on-disk format. I feel the
gzip slows things down; I didn't try removing it, but that's my
feeling. These are small patches and small files, so a tar is surely a
good idea, but I think gzip should be optional. A secondary backend
for storage of the database could probably improve arch dramatically.
And the overkill log file names in particular force the kernel to
kmalloc the dcache names, which slows things down a bit too (at least
during the first checkout, though I feel it's very minor overhead
compared to gzip).
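
The gzip overhead is easy to see on files this small: the gzip container alone outweighs any saving (the one-line log entry below is hypothetical):

```shell
set -e
d=$(mktemp -d); cd "$d"
printf 'Summary: fix typo\n' > log   # a typical tiny per-changeset file
gzip -c log > log.gz                 # gzip header + trailer cost ~18 bytes by themselves
stat -c %s log log.gz                # the .gz ends up bigger than the original
```

So for tiny metadata files compression buys nothing and costs a decompress on every access; it only pays off for the larger changeset bodies.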

Another way would be to have kernel support, like an arch filesystem in
which to store the data; it could work on the loop device and would
solve the inventory troubles too.

About the inventories, I would simply force the extended mode, which
requires the equivalent of a cvs add on every archived file. This is
much preferable IMHO because in a big thing like the kernel there's lots
of garbage with all sorts of names, so the commit operation could never
get right which files to check in and which not. It's much easier to
consider only the files explicitly added to the repository with an
'arch' command during commit/update/replay, etc.
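
The principle is easy to sketch with a plain manifest (the file names and the MANIFEST convention are invented; arch's explicit inventory works differently on disk):

```shell
set -e
d=$(mktemp -d); cd "$d"
printf 'real code\n' > sched.c
printf 'junk\n'      > sched.c.orig    # leftover from a patch run
printf 'junk\n'      > .sched.c.swp    # editor droppings

echo sched.c > MANIFEST                # the "cvs add" equivalent

# Commit = archive exactly what was added, never the garbage.
tar cf commit.tar -T MANIFEST
tar tf commit.tar
```

With directory scanning, some heuristic would have to guess that `sched.c.orig` is garbage; with an explicit manifest there is nothing to guess.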

Some other misc things I don't see:

1) the difference between update/replay/star-merge
2) why there isn't an automated make-log + vi + commit command
3) why the +/= prefixes in front of the names

About the {arch} directory: using the { is a great idea to dramatically
reduce the namespace pollution. I've never seen a { in a directory name
before ;)

I also had a problem with cvs2arch: it worked fine for one repository
(not the kernel, a small one) but not for another. It happened with a
directory that didn't exist in the original cvs import but was created
over time. At the point where the directory is created in cvs, cvs2arch
fails and says the directory doesn't exist in the arch data directory.

thanks,

Andrea - If you prefer relying on open source software, check these links:
            rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.[45]/
            http://www.cobite.com/cvsps/
            svn://svn.kernel.org/linux-2.[46]/trunk
