Re: [Gnu-arch-users] arch lkml


From: Andrea Arcangeli
Subject: Re: [Gnu-arch-users] arch lkml
Date: Fri, 26 Sep 2003 01:16:01 +0200
User-agent: Mutt/1.4.1i

On Thu, Sep 25, 2003 at 01:04:19PM -0700, Tom Lord wrote:
> 
>     > From: Andrea Arcangeli <address@hidden>
>     > actually it seems I can use it for my tree too, after I modify it
>     > to be able to tag into a plain source tree forked with hardlinks
>     > (something I certainly couldn't do with b*tkeeper), and a way to change
>     > the patchsets internally (and a way to extract all the patches ordered
>     > with meaningful names for marcelo and other trees not based on
>     > arch).
> 
> I don't quite follow you (Andrea) there.   Perhaps you could explain further.

ok, have a look here:

        http://www.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.22aa1/

when a new tree is released, it's as if I changed the tag and tried to
apply all the patchsets to a new codebase. Of course they will reject.

And the final tree, after the change of tag, should have the same
changesets.

Now I'm not sure if it makes sense for me at all to use arch to export
my tree instead of the current method (it might, note that other kernel
people maintain their own trees on top of mine, so their "tag" would be
at the end of my patchsets, while I hook on top of Marcelo's tar.gz
releases).

But certainly I can't do that without those features.

I mean, we have patches like the o1 scheduler that will never get merged
into 2.4 mainline and that I maintain in my tree over time. The critical
value in having everything maintained as patchsets is that I can read
the code and understand it, and it's much easier to port to a new tree
than a single monolithic patch, or than more or less random checkins.

The main problem I have with all revision control systems out there is
that they prevent me from picking up a patch in the middle of the series
and fixing it up.

What I have to do is to change the base-0 to point to another more
recent kernel tarball, and then fixup all the patchsets, leaving them in
the same order, and without altering their integrity.

More importantly, assume I find a bug in the o1 scheduler: I refuse to
add a new patch-303 at the very end of the tree to fix this bug (yeah,
with the proper comment that this is fixing a bug in patch-20, the
current big scheduler patch where the bug was introduced).

I absolutely need to replace patch-20 with a new version that will
include the fix. Of course this means that patch-21 could then reject,
and in turn I have to fix it up as well, and replace patch-21 with a new
version that doesn't generate the reject.
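
A toy sketch in Python of the replace-in-the-middle workflow described
above. It is an illustration under simplifying assumptions, not arch
code: a tree is modelled as a dict of file name to lines, a patch as a
list of hypothetical (path, old_line, new_line) substitutions, and a
"reject" is just an old line that can no longer be found. The names
apply_patch, replay_series and replace_patch, and the file contents, are
made up for the example.

```python
class Reject(Exception):
    """Raised when a patch hunk no longer matches the tree."""


def apply_patch(tree, patch):
    """Apply (path, old_line, new_line) hunks to a copy of tree."""
    result = {path: list(lines) for path, lines in tree.items()}
    for path, old_line, new_line in patch:
        lines = result.get(path, [])
        if old_line not in lines:
            raise Reject(f"{path}: context {old_line!r} not found")
        lines[lines.index(old_line)] = new_line
    return result


def replay_series(base, series):
    """Apply every (name, patch) in order; collect names that reject."""
    tree, rejected = base, []
    for name, patch in series:
        try:
            tree = apply_patch(tree, patch)
        except Reject:
            rejected.append(name)  # skipped; must be fixed up by hand
    return tree, rejected


def replace_patch(series, name, new_patch):
    """Return a new series with one patch swapped, order unchanged."""
    return [(n, new_patch if n == name else p) for n, p in series]


base = {"kernel/sched.c": ["pick_next()", "schedule()"]}
series = [
    ("patch-20", [("kernel/sched.c", "pick_next()", "pick_next_o1()")]),
    ("patch-21", [("kernel/sched.c", "pick_next_o1()", "pick_next_o1(rq)")]),
]
# Fold the bugfix into patch-20 itself instead of appending a new patch.
fixed = replace_patch(
    series, "patch-20",
    [("kernel/sched.c", "pick_next()", "pick_next_o1_fixed()")],
)
tree, rejected = replay_series(base, fixed)
print(rejected)  # patch-21 now rejects and needs its own fixup
```

Changing base-0 to a newer tarball is the same replay with a different
`base`: the series keeps its order and only the patches listed in
`rejected` need attention.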

then finally I need a way to say:

        extract all patches into this directory and give me this file
        that describes the ordering

basically generating my 2.4.22aa1/ directory starting from the patch
archive.
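
A minimal Python sketch of such an export step, assuming the patches are
already available in order as (name, diff_text) pairs. The two-digit
prefix and the ordering file called "series" are assumptions made for
the example, not something arch or the 2.4.22aa1/ directory defines.

```python
import os
import tempfile


def export_series(patches, out_dir):
    """Write ordered (name, diff_text) pairs to out_dir.

    Each patch gets a numeric prefix so a plain directory listing keeps
    the apply order, and a 'series' file records that order explicitly.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for index, (name, diff_text) in enumerate(patches):
        filename = f"{index:02d}_{name}"
        with open(os.path.join(out_dir, filename), "w") as handle:
            handle.write(diff_text)
        written.append(filename)
    with open(os.path.join(out_dir, "series"), "w") as handle:
        handle.write("\n".join(written) + "\n")
    return written


out_dir = tempfile.mkdtemp()
names = export_series(
    [("o1-scheduler", "--- a\n+++ b\n"), ("buffer-fix", "--- c\n+++ d\n")],
    out_dir,
)
print(names)
```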

now this last bit is the last concern of course, with some scripting
that can be most certainly done outside arch too, but the rest not, I
mean arch doesn't contemplate the idea of replacing a certain patch.

Another feature that I find critical is what cvsps provides to CVS. I
need to ask arch "show me all the patchsets (summary _and_ unified diff
with -p) related to this certain file fs/buffer.c, starting from 2.4.22
up to the head". The -p is very important if you get addicted to it like
me, and this applies to what-changed too.

With cvsps this works *great*, example:

        cvsps -f fs/buffer.c -r v2_4_22 -g

This is the most important value I can find, in having the bkcvs data
available to everyone.

>     > This my proposal to change arch to be able to hook into a random tree in
>     > the filesystem (instead of a base-0 tarball), could radically change the
>     > way of doing distributed development, in term of resource savings (I bet
>     > that would be an order of magnitude better than bk too, I mean, I've
>     > lots of tarballs open anyways here, since various trees starts on top of
>     > official tar.gz packages, not bkcvs, so I could save gigabytes of
>     > space and dcache efficiency with that feature, even across the non arch
>     > usages that are the most common to me at the moment). The hardlinks will
>     > make a huge difference and it'll be natural the way arch is designed to
>     > take the best advantage of them as soon as we can hook into an unpacked
>     > tar.gz.
> 
> Again, I don't _quite_ follow you; however, perhaps this is relevant:
> 
> You can use `mkpatch/dopatch' (the arch changeset tools) more or less
> independently of using arch.    `dopatch', in particular, "does the
> right thing" when patching a tree that is the clone of some other tree
> created using hardlinks.

well my idea here is slightly different.

The idea is to have the tag command take not an arch project/version
name, but a kind of "virtual" project that is really just a directory in
the filesystem.

Then you can create this new project tagged (i.e. hooked) on top of the
directory in the filesystem.

So a checkout won't run tar xzf linux-2.4.22.tar.gz but it will be smart
enough to do "cp -al 2.4.22 project-directory" (where project directory
is the parameter to the get command)

then I can commit into this project hooked into this "virtual" tree, and
at the next "get" command arch will do cp -al again and then start
applying the patchsets on top of it (normally, like in every other
checkout, but just without a horribly slow huge tar xzf first).
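
For illustration, the "cp -al" step can be approximated in Python with
the standard library alone. This is only a sketch of the hardlink clone,
not of any arch command; the function name hardlink_clone and the
throwaway directory layout are assumptions for the example, and it needs
a filesystem that supports hard links.

```python
import os
import shutil
import tempfile


def hardlink_clone(src, dst):
    """Clone src into dst, hard-linking files instead of copying data.

    dst must not exist yet, as with plain shutil.copytree.
    """
    return shutil.copytree(src, dst, copy_function=os.link)


# Tiny demonstration on a throwaway directory.
workdir = tempfile.mkdtemp()
pristine = os.path.join(workdir, "2.4.22")
os.makedirs(os.path.join(pristine, "fs"))
with open(os.path.join(pristine, "fs", "buffer.c"), "w") as handle:
    handle.write("/* buffer cache */\n")

checkout = hardlink_clone(pristine, os.path.join(workdir, "project-directory"))
same = os.path.samefile(
    os.path.join(pristine, "fs", "buffer.c"),
    os.path.join(checkout, "fs", "buffer.c"),
)
print(same)  # True when both names point at the same inode
```

Whatever applies the patchsets afterwards has to write a new file and
rename it over the old name instead of modifying it in place, otherwise
the pristine tree changes too. That is the "right thing" that dopatch is
said to do above.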

As said above the next thing I need is to change the "tag" under all the
patchsets that I created, and to be able to fixup the rejecting patchsets.

Yet another feature is that I want to be able to put a checkin not at
the end but somewhere earlier in the series. I think this is doable by
checking out, for example, --patch-10, doing a commit, and then merging
this new tree into the original one (the new patch-11 will have to go
between patch-10 and patch-11 of the original tree).
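
As a sketch of the bookkeeping only (the later patches may of course
still reject against the new one), inserting into the middle of a series
amounts to a list insert plus a renumbering. The helper insert_after and
the assumption that patches are simply named patch-1, patch-2, ... are
made up for this Python example.

```python
def insert_after(series, anchor, new_patch):
    """Return a new series with new_patch placed right after anchor.

    series is a list of (name, body) pairs named patch-1..patch-N;
    everything after the insertion point is renumbered.
    """
    names = [name for name, _ in series]
    position = names.index(anchor) + 1
    bodies = [body for _, body in series]
    bodies.insert(position, new_patch)
    return [(f"patch-{i}", body) for i, body in enumerate(bodies, start=1)]


series = [("patch-1", "a"), ("patch-2", "b"), ("patch-3", "c")]
print(insert_after(series, "patch-2", "new"))
```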

Again this is all stuff I do by hand all the time with some amount of
complex python scripting but I've no log, so if something goes wrong I
can't look back. So having more automation and log at the same time
would be a bonus.

now it's possible I can kind of emulate this by hand, but some sort of
automation would be needed to make this feasible. In fact I wouldn't
trade my current intelligent scripts for anything slower to use: I'm
now down to an average of about a dozen seconds to fix up a single
reject. I get dozens of rejects and additional dozens of fixes (that I
have to merge into the original patchset, not as an additional patchset)
at every prerelease. And I can guarantee that this is the very best way
to manage a huge amount of technology, all orthogonal, in the long run
on top of a constantly moving target that very very often generates
rejects.

Since these patches are so huge, and the rejects are in turn so huge, I
normally have to back out the tiny patch that went into mainline first
(this is why cvsps is absolutely crucial to me), then I have to apply
the previous patch in my tree, and finally I apply the mainline patch
by hand. After that I have to rediff against the mainline version so
that only my changes get into the new patch in my tree. This was a total
pain to do before we had bkcvs and cvsps. I was going to write it myself
after my plea for the data to be open was successful; it was a great
surprise and pleasure to find that it already existed ;).

I'm not really sure if it will ever make sense for me to use a real
revision control system, but before seeing arch I excluded it without
doubt; b*tkeeper is absolutely unusable for this kind of maintenance
and merging work.

You have to understand that, the way the kernel works, a patch can take
ages to be merged. See Rusty's efforts as well, they manage patches for
the very same reason. The patch is what people have to read, so if we
put it into a regular revision control system it's not possible to fix
it up anymore, and it gets mixed up and lost in the noise.

the patch-1 for me isn't just a normal checkin that a developer can do
once per day, patch-1 for me can be a one liner bugfix, or a 10000 line
feature addition. The single patch is the _only_ thing of value for me,
not the tree as a whole. My objective is to get things merged, and to
keep each patchset as orthogonal as possible to everything else to make
it easy to get it out and pick it.

Again this is something that no revision control system I know of can
do, not even nearly; most of them don't even contemplate a true undo.

But after seeing the design behind arch, I thought it would be for the
first time ever reasonable for me to possibly have a revision control
system helping me with this.

> out to be a subset of what you can do with arch.   Andrea is right
> with his suggestion (my paraphrase) that we've nailed that space.

that was my understanding after learning how it works. And that's also
why I find Larry's comment amusing, that this is a terribly hard problem
to solve and that VM is child's play compared to this. I have never seen
and I still see nothing hard here in terms of merging algorithms; in
fact I don't see how it could work better than what arch does. There's
a limit to the merging, the heuristic that says "yes", and arch relies
on 'patch' for it. And certainly Larry's heuristic can't be that much
better unless he also invented an artificial intelligence, which is not
the case since it runs on a workstation and you need some trillion
interconnects (also in software, but they take memory and cpu) to build
any sort of intelligence, nothing we are able to build today in
hardware, not on a workstation at least. And personally I prefer to rely
on the 'patch' heuristic that is open.

> (The syntactic triviality: conflict mark-ups that look like unified
> diffs, is in the patch-queue.   GUIs: I think we have all the
> arch-side bits lined up and now its mostly a matter of someone making
> the interface to any of the existing graphical merge tools fit
> smoothly into use.)

Yeah, I also thought the same yesterday, it lacks all sorts of
user-friendly stuff, but let's forget it for now, that's low prio.

More important would be to have the >>> <<< markers of CVS, they're
handy. address@hidden AFAIK improved 'patch' to automatically generate
this sort of stuff, instead of the unreadable -c diffs. So it should
probably be integrated. That's better than .rej files in diff -c format.

>     > Then there are quite tons of issues with the on disk format, I feel the
>     > gzip slows down things, I didn't try to remove it, but that's my
>     > feeling. Those are small patches, and small files, a tar is sure a good
>     > idea, but gzip I think should be optional. 
> 
> They reduce network traffic in a natural way.   In purely local
> set-ups, we have not seen any evidence that they impact performance in
> any way worth worrying about.

Did you benchmark this? My guess is that it slows things down a lot. A
simple `time` around the gzip invocation should clear that up. But I
can't imagine anything else taking up that much cpu. I believe they
should be compressed only if they're larger than say 50k (that's a wild
guess, but that's normally what I do with my tree, see the link above).
With very small files the output can even grow in size, because gzip
doesn't find enough repeated patterns.
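
A small Python sketch of the "compress only above a threshold" idea,
using the standard gzip module. The 50k value is the wild guess from the
text, and the function name maybe_compress and the ("plain" / "gzip")
tagging are assumptions for the example, not an arch on-disk format.

```python
import gzip

THRESHOLD = 50 * 1024  # the "say 50k" guess from the text, in bytes


def maybe_compress(data):
    """Return (encoding, payload); only gzip payloads above THRESHOLD."""
    if len(data) <= THRESHOLD:
        return "plain", data
    return "gzip", gzip.compress(data)


tiny = b"fix\n"
# gzip adds a fixed header and trailer, so a tiny input gets bigger.
print(len(tiny), len(gzip.compress(tiny)))

big = b"static int foo(void);\n" * 4096  # repetitive and above 50k
encoding, payload = maybe_compress(big)
print(encoding, len(big), len(payload))
```

This only shows the size side of the argument; the cpu cost would still
need the `time` measurement mentioned above.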

>     > About the inventories I would simply go forcing the extended mode, that
>     > forces the equivalent of cvs add on every file archived, this is much
>     > preferable IMHO because in a big thing like the kernel there's lots of
>     > garbage with all sort of names, and so the commit operation could never
>     > get right which files to checkin and which not. It's much easier to
>     > consider only the files explicitly added to the repository with an
>     > 'arch' command during the commit/update/reply etc...
> 
> Do you mean `explicit' mode?   I think it can be configured to do

oh yes.

> pretty much what you want.

How? The way I understood it is that cvs commit was picking up
everything.

>     > About other misc comments I don't see:
> 
>     > 1) the difference between update/reply/merge-star
> 
> "star-merge".   

Could you elaborate on the difference?

In the tutorial there's the example of update and the example of replay.
But it doesn't address the difference. In fact I feel almost like they're
aliases for the same internal functionality.

Also why can't we use star-merge all the time?


BTW, another minor detail is that I find the aliases for the commands
confusing. I prefer even a bad name but only one, that's easier to learn.

yes, I don't like redundancy, I find it wasteful and confusing.

>     > I also had a problem with cvs2arch, it worked fine for a repository (not
>     > the kernel, a small one) but not for another. It happened on a directory
>     > that didn't exist in the original cvs import, then it was created over
>     > time. At the time the directory is created in cvs, cvs2arch fails and
>     > says the directory doesn't exist in the arch data directory.
> 
> I wonder if the new gateway mechanisms BM is working on provide an
> alternative worth considering.

This is the first time I've heard of it ;), where can I find this?

Also I'm unsure how much time I can dedicate to play with arch, so don't
worry if you don't see a timely reply ;)

Andrea - If you prefer relying on open source software, check these links:
            rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.[45]/
            http://www.cobite.com/cvsps/
            svn://svn.kernel.org/linux-2.[46]/trunk



