Re: [Gnu-arch-users] [PATCH] arch speedups on big trees
From: Tom Lord
Subject: Re: [Gnu-arch-users] [PATCH] arch speedups on big trees
Date: Fri, 19 Dec 2003 11:08:45 -0800 (PST)
> From: Chris Mason <address@hidden>
> I've been playing around with a few ideas to improve arch performance on
> large source trees, mostly in the area of applying changesets, and
> creating changesets. I've got a sample archive here with 100 changesets
> on top of the linux 2.6 kernel, and vanilla arch takes a number of
> minutes to apply them all (15-30 seconds per changeset via tla replay).
> This is primarily because arch is doing an inventory of the source tree
> before each changeset; my patch changes things to inventory only the
> files touched by the changeset instead. It sends a table of the
> candidate files to the inventory funcs, and this brings the time to
> replay my 100 changesets to ~4 seconds.
Holy crap! Really?
> This is lightly tested (make test and a few others),
This is the kind of change that needs to be made very carefully.
A quick scan of the technique you used suggests to me that it is
certainly not correct. In particular, it will not work properly for
_merges_ even though it is either right or close-to-right for _exact_
patching (such as when building a revision). Note that `replay'
(normally) counts as a merge command -- only when it is invoked from
`update' would we believe a priori that it is doing exact patching.
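To make the concern concrete, here is a toy sketch in Python (not arch code). It assumes, purely for illustration, that a tree inventory maps file ids to paths and that a changeset names each file by id and by the path it had in the original tree. The names `full_inventory`, `restricted_inventory`, `locate`, and the sample ids are all hypothetical. Under that assumption, looking only at the changeset's paths finds every file when the tree matches exactly, but misses a file that the tree being merged into has renamed.

```python
# Toy model, not arch code: a tree is a dict of path -> file id.

def full_inventory(tree):
    """Map every file id in the tree to its current path."""
    return {file_id: path for path, file_id in tree.items()}

def restricted_inventory(tree, candidate_paths):
    """Inventory only the paths the changeset mentions."""
    return {tree[p]: p for p in candidate_paths if p in tree}

def locate(inventory, changeset):
    """Find where each file the changeset touches lives in the tree."""
    return {file_id: inventory.get(file_id) for file_id in changeset}

# The changeset touches one file, known by id and by its original path.
changeset = {"id-sched": "kernel/sched.c"}

# Exact patching: the tree is the revision the changeset was made against.
exact_tree = {"kernel/sched.c": "id-sched", "kernel/fork.c": "id-fork"}

# Merge: the tree has renamed the file, so the changeset's path is stale.
merge_tree = {"kernel/scheduler.c": "id-sched", "kernel/fork.c": "id-fork"}

exact_fast = locate(restricted_inventory(exact_tree, changeset.values()), changeset)
merge_fast = locate(restricted_inventory(merge_tree, changeset.values()), changeset)
merge_full = locate(full_inventory(merge_tree), changeset)
```

In this model `exact_fast` locates the file, `merge_fast` comes back with no path for it, and `merge_full` finds it under its new name. Whether this is the only way the restricted inventory goes wrong in a merge is not something the sketch settles.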
So there are four things here:
1) needs more testing
2) minimally, the optimization needs to be only sometimes used -- only
when we know that a changeset is being applied exactly rather than
as part of a merge
3) maximally, _perhaps_ it's worth trying to think about how to
generalize the hack to handle inexact patching. I'm not so
sure it is though -- you can just use `update' rather than `replay'
and meanwhile, even without generalization, the hack can speed up
(dramatically, apparently) `get' and other cases of building a
revision from changesets
4) since this appears to be a huge win, performance-wise, it might
be interesting to take a slightly different approach that would
be harder to code up but would get inexact patching right:
Instead of trying to infer "what files to inventory" from the
contents of the changeset, an alternative is to do a full inventory
once, for the first changeset, but then keep track of what parts
of the filesystem are being changed along the way.
In other words, you could achieve much the same effect by caching
directory reads and stat calls -- and accurately invalidating
cache entries as things change. apply_changeset would still be
doing what it thinks is a full inventory: but that full inventory
could often hit up the cache rather than making system calls.
The caution is that past experience with arch has shown that it's
hard to maintain such a cache accurately, and accuracy is critical.
It would make _some_ sense to (mostly) implement it deep in the
heart of VU, as a descriptor-handler layer -- but then you also
have to watch for interactions with, for example, a fork/exec of
patch.
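The caching idea in point 4 can be sketched roughly as follows. This is a minimal Python illustration, not arch or VU code: the `FsCache` class, its `invalidate` method, the `inventory` walker, and `demo` are all hypothetical names. The inventory still walks the whole tree, but directory reads and stat calls are answered from the cache unless the affected entries were explicitly invalidated.

```python
import os
import stat
import tempfile

class FsCache:
    """Caches os.listdir and os.lstat results; callers must invalidate on change."""

    def __init__(self):
        self._dirs = {}
        self._stats = {}
        self.syscalls = 0  # counts cache misses that reached the filesystem

    def listdir(self, path):
        if path not in self._dirs:
            self.syscalls += 1
            self._dirs[path] = sorted(os.listdir(path))
        return self._dirs[path]

    def lstat(self, path):
        if path not in self._stats:
            self.syscalls += 1
            self._stats[path] = os.lstat(path)
        return self._stats[path]

    def invalidate(self, path):
        """Forget a changed path plus its parent's listing and stat."""
        parent = os.path.dirname(path)
        for key in (path, parent):
            self._stats.pop(key, None)
            self._dirs.pop(key, None)

def inventory(cache, root):
    """Full recursive walk, served from the cache where possible."""
    found = []
    for name in cache.listdir(root):
        path = os.path.join(root, name)
        found.append(path)
        if stat.S_ISDIR(cache.lstat(path).st_mode):
            found.extend(inventory(cache, path))
    return found

def demo():
    """Walk a small tree three times, changing it before the third walk."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "a"))
        open(os.path.join(root, "a", "x.c"), "w").close()
        cache = FsCache()
        inventory(cache, root)
        first = cache.syscalls
        inventory(cache, root)           # fully cached, no new system calls
        second = cache.syscalls
        new_file = os.path.join(root, "a", "z.c")
        open(new_file, "w").close()      # the tree changes...
        cache.invalidate(new_file)       # ...so the cache must be told
        third_listing = inventory(cache, root)
        return first, second, cache.syscalls, new_file in third_listing
```

The sketch also shows where the accuracy problem comes from: every write to the tree has to be paired with an `invalidate` call. Anything that changes files behind the cache's back, such as a forked `patch` process, would leave stale entries unless the caller invalidates the affected paths afterwards.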
-t