gnu-arch-users
Re: [Gnu-arch-users] archive storage format comments on the size


From: Andrea Arcangeli
Subject: Re: [Gnu-arch-users] archive storage format comments on the size
Date: Tue, 30 Sep 2003 02:37:58 +0200
User-agent: Mutt/1.4.1i

On Mon, Sep 29, 2003 at 07:15:55PM -0400, Miles Bader wrote:
> On Tue, Sep 30, 2003 at 12:50:55AM +0200, Andrea Arcangeli wrote:
> > I think the ideal is to have a way to tell "I don't think I need to access
> > between base-0 and patch-2000 for a very long time"; then tla can
> > transparently ungzip all the tar.gz patchsets, and create a superpatchset,
> > that is the .tar.gz of the directory base-0 - patch-2000, of the
> > uncompressed patchsets. That will be able to crunch the 30M into 2M of
> > space, without losing anything, and it'll still work transparently the few
> > times you need to access below patch-2000 to look back.
> 
> Note that there's already discussion of a similar feature, `summary deltas'
> going on in another thread; the main goal there is to provide a way to reduce
> the overhead of applying lots of changesets (without using cachedrevs, which
> are often unsuitable for very large trees).
> 
> The thread Subject: line is `situations where cached revisions are not so
> good'.

Thanks, I had a very short look. Correct me if I misunderstood, but I
feel it's quite an orthogonal problem, and I rate the superpatchset as
more worthwhile in practice.

In fact it seems to be quite the opposite idea: as far as I can tell,
the summaries will take more space and/or lose information, but they
speed up the checkout.

The whole point of the superpatchset is instead to:

1) not lose a single patchset's worth of granular information
2) dramatically reduce the amount of data to transfer to reach
   the *full* information (if you still want the summaries, you
   want to fetch the superpatch first and recreate the summaries
   on your local machine later, so you also get all the info).
   You've seen the size of the superpatch for >2000 patches: it's
   a 20x compression, and it eliminates the many RTT delays that
   you'd still suffer with the summaries
3) dramatically reduce the (compressed) size of the archive
4) allow stronger compression, like bzip2 -9
5) be completely transparent for everything but merging
6) require only a hint on the range of revisions that should compose
   the superpatch (and the directory where to unpack the temporary
   files, i.e. /dev/shm if you have enough RAM)
7) improve the checkout too, because all the patchsets will be unpacked
   at once in the RAM-backed /dev/shm
Point 7 is just a side effect, not the real goal, though of course the
checkout will get a huge benefit too. I can only imagine how fast tla
will be once it finds all 2000 patchsets stored unpacked in /dev/shm.
Currently I can almost read what it's writing during a checkout, while
it should eventually just print progress like 10% 20% 30% etc., since
it'll be so fast you can't read it anyway. At least that's my hope ;)

Of course when you run get-patch patch-1 it'll simply do tar xjf
superpatchset.tar.bz2 patch-1, so it'll unpack only one file. This
stays transparent for all operations, except a merge in the middle of
the superpatch.
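
As a sketch of that single-member lookup, the following does the
equivalent of "tar xjf superpatchset.tar.bz2 patch-1" in memory. The
function name get_patch is hypothetical and only for illustration. Note
that bzip2 is a stream format, so the reader still has to decompress
everything that precedes the requested member; only that one member is
returned.

```python
import io
import tarfile


def get_patch(superpatch, member):
    """Return the bytes of one member of a bzip2-compressed tar.

    `superpatch` is the whole .tar.bz2 as bytes; `member` is the name
    stored in the archive (e.g. "patch-1" in this illustration).
    """
    with tarfile.open(fileobj=io.BytesIO(superpatch), mode="r:bz2") as tar:
        handle = tar.extractfile(member)  # KeyError if the member is absent
        if handle is None:  # present, but not a regular file
            raise ValueError(f"{member} is not a regular file")
        return handle.read()
```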

Andrea - If you prefer relying on open source software, check these links:
            rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.[45]/
            http://www.cobite.com/cvsps/



