[Gnu-arch-users] Re: [GNU-arch-dev] [ANNOUNCEMENT] /Arch/ embraces `git'
From: Tom Lord
Subject: [Gnu-arch-users] Re: [GNU-arch-dev] [ANNOUNCEMENT] /Arch/ embraces `git'
Date: Wed, 20 Apr 2005 14:32:29 -0700 (PDT)
From: John A Meinel <address@hidden>
But I have a question about blobs. They are stored compressed, and
the sha checksum is for the *compressed* form. I understand this is
probably for performance reasons. I'm concerned, though, that
compression routines may not be 100% deterministic across all
platforms.
That is an *excellent* concern and I implore you to research it further
and report back.
My understanding is still superficial in that detail: I gather that
zip formats are standardized by an IETF document. I am not certain
that the spec implies deterministic output. I am not certain that
the way I'm driving `libz', so as to be compatible with Linus' code,
is the right way to do it.
Please, by all means, dig in and nail details. The goal here is
to produce the high-quality-gem version of `git' rather than the
rough-and-ready-works-for-me version.
It is desirable to checksum the compressed rather than uncompressed
blobs so that intermediate nodes in a circuit can validate blobs
without having to pay for expanding them.
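A minimal sketch of that point, assuming SHA-1 over the zlib-compressed bytes as described in this thread. The function names `store_blob` and `relay_can_validate` are hypothetical and are not taken from `git' or Arch; the sketch only illustrates that an intermediate node can check a blob without ever inflating it.

```python
import hashlib
import zlib


def store_blob(content: bytes):
    """Compress content and name it by the SHA-1 of the *compressed* bytes."""
    compressed = zlib.compress(content, 9)
    return hashlib.sha1(compressed).hexdigest(), compressed


def relay_can_validate(handle_hex: str, compressed: bytes) -> bool:
    """Check a blob in transit; zlib.decompress is never called here."""
    return hashlib.sha1(compressed).hexdigest() == handle_hex


handle, blob = store_blob(b"hello, arch\n" * 50)

# Flip the last byte to simulate corruption in transit.
corrupted = blob[:-1] + bytes([blob[-1] ^ 0xFF])

print(relay_can_validate(handle, blob))       # intact blob
print(relay_can_validate(handle, corrupted))  # corrupted blob
```

The trade-off is the one raised in the question: the handle is only stable if every writer produces byte-identical compressed output for the same content.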
Certainly just changing the compression level will
change the compressed output.
The actual implementation of `libz' is a train-wreck. It has lots
of subtle bugs. I am using the `Z_BEST_COMPRESSION' macro to select
the compression level but I won't be surprised if you are right that
this isn't the best choice. (I'm just copying Linus in that regard,
for speed-of-impl and compatibility).
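The concern about compression levels can be illustrated with a short sketch using Python's binding to zlib. The sample content is made up, and levels 0 and 9 are chosen only because they are certain to give different output for repetitive input; this says nothing about whether one level is deterministic across platforms or library versions.

```python
import hashlib
import zlib

content = b"the same blob contents\n" * 100

stored = zlib.compress(content, 0)                      # level 0: stored, no compression
best = zlib.compress(content, zlib.Z_BEST_COMPRESSION)  # level 9

# Hash the compressed form, as discussed in this thread.
sha_stored = hashlib.sha1(stored).hexdigest()
sha_best = hashlib.sha1(best).hexdigest()

# Both streams inflate to identical content even though their hashes differ.
same_content = zlib.decompress(stored) == zlib.decompress(best) == content

print(len(stored), len(best))
print(sha_stored == sha_best, same_content)
```

Any scheme that hashes the compressed form therefore has to pin down the compression parameters as part of the format.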
Rewriting or cleaning-up libz would be another great task for someone.
One big problem in the current `libz' is that many of the types used
for various fields are chosen poorly (e.g., `unsigned long' where `size_t'
is the right answer -- that kind of thing).
Having the handle fixed at 160 bits also seems limiting. It ties the
entire archive format into exactly one hash.
Yes it does. That's a longer discussion. Note that there are only a
finite number of valid blob contents, too.
The situation admits intense mathematical analysis --- in no small part
because we pick a particular hash and address size.
BTW -- the handles are actually 192 bits. I've upwards-compatibly
generalized Linus' code to make clearer something that is muddled
in his presentation: the blob size (zip form) is part of the handle
(what I call an "address").
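One way to picture a 192-bit address is a 160-bit SHA-1 followed by a 32-bit size of the compressed blob. The sketch below assumes exactly that split, plus a big-endian size field placed after the digest; the message does not spell out the layout, so the field order, width, and byte order here are all assumptions, and `blob_address` is a hypothetical name.

```python
import hashlib
import struct
import zlib


def blob_address(compressed: bytes) -> bytes:
    """Hypothetical layout: 20-byte SHA-1, then a 4-byte big-endian length."""
    digest = hashlib.sha1(compressed).digest()   # 160 bits
    size = struct.pack(">I", len(compressed))    # 32 bits (assumed width and order)
    return digest + size


compressed_blob = zlib.compress(b"example blob\n", 9)
address = blob_address(compressed_blob)

print(len(address) * 8)  # bit width of the address
```

Carrying the size in the address lets a receiver reject a blob of the wrong length before hashing it.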
Also -- I have cleaned up Linus' design by making my spec robust
against the possibility of a small number of successful SHA1 forgeries.
My design *won't* withstand an attack that can turn any text into a
semantically equivalent text with a desired SHA1 sum.
I suppose that is fine as long as there is a version marker to allow
new blob db versions, and the specific compression routine parameters
are well defined. I just want to make sure that is done up front.
Separate concerns. Blobs themselves are one thing -- blob dbs another.
Also, this doesn't seem to work really well as a revlib format. It
probably makes a great archive format, but revlibs need to know the
contents so they can diff against each other.
You'll see how it fits :-)
-t