Re: [lmi] Unexpected difference in git sha1sums


From: Greg Chicares
Subject: Re: [lmi] Unexpected difference in git sha1sums
Date: Thu, 3 Mar 2016 23:07:43 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Icedove/38.5.0

On 2016-03-03 21:28, Vadim Zeitlin wrote:
> On Thu, 3 Mar 2016 21:00:13 +0000 Greg Chicares <address@hidden> wrote:
> 
> GC> If I have two git repositories (one Cygwin, one GNU/Linux) that have
> GC> the same HEAD sha1sum, and I run 'git am' to apply the same patch to
> GC> each (verified by identical md5sums of the patch file), then shouldn't
> GC> I have the same commit sha1sum and the same new HEAD sha1sum on both?
> 
>  Sorry, I think that I had myself mistakenly said that it should be the
> same in the past, but this is not the case because "git am" uses the time
> of the commit as one of the inputs to generate the hash and the time is
> different in the two cases.
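
For reference, a minimal sketch of why the time matters, assuming GNU
coreutils' sha1sum is installed. A commit's id is the SHA-1 of the string
"commit <size>", a NUL byte, and the commit text, and that text includes a
committer line carrying the timestamp. The names, addresses, and message
below are made up for illustration; the tree id is git's well-known empty
tree.

```shell
# Print the text of a hypothetical commit object.
# $1 = committer timestamp in seconds since the epoch.
commit_body() {
    printf 'tree %s\n' 4b825dc642cb6eb9a060e54bf8d69288fbee4904
    printf 'author %s 1456970238 +0000\n' 'A U Thor <author@example.com>'
    printf 'committer %s %s +0000\n' 'C O Mitter <committer@example.com>' "$1"
    printf '\n%s\n' 'Update state approvals'
}

# Hash it the way git hashes any object: SHA-1 over a
# "commit <size>\0" header followed by the object text.
commit_sha1() {
    size=$(commit_body "$1" | wc -c | tr -d ' ')
    { printf 'commit %s\000' "$size"; commit_body "$1"; } |
        sha1sum | cut -d ' ' -f 1
}

commit_sha1 1457040000   # some 40-digit hash
commit_sha1 1457040000   # same inputs, same hash
commit_sha1 1457040001   # one second later: a different hash
```

The committer's name and address are hashed along with the date, so two
machines must agree on those as well before their commit ids can match.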

I am especially grateful for the near-instantaneity of this reply because I
was about to go down the laborious and, as it would have turned out, hopeless
path of starting with bitwise identical golden repos on both systems.

Let me explain my motivation in case I haven't already. We have proprietary
data that cannot reside on any server. We want to stop mailing entire svn
repositories to each other, because a one-line change in a single file
means a new 5MB email attachment and that amplification factor is too high.
Thus, we want git repositories on different machines that are updated with
identical patchsets and always provably identical. All the hard work has
already been done in git; we just need a precise set of commands that
always preserve that perfect identity.

It doesn't matter if the commands are lengthy or abstruse, because we'll
just cut and paste them from a "cheat sheet". It's worthwhile to spend a
lot of time perfecting and documenting the tiny set of commands we'll use,
because the only use case is trading patches with each other.

>  You could get rid of this difference by using the same, fixed timestamp
> for both commits, e.g.
> 
>       $ GIT_COMMITTER_DATE=1457040000 git am ...
> 
> (you have, of course, recognized this timestamp as 2016-03-03T21:20:00Z
> but Git is so user-friendly that it even allows you to use other formats:
> https://www.kernel.org/pub/software/scm/git/docs/git-commit.html#_date_formats)

Wow, they even accept ISO 8601, the One True Format (other than JDN).
(They didn't quite get it right until
  https://github.com/git/git/commit/466fb6742d7fb7d3e6994b2d0d8db83a8786ebcf
but I guess that's a good argument to upgrade to git-2.2.0+ someday.)
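
The correspondence between the two notations is easy to check, assuming
GNU date (its -d option is not part of POSIX):

```shell
# Epoch seconds to ISO 8601, in UTC.
date -u -d @1457040000 +%Y-%m-%dT%H:%M:%SZ    # prints 2016-03-03T21:20:00Z

# And back again; with -u the input is read as UTC.
date -u -d '2016-03-03 21:20:00' +%s          # prints 1457040000
```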

It would be nice if they respected the One True English Locale, but...

$LC_TIME=en_DK.UTF-8 git log -1
commit 64d5f83da4aa2cd3884c1c08949d31cb5d8777a4
Author: Gregory W. Chicares <address@hidden>
Date:   Thu Mar 3 01:57:18 2016 +0000

But the *nix timestamp counts seconds since 1970-01-01T00:00:00Z, so it's
TZ-independent, and as long as that's what's used in the hash, we're okay,
except that my 32-bit Cygwin installation will break a few days after my
eighty-second birthday.
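
Both points can be checked from the shell, again assuming GNU date. The TZ
values are POSIX-style strings chosen only for illustration.

```shell
# One instant, two zones: the rendering changes, the epoch count does not.
TZ=UTC0 date -d @1457040000 +%Y-%m-%dT%H:%M:%S%z   # prints 2016-03-03T21:20:00+0000
TZ=EST5 date -d @1457040000 +%Y-%m-%dT%H:%M:%S%z   # prints 2016-03-03T16:20:00-0500

# The last second a signed 32-bit time_t can represent.
date -u -d @2147483647 +%Y-%m-%dT%H:%M:%SZ         # prints 2038-01-19T03:14:07Z
```

A commit object records a UTC offset next to the epoch count (the "+0000"
in a committer line), so that offset has to match too, not just the count.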

>  I don't know if this is really going to be convenient for you as you will
> have to be careful to use the right (or at least the same) timestamp every
> time and if you make a mistake, the command will still work -- but the
> repositories would diverge. Maybe "git bundle" is a better solution,
> finally (it's definitely a more compact one, especially for the XML files).

Interesting. I hadn't thought of that. Once I've got a "cheat sheet"
that works nicely between my [possibly virtual] machines, I'll expose
it here and solicit ideas for improvement.
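
A rough sketch of what a bundle exchange might look like, for comparison.
The function names are hypothetical, the arguments are placeholders, and
this has not been tried on the repositories discussed here. A bundle
carries the commit objects themselves, so the receiver gets the sender's
commit ids rather than recreating them.

```shell
# Sender: pack every commit after $1, up to the tip of branch $2,
# into the bundle file $3.
make_bundle() {
    git bundle create "$3" "$1..$2"
}

# Receiver: check that bundle $1 fits this repository, then
# fast-forward the current branch to the bundled branch $2.
take_bundle() {
    git bundle verify "$1" &&
        git pull --ff-only "$1" "$2"
}
```

The receiver must already have the base commit named by the sender;
otherwise "git bundle verify" reports the missing prerequisite.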

>  But wait, I discovered something I didn't know about Git just now: "git
> am" has a --committer-date-is-author-date option which, I think, should
> also work and create the same commit SHA-1 sums. It seems to work too in my
> testing and is less error-prone than specifying GIT_COMMITTER_DATE manually,
> so finally I think this is the simplest solution.

Now the expected hash-tree invariant holds across operating systems
and git versions:

b5f2b3d4cfd179bbe4ac433d5bbe86d922bec2dc
 + patch ==
38e9191736ae226fe6b7582626b42e4d308656f9

[Of course these are just throwaway copies, and I don't propose to
do actual work the way I did the test below.]

$uname
Linux
$git checkout HEAD^
You are in 'detached HEAD' state. [...wincing...]
$git rev-parse --verify HEAD
b5f2b3d4cfd179bbe4ac433d5bbe86d922bec2dc
$git am --committer-date-is-author-date ../../0001-Update-state-approvals.patch
Applying: Update state approvals
$git rev-parse --verify HEAD
38e9191736ae226fe6b7582626b42e4d308656f9

$uname
CYGWIN_NT-5.1
$git checkout HEAD^
$git rev-parse --verify HEAD
b5f2b3d4cfd179bbe4ac433d5bbe86d922bec2dc
$git am --committer-date-is-author-date ../../0001-Update-state-approvals.patch
Applying: Update state approvals
$git rev-parse --verify HEAD
38e9191736ae226fe6b7582626b42e4d308656f9
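
A cheat-sheet entry could wrap those steps in a function that fails loudly
if the repositories diverge. This is only a sketch: apply_and_verify is a
made-up name, and the expected id would have to be supplied by whoever sent
the patch.

```shell
# Apply one patch and confirm that HEAD lands on the expected commit.
# $1 = patch file, $2 = expected 40-digit commit id (from the sender).
apply_and_verify() {
    git am --committer-date-is-author-date "$1" || return 1
    actual=$(git rev-parse --verify HEAD) || return 1
    if [ "$actual" != "$2" ]; then
        echo "HEAD is $actual, expected $2" >&2
        return 1
    fi
    echo "HEAD is $actual, as expected"
}

# Example, with the values from the test above:
# apply_and_verify ../../0001-Update-state-approvals.patch \
#     38e9191736ae226fe6b7582626b42e4d308656f9
```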




