Re: [rdiff-backup-users] Interesting write-up of 'compare-by-hash'.
From: Ben Escoto
Subject: Re: [rdiff-backup-users] Interesting write-up of 'compare-by-hash'.
Date: Fri, 30 Jan 2004 14:07:45 -0800
>>>>> Greg Freemyer <address@hidden>
>>>>> wrote the following on Wed, 28 Jan 2004 14:08:05 -0500
> Since the hashing process is lossy (i.e. non-reversible), it is
> possible that two totally different data sets could generate the same
> hash, and in turn invalidate the backup checksum check.
>
> I don't know what the odds are of this happening with rdiff-backup.
>
> I assume that they are exceedingly small, but not zero.
For rdiff (and thus rdiff-backup) the odds actually depend on the number
of blocks in the file, because matching is done with per-block checksums
rather than a single global SHA-1 or MD5 hash. So a 2 GB file that has
all 2 GB changed is more likely to cause a hash collision than a changed
1 KB file.
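To make that scaling concrete, here is a rough back-of-the-envelope
sketch in Python (my own simplified model, not librsync's exact math);
the block size and strong-hash width below are illustrative assumptions,
not necessarily librsync's defaults:

    import math

    def collision_bound(file_size_bytes, block_size=2048, strong_hash_bits=64):
        """Crude upper bound on a block-checksum collision for one file.

        Model: treat every per-block strong checksum as a uniform random
        strong_hash_bits-bit value, and assume roughly n_blocks positions
        in the new file are each compared against n_blocks signatures of
        the old file.
        """
        n_blocks = math.ceil(file_size_bytes / block_size)
        return min(1.0, n_blocks ** 2 * 2.0 ** -strong_hash_bits)

    # A fully changed 2 GB file has far more chances to collide than a 1 KB one.
    print(collision_bound(2 * 1024 ** 3))  # ~6e-8 under these assumptions
    print(collision_bound(1024))           # ~5e-20

The only point of the sketch is that, in this simplified model, the
exposure grows roughly with the square of the block count, so a large,
heavily changed file is the worst case.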
I remember asking Donovan Baarda about this on the librsync list a while
ago, so anyone curious about the details can look that thread up. The
upshot, IIRC, is that (for "random data") the odds of a collision are
around 2^-50 even for fairly large files. This isn't as good as a 128-bit
global hash, but it is quite reasonable for practical use.
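For a sense of scale (my arithmetic, not a figure from the thread beyond
the 2^-50 itself), that is a bit under one chance in a quadrillion:

    >>> 2 ** -50
    8.881784197001252e-16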
--
Ben Escoto