rdiff-backup-users

[rdiff-backup-users] Re: more info on 25gig files


From: rsync2eran
Subject: [rdiff-backup-users] Re: more info on 25gig files
Date: Thu, 07 Jul 2005 00:18:29 +0300
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050513 Thunderbird/1.0.2 Fedora/1.0.2-6 Mnenhy/0.7.2.0

Hi,

On 06/07/05 23:35, Donovan Baarda wrote:

> Yeah, but switching to OpenSSL's md4 sum will make it faster, without
> making it any weaker.

OpenSSL's MD4 implementation is actually 1.30 times *slower* than its
MD5 implementation on my machine.
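One way to reproduce this kind of comparison is a quick throughput test. The sketch below assumes Python's hashlib, which is usually backed by OpenSSL. The function name hash_throughput, the 1 MiB payload and the round count are illustrative choices, not anything from this thread. The ratio it reports will vary with the machine and the library build. On OpenSSL 3.x, MD4 lives in the legacy provider, so it may simply report as unavailable.

```python
import hashlib
import time
from typing import Optional


def hash_throughput(name: str, data: bytes, rounds: int = 20) -> Optional[float]:
    """Return hashing throughput in MiB/s for the hashlib algorithm `name`,
    or None if this Python/OpenSSL build does not provide it."""
    try:
        hashlib.new(name)  # probe: raises ValueError for unsupported algorithms
    except ValueError:
        return None
    start = time.perf_counter()
    for _ in range(rounds):
        hashlib.new(name, data).digest()
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero timer delta
    return len(data) * rounds / (1024 * 1024) / elapsed


if __name__ == "__main__":
    payload = bytes(range(256)) * 4096  # 1 MiB of test data
    rates = {algo: hash_throughput(algo, payload) for algo in ("md4", "md5")}
    for algo, rate in rates.items():
        print(algo, "unavailable" if rate is None else f"{rate:.1f} MiB/s")
    if rates["md4"] and rates["md5"]:
        # time(MD4) / time(MD5) == rate(MD5) / rate(MD4) for the same amount of data
        print(f"MD4 time / MD5 time: {rates['md5'] / rates['md4']:.2f}")
```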


>>>Though if someone can demonstrate that a different hash has a
>>>significantly better distribution over random data than md4, then maybe
>>>it would be worth considering (ie, it would avoid accidental collisions
>>>better).
>>
>>I find this unlikely.
> 
> The crucial point is that for librsync, the important part of the hash
> used is its distribution, not its "strength" (BTW, for those wanting
> to convince me otherwise, I think the two are actually related).

Hash functions are usually evaluated rather extensively for their
distribution properties (in effect, and without much formal basis,
they're expected to be good pseudorandom generators). Had there been a
bias large enough to affect rsync noticeably, it wouldn't have gone
unnoticed.
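As a crude illustration of the kind of distribution check meant here, and not a formal evaluation, the sketch below hashes a run of consecutive inputs with MD5. It buckets each digest by its first byte and computes Pearson's chi-square statistic against a uniform distribution. The input set, the bucket count and the function name are assumptions for illustration. For a well-distributed hash, the statistic should land near the number of buckets minus one (about 255 here). A noticeable bias would push it far higher.

```python
import hashlib


def bucket_chi_square(num_inputs: int = 100_000, buckets: int = 256) -> float:
    """Hash the decimal strings "0" .. str(num_inputs - 1) with MD5, bucket each
    digest by its first byte, and return the chi-square statistic against a
    uniform distribution over `buckets` buckets (buckets <= 256 assumed)."""
    counts = [0] * buckets
    for i in range(num_inputs):
        first_byte = hashlib.md5(str(i).encode()).digest()[0]
        counts[first_byte % buckets] += 1
    expected = num_inputs / buckets
    return sum((c - expected) ** 2 / expected for c in counts)


if __name__ == "__main__":
    print(f"chi-square over 256 buckets: {bucket_chi_square():.1f}")
```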


> I wasn't that concerned about exotic cryptanalytic attacks, more about
> random whole-file checksum collisions... particularly given the fact
> that we are trying to use the whole-file checksum as a way of detecting
> blocksum collisions... if the whole-file checksum is actually a checksum
> of those blocksums...

But the metahash uses the *untruncated* hashes of the blocks, where each
side computes them from the actual data it reads or writes.
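A small experiment shows why this matters. This message does not spell out the exact metahash construction, so the sketch below assumes one: MD5 over the concatenated full-length MD5 digests of the blocks. It also assumes a 2-byte truncated strong sum (TRUNC), kept tiny so a collision is quick to find. The block contents and function names are hypothetical. The sketch brute-forces two different blocks whose truncated sums match, so block matching would wrongly reuse the receiver's basis block. Because each side feeds the untruncated digests of the data it actually reads or writes into the metahash, the two metahashes differ and the bad match is caught.

```python
import hashlib
from typing import Dict, List, Tuple

TRUNC = 2  # hypothetical truncated strong-sum length (bytes); tiny so a collision is quick to find


def find_truncated_collision(prefix: bytes = b"block-") -> Tuple[bytes, bytes]:
    """Brute-force two distinct blocks whose MD5 digests agree in the first TRUNC bytes.
    By the pigeonhole principle this ends within 256**TRUNC + 1 candidates."""
    seen: Dict[bytes, bytes] = {}
    i = 0
    while True:
        block = prefix + str(i).encode()
        key = hashlib.md5(block).digest()[:TRUNC]
        if key in seen:
            return seen[key], block
        seen[key] = block
        i += 1


def metahash(blocks: List[bytes]) -> bytes:
    """Hypothetical metahash: MD5 over the concatenated untruncated MD5 digests of the blocks."""
    outer = hashlib.md5()
    for block in blocks:
        outer.update(hashlib.md5(block).digest())
    return outer.digest()


if __name__ == "__main__":
    new_block, basis_block = find_truncated_collision()
    # The sender's new file holds new_block; the receiver's basis file holds basis_block.
    # Their truncated sums match, so block matching wrongly reuses basis_block.
    sender_view = [b"header", new_block, b"trailer"]      # data the sender reads
    receiver_view = [b"header", basis_block, b"trailer"]  # data the receiver writes
    print(metahash(sender_view) == metahash(receiver_view))  # False: the mismatch is caught
```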

> and that the savings are worth it.

The sender's computational overhead for a straightforward full-file hash
on a mostly unchanged file would be around 20%-50%, depending on the
hash function. The metahash eliminates it.
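The saving can be illustrated by counting the bytes each approach hashes on top of the per-block work the sender already does. The sketch makes several simplifying assumptions: fixed-size, block-aligned matches, MD5 block digests, and a 2048-byte block size, all of them illustrative. In real rsync/librsync, matches can start at any offset in the new file and literal runs would need hashing too. So this only shows why the extra work drops from a second pass over the whole file to a few bytes per block. It does not try to reproduce the 20%-50% figure, which depends on the hash function and on the rest of the sender's work.

```python
import hashlib
from typing import List, Tuple

DIGEST_SIZE = hashlib.md5().digest_size  # 16 bytes


def block_digests(data: bytes, block_size: int) -> List[bytes]:
    """Untruncated MD5 of each fixed-size block. Stands in for the strong sums the
    sender already computes while confirming matches in a mostly unchanged file."""
    return [hashlib.md5(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def full_file_check(data: bytes) -> Tuple[bytes, int]:
    """Straightforward approach: a second hash pass over every byte.
    Returns (digest, extra bytes hashed)."""
    return hashlib.md5(data).digest(), len(data)


def metahash_check(digests: List[bytes]) -> Tuple[bytes, int]:
    """Metahash approach: hash only the per-block digests already in hand.
    Returns (digest, extra bytes hashed)."""
    outer = hashlib.md5()
    for d in digests:
        outer.update(d)
    return outer.digest(), DIGEST_SIZE * len(digests)


if __name__ == "__main__":
    data = bytes(range(256)) * 4096  # 1 MiB stand-in for a mostly unchanged file
    digests = block_digests(data, block_size=2048)
    _, extra_full = full_file_check(data)
    _, extra_meta = metahash_check(digests)
    print(extra_full, extra_meta)  # 1048576 8192
```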

  Eran





