monotone-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Monotone-devel] Re: Support for binary files, scalability and Windows p


From: Peter Simons
Subject: [Monotone-devel] Re: Support for binary files, scalability and Windows port
Date: 18 Jan 2004 18:27:54 +0100

Asger Kunuk Alstrup writes:

 > Does it need to be injective? I.e. should different data
 > with reasonable probability result in different ids?

 > Does it need to be surjective? I.e. based on an id, you
 > should with reasonable probability be able to find the
 > data it corresponds to?

Please pardon my nit-picking, but since you asked, I
figured, I might as well answer. :-)

A hash function cannot possibly be injective, because it
maps from an infinite set (all possible permutations of
input data) to a finite set (all possible hash values). An
injective mapping can only occur between sets of equal
magnitude.

Also, you seem to confuse a surjective function with an
_invertible_ one. Surjectivity means that the image set of a
mapping is equal to the set it's mapping to. In other words:
All elements of the image domain are actually "used" by the
mapping. This property is very desirable for hash functions,
but I don't think any of the current hash algorithms has
been proven to be surjective.

Just for kicks, here are the formal definitions: Any
function f, which maps from D to M, is injective if for any
x and x' in D the property

   f(x) = f(x')  ==>  x = x'.

holds. f is surjective if for any y in M there exists x in D
so that f(x) = y.

Finally, a function that is injective _and_ surjective is
called "bijective", and all bijective functions are
invertible. Being invertible would be a really, really BAD
property for a hash function.

To actually answer your question, though, -- in the hope of
making free-time for Graydon so that he can fix the bug
that's keeping me from re-building my depots *grin* -- the
desired property for cryptographic hash functions is that
they are non-linear. A good design principle is that if one
bit in the input changes, 50% of the bits in the output
change.

This property makes it very unlikely that similar inputs
collide. What "similar inputs" are, depends, of course, on
your application. As far as I know, there are hash
algorithms specifically designed for video and audio
streams, but I might be wrong since this is not definitely
not my area of expertise.

Hope to help,

Peter





reply via email to

[Prev in Thread] Current Thread [Next in Thread]