chicken-users
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Chicken-users] Re: Better algorithm for growing hash tables


From: Alejandro Forero Cuervo
Subject: Re: [Chicken-users] Re: Better algorithm for growing hash tables
Date: Wed, 3 Aug 2005 00:42:30 -0500
User-agent: Mutt/1.5.9i

> Is there a paper or book that offers a convincing, empirical argument  
> for this? I've read and heard this exhortation before but the  
> justification in the presence well designed hash and rehash functions  
> has always been "just to be safe."

Well, I don't know of any papers or books, but I believe the advantage
of using prime numbers is merely that it makes it less likely to
introduce collisions when the hashing function isn't very good.  Here
"good" means that for the set of objects you'll hash, lets call it A,
there is no tendency in their hash values to have common divisors, at
least not bigger than in sets of random numbers.  Another way to put
it is that around 1/nth of the elements of hash(A) are divisible by n
for every n < |hash(A)|/2.  Note that whether a hash table is "good"
might depend heavily on the set A you're interested in.

If many elements in hash(A) are divisible by a given number n, the
size of the table is m and the greatest common divisor of n and m is
o, most of your objects will land in the o * i portions of the table
for 0 < i < m/o, so you'll introduce a lot of collisions.

When, on the other hand, n is prime, o is always 1 so you won't need
to worry about this (you'll just need to worry that no single value
from hash(A) occurs more often than others).  When using prime sizes,
even if your hashing value produces many multiples of a given number,
as long as they are distributed, they'll be evenly distributed over
the hash.

Of course, if your hashing function is good (in the above sense) for
all your objects, you're probably safe using non-prime sizes.  But
since this might depend a lot on the particular set of objects you're
hashing, its better to be safe and use prime sizes.  I can't think of
a reason not to, anyway.

I think long long years ago, when I first learned what hash tables
were, I made some benchmarks of hashing tables with prime and non
prime sizes.  If I recall correctly, I used relatively good hash
functions and I got a slightly better performance when I used prime
sizes (say 85% faster).  But this was long ago so if you're
interested, you should really repeat my tests.

I hope I didn't make any obvious mistakes here.  Its late and I'm half
sleep already!

Alejo.
http://azul.freaks-unidos.net/

---=(  Comunidad de Usuarios de Software Libre en Colombia  )=---
---=(  http://bachue.com/colibri )=--=( address@hidden  )=---

Attachment: signature.asc
Description: Digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]