chicken-users
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Chicken-users] Better algorithm for growing hash tables


From: Alejandro Forero Cuervo
Subject: [Chicken-users] Better algorithm for growing hash tables
Date: Thu, 28 Jul 2005 12:42:58 -0500
User-agent: Mutt/1.5.9i

Hi, Felix!

One of my programs was taking way too long to execute something that
should run relatively fast.  After some experimentation, I noticed
that working with hashes that grow to a considerable size is
unbearably slow, as the following example, hashing a million elements,
shows:

> (use srfi-1)
> (define hash (make-hash-table))
> (for-each (lambda (n) (hash-table-set! hash hash n)) (iota 1000000))

You can compile this and notice that it takes *way* too long to run.
As I write this, it's been running for over 1 1/2 hours in my
relatively fast machine.

Whenever a hash table gets full, Chicken will increase its size.  If
the size is bigger than 2521, the algorithm used to determine the new
size seems to be to add 101 to its current size.  That means that for
big hashes (as the one in the above example), a *lot* of resizing will
take place.

I am attaching a patch I made (against Chicken 2.0) that improves the
algorithm; after applying it this program runs in around 10 seconds!
That's at least (asuming my program without the patch was about to
finish when I aborted it) 540 times faster (and for larger datasets
the improvement would only grow).

Here is what I did.  I rewrote the list of predefined sizes.  It now
starts in 307 (rather than in 7 * 43) and each element is the smallest
prime at least twice as big as the previous in the list.  I did this
until I got to 650680571, after which I added 1073741823, the largest
fixnum we can represent.  To expand a hash, Chicken will now use the
smallest element in the list at least twice as large as the current
size of the hash (or the largest fixnum if this condition doesn't hold
for any element in the list).

Feel free to apply this patch to the official branch.

Alejo.
http://azul.freaks-unidos.net/

P.s.: I am also attaching the program I used to calculate the list of
predefined sizes, just in case its useful.  It runs so fast (finishing
in 0.027s in my machine) I was tempted to include it directly in
extras.scm, calculating the predefined sizes dynamically at startup.

---=(  Comunidad de Usuarios de Software Libre en Colombia  )=---
---=(  http://bachue.com/colibri )=--=( address@hidden  )=---

Attachment: patch
Description: Text document

Attachment: calc-primes.scm
Description: Text document

Attachment: signature.asc
Description: Digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]