[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Does anyone have a better scm_string_hash ?
From: |
Ludovic Courtès |
Subject: |
Re: Does anyone have a better scm_string_hash ? |
Date: |
Fri, 14 Nov 2003 16:51:55 +0100 |
User-agent: |
Mutt/1.5.4i [Guile enabled] |
You might want to look at
http://srfi.schemers.org/srfi-13/mail-archive/msg00112.html .
Basically, the idea there is that a hash key for string cn..c0 is:
h = (c0 + c1*37 + c2*37^2 + ...) % hash_size.
I used that and it *seems* to work quite well.
This can be written as follows:
k = 0;
while (*string)
{
k = (k * 37) + (*string);
k %= hash_size;
string++;
}
return k;
Cheers,
Ludovic.
Today, 15 minutes, 40 seconds ago, Roland Orre wrote:
> I noticed that our data mining software run very very slow with a
> new data base. I localized the problem to scm_string_hash.
>
> A hash table in this case was loaded with 14166 strings. I have
> a function which creates a reasonable sized hash table, in this
> case the hash table size was 8209.
>
> 13856 of these strings were hashed to the same index= 1067.
> 303 strings got index = 8061.
> 2 strings got the index = 754.
> 8201 entries were empty.
>
> We are running guile 1.6 but I checked the scm_string_hash from a recent
> 1.7 CVS also and the function in hash.c there is identical.
>
> I added a few of the symbols hashing to 1067 below. One can of course
> argue that the symbols in this case should be hashed as numbers.
> Anyway, does anyone have any hint or have a better string hash function?
>
> Best regards
> Roland Orre
>
>
> A few strings hashing to entry 1067 for hash table length 8209:
> "01632001" "01627301" "01626801" "01626601" "01626501" "01626401"
> "01626301" "01626101" "01626001" "01625901" "01625801" "01625701"
> "01625401" "01625301" "01625101" "01625001" "01624801" "01624601"
> "01624501" "01624401" "01624301" "01624101" "01624001" "01623901"
> "01623801" "01623701" "01623601" "01623501" "01623401" "01623201"
> "01623101" "01622901" "01622801" "01622701" "01622601" "01622401"
>
>
>
>
> _______________________________________________
> Guile-user mailing list
> address@hidden
> http://mail.gnu.org/mailman/listinfo/guile-user
- cmod-play 1 available + modsup.h additions, Thien-Thi Nguyen, 2003/11/13
- Re: cmod-play 1 available + modsup.h additions, Marius Vollmer, 2003/11/14
- Does anyone have a better scm_string_hash ?, Roland Orre, 2003/11/14
- Re: Does anyone have a better scm_string_hash ?,
Ludovic Courtès <=
- Re: Does anyone have a better scm_string_hash ?, Roland Orre, 2003/11/17
- Re: Does anyone have a better scm_string_hash ?, Ludovic Courtès, 2003/11/17
- Re: Does anyone have a better scm_string_hash ?, Marius Vollmer, 2003/11/17
- Re: Does anyone have a better scm_string_hash ?, Marius Vollmer, 2003/11/17
- Re: Does anyone have a better scm_string_hash ?, Marius Vollmer, 2003/11/17
- Re: Does anyone have a better scm_string_hash ?, Allister MacLeod, 2003/11/17
- Re: Does anyone have a better scm_string_hash ?, Marius Vollmer, 2003/11/17
- OT: x86 assembly timings/size (was Re: Does anyone have a better scm_string_hash ?), Allister MacLeod, 2003/11/17
- Re: OT: x86 assembly timings/size, Marius Vollmer, 2003/11/17
- Re: Does anyone have a better scm_string_hash ?, Ludovic Courtès, 2003/11/19