Re: case-insensitive hash of strings
From: Bruno Haible
Subject: Re: case-insensitive hash of strings
Date: Tue, 21 Aug 2007 22:59:01 +0200
User-agent: KMail/1.5.4
Eric,
> A couple of questions. First, in hash-pjw.c, should we be using unsigned
> char instead of char to iterate through the NUL-terminated string?
I believe it should usually have no effect on the average number of
collisions (= average length of a non-empty hash bucket), but I would be
more comfortable with this change if you could post some concrete figures.
I would assume that the gcc-generated machine code for both cases is equally
fast.
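
For reference, the two variants being compared can be sketched as below. This is an illustrative rotate-and-add string hash written for this note, not a copy of hash-pjw.c; the function names `hash_plain_char` and `hash_unsigned_char` and the 9-bit rotation are assumptions. The two loops differ only in how a byte is widened before it is added to the hash.

```c
#include <limits.h>
#include <stddef.h>

#define SIZE_BITS (sizeof (size_t) * CHAR_BIT)

/* Hypothetical variant 1: iterate through plain char.  Where char is
   signed, a byte >= 0x80 is sign-extended before being added.  */
size_t
hash_plain_char (const char *s, size_t tablesize)
{
  size_t h = 0;

  for (; *s; s++)
    h = *s + ((h << 9) | (h >> (SIZE_BITS - 9)));

  return h % tablesize;
}

/* Hypothetical variant 2: iterate through unsigned char, so every
   byte contributes a value in the range 0..UCHAR_MAX.  */
size_t
hash_unsigned_char (const char *x, size_t tablesize)
{
  const unsigned char *s = (const unsigned char *) x;
  size_t h = 0;

  for (; *s; s++)
    h = *s + ((h << 9) | (h >> (SIZE_BITS - 9)));

  return h % tablesize;
}
```

For pure ASCII input the two functions return the same value. They can only disagree on strings containing bytes >= 0x80, and only on platforms where plain char is signed, so a collision measurement needs non-ASCII test data to show any difference.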
> Second, would it be worth adding a case-insensitive version of hash_pjw,
> so that strings can be hashed to the same value regardless of their case?
> It only makes sense for single-byte locales, but that's all the more that
> hash_pjw accommodates at the moment.
The majority of locales in use nowadays are multibyte locales (UTF-8,
GB18030 and EUC-*). Therefore I would concentrate on a solution that works
for both kinds of locales.
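
One possible shape for such a solution, as a rough sketch only: decode the string one character at a time with `mbrtowc`, lower-case each character with `towlower`, and feed the result into the same kind of rotate-and-add hash. The name `hash_pjw_casefold` is hypothetical and nothing here is existing gnulib code. It assumes the string is encoded in the current locale, and `towlower` is a simple per-character mapping, not full Unicode case folding.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

#define SIZE_BITS (sizeof (size_t) * CHAR_BIT)

/* Hypothetical case-insensitive hash for single-byte and multibyte
   locales.  The string is decoded according to the current locale.  */
size_t
hash_pjw_casefold (const char *s, size_t tablesize)
{
  mbstate_t state;
  size_t h = 0;
  size_t remaining = strlen (s);

  memset (&state, 0, sizeof state);
  while (remaining > 0)
    {
      wchar_t wc;
      size_t c;
      size_t n = mbrtowc (&wc, s, remaining, &state);

      if (n == (size_t) -1 || n == (size_t) -2)
        {
          /* Invalid or incomplete sequence: hash the raw byte and
             restart decoding from the initial state.  */
          c = (unsigned char) *s;
          n = 1;
          memset (&state, 0, sizeof state);
        }
      else
        {
          if (n == 0)
            break;              /* Decoded a NUL; treat as end of string.  */
          c = (size_t) towlower ((wint_t) wc);
        }

      h = c + ((h << 9) | (h >> (SIZE_BITS - 9)));
      s += n;
      remaining -= n;
    }

  return h % tablesize;
}
```

In a single-byte locale each call to `mbrtowc` consumes one byte, so the same function serves both kinds of locales. The caller is expected to have called `setlocale` beforehand; without it the "C" locale applies and only ASCII letters are folded.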
Bruno