[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Merge the "charset" branch?
From: |
Ben Pfaff |
Subject: |
Re: Merge the "charset" branch? |
Date: |
Thu, 09 Apr 2009 07:13:26 -0700 |
User-agent: |
Gnus/5.11 (Gnus v5.11) Emacs/22.2 (gnu/linux) |
Thanks, I rebased and pushed this to master.
In case anyone was concerned about not getting a copyright
assignment on this, essentially the same code appears in GCC:
http://git.infradead.org/gcc.git?a=blob;f=libiberty/hashtab.c
John Darrington <address@hidden> writes:
> This looks fine to me.
>
> On Wed, Apr 08, 2009 at 09:59:04PM -0700, Ben Pfaff wrote:
> John Darrington <address@hidden> writes:
>
> I've been meaning to replace the PSPP hash functions for a while
> now. The FNV hash is not so great, and our implementations lack
> a "basis" or "initval" argument that can be used to combine
> hashes in a high-quality way (e.g. not XOR of their results).
>
> So I've pushed a branch for review that fixes these problems.
> It's named "hash", and here is the summary:
>
> commit b4e3275011982e29b80589bef705fc8a0a0316dd
> Author: Ben Pfaff <address@hidden>
> Date: Wed Apr 8 21:39:22 2009 -0700
>
> NPAR TESTS: Consistently order variables in summary statistics.
>
> The set of variables in the NPAR TESTS specs structure was ordered
> randomly, according to however the hash function happened to arrange
> them.
> Sort them by variable name, instead, so that they always appear in
> alphabetical order in, e.g., descriptive statistics output.
>
> The particular hash function PSPP uses now tends to order variables
> alphabetically anyhow. The next commit changes the PSPP hash
> functions,
> so fixing this in advance prevents having to update any test output.
>
> src/language/stats/npar.q | 4 ++--
> 1 files changed, 2 insertions(+), 2 deletions(-)
>
> commit e9c717e43278364a49b68db4718cab5c9229c8fb
> Author: Ben Pfaff <address@hidden>
> Date: Wed Apr 8 21:55:31 2009 -0700
>
> Use Bob Jenkins lookup3 hash instead of FNV.
>
> The Jenkins lookup3 hash is superior to FNV in collision resistance,
> avalanching, and performance on systems that do not have fast
> multiplication. It also provides a good way to combine the result of
> a previous hashing step with the current hash, using its "basis"
> argument.
> This commit replaces the PSPP implementation of FNV with the Jenkins
> lookup3 hash and updates all the current users.
>
> In addition, John Darrington pointed out that commit dd2e61b4a
> "Make create_iconv() properly distinguish converters by name"
> unintentionally introduced gratuitous hash collisions, by causing
> all converters where tocode and fromcode were the same to hash to
> value 0, and converters where tocode and fromcode were swapped to
> hash to the same value as each other. Using the "basis" argument to
> the Jenkins hash properly, instead of just attempting to combine
> hash values with XOR, fixes this problem.
>
> src/data/attributes.c | 6 +-
> src/data/file-handle-def.c | 12 ++-
> src/data/file-name.c | 6 +-
> src/data/short-names.c | 8 +-
> src/data/value-labels.c | 8 +-
> src/data/value.c | 6 +-
> src/data/variable.c | 4 +-
> src/language/stats/autorecode.c | 4 +-
> src/language/stats/crosstabs.q | 2 +-
> src/libpspp/hash-functions.c | 196
> ++++++++++++++++++++++++++++-----------
> src/libpspp/hash-functions.h | 12 +-
> src/libpspp/i18n.c | 2 +-
> src/math/covariance-matrix.c | 11 +-
> 13 files changed, 180 insertions(+), 97 deletions(-)
>
> --
> PGP Public key ID: 1024D/2DE827B3
> fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
> See http://pgp.mit.edu or any PGP keyserver for public key.
>
>
--
Ben Pfaff
http://benpfaff.org