Re: Merge the "charset" branch?
From: John Darrington
Subject: Re: Merge the "charset" branch?
Date: Thu, 9 Apr 2009 14:09:59 +0800
User-agent: Mutt/1.5.13 (2006-08-11)
This looks fine to me.
On Wed, Apr 08, 2009 at 09:59:04PM -0700, Ben Pfaff wrote:
John Darrington <address@hidden> writes:
I've been meaning to replace the PSPP hash functions for a while
now. The FNV hash is not so great, and our implementations lack
a "basis" or "initval" argument that can be used to combine
hashes in a high-quality way (e.g. not XOR of their results).
So I've pushed a branch for review that fixes these problems.
It's named "hash", and here is the summary:
commit b4e3275011982e29b80589bef705fc8a0a0316dd
Author: Ben Pfaff <address@hidden>
Date: Wed Apr 8 21:39:22 2009 -0700
NPAR TESTS: Consistently order variables in summary statistics.
The set of variables in the NPAR TESTS specs structure was ordered
randomly, according to however the hash function happened to arrange
them.
Sort them by variable name, instead, so that they always appear in
alphabetical order in, e.g., descriptive statistics output.
The particular hash function PSPP uses now tends to order variables
alphabetically anyhow. The next commit changes the PSPP hash functions,
so fixing this in advance prevents having to update any test output.
src/language/stats/npar.q | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
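The idea in this commit, sorting by name so that output order no longer depends on hash-table iteration order, can be sketched as follows. This is only an illustration, not the code in npar.q: the helper names are hypothetical, the variables are represented as plain C strings, and strcmp() is assumed as the ordering.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator: each element is a `const char *` holding a name. */
static int compare_names(const void *a, const void *b)
{
    const char *const *na = a;
    const char *const *nb = b;
    return strcmp(*na, *nb);
}

/* Hypothetical helper: sorts N variable names in place, so that the
   result is independent of the order in which a hash table happened
   to yield them. */
void sort_variable_names(const char **names, size_t n)
{
    qsort(names, n, sizeof *names, compare_names);
}
```

Whatever order the names arrive in, sort_variable_names() leaves them in the same order, which is what keeps test output stable across a change of hash function.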
commit e9c717e43278364a49b68db4718cab5c9229c8fb
Author: Ben Pfaff <address@hidden>
Date: Wed Apr 8 21:55:31 2009 -0700
Use Bob Jenkins lookup3 hash instead of FNV.
The Jenkins lookup3 hash is superior to FNV in collision resistance,
avalanching, and performance on systems that do not have fast
multiplication. It also provides a good way to combine the result of
a previous hashing step with the current hash, using its "basis"
argument.
This commit replaces the PSPP implementation of FNV with the Jenkins
lookup3 hash and updates all the current users.
In addition, John Darrington pointed out that commit dd2e61b4a
"Make create_iconv() properly distinguish converters by name"
unintentionally introduced gratuitous hash collisions, by causing
all converters where tocode and fromcode were the same to hash to
value 0, and converters where tocode and fromcode were swapped to
hash to the same value as each other. Using the "basis" argument to
the Jenkins hash properly, instead of just attempting to combine
hash values with XOR, fixes this problem.
src/data/attributes.c | 6 +-
src/data/file-handle-def.c | 12 ++-
src/data/file-name.c | 6 +-
src/data/short-names.c | 8 +-
src/data/value-labels.c | 8 +-
src/data/value.c | 6 +-
src/data/variable.c | 4 +-
src/language/stats/autorecode.c | 4 +-
src/language/stats/crosstabs.q | 2 +-
src/libpspp/hash-functions.c | 196 ++++++++++++++++++++++++++++-----------
src/libpspp/hash-functions.h | 12 +-
src/libpspp/i18n.c | 2 +-
src/math/covariance-matrix.c | 11 +-
13 files changed, 180 insertions(+), 97 deletions(-)
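The collision problem described in the second commit message can be reproduced with a small sketch. The hash below is a deliberately simple stand-in with a "basis" parameter; it is not lookup3 and not PSPP's code, and all function names are hypothetical. It only shows why combining two hashes with XOR is weak and how feeding one hash in as the basis of the next avoids that.

```c
#include <stdint.h>

/* Stand-in string hash (FNV-style multiply/XOR steps) that starts from
   BASIS.  Assumption: used only to illustrate the "basis" idea. */
uint32_t toy_hash_string(const char *s, uint32_t basis)
{
    uint32_t h = basis;
    for (; *s != '\0'; s++) {
        h ^= (unsigned char) *s;
        h *= 16777619u;
    }
    return h;
}

/* Combines the two hashes with XOR.  Identical inputs cancel to 0, and
   swapping the inputs gives the same value. */
uint32_t pair_hash_xor(const char *fromcode, const char *tocode)
{
    return toy_hash_string(fromcode, 0) ^ toy_hash_string(tocode, 0);
}

/* Chains the hashes: the hash of FROMCODE becomes the basis for
   hashing TOCODE, so the result depends on the order of the inputs. */
uint32_t pair_hash_chained(const char *fromcode, const char *tocode)
{
    return toy_hash_string(tocode, toy_hash_string(fromcode, 0));
}
```

With the XOR version, pair_hash_xor("UTF-8", "UTF-8") is always 0, and ("UTF-8", "ISO-8859-1") always hashes the same as ("ISO-8859-1", "UTF-8"). The chained version is not forced into either collision.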
--
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
See http://pgp.mit.edu or any PGP keyserver for public key.