bug-gnu-utils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Gperf-bugs] Gperf 2.7.2 test file questions


From: Bruce Lilly
Subject: Re: [Gperf-bugs] Gperf 2.7.2 test file questions
Date: Thu, 21 Nov 2002 02:11:32 -0500
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2a) Gecko/20020910

Bruno Haible wrote:
On Friday 22 February 2002 23:41, Bruce Lilly wrote:

tests/c-parse.gperf uses "1,3,$"; "2,6,$" would
be better and "1,2,6" would be better still.


How do you measure which is better? gperf-2.8 will use "2,4,5" in this
case by default, and "2,4,5" is just as good as "1,2,6".

Consider computation of hash values for a set of keys having lengths
of 1, 2, ... 6, 7 characters (it does not matter whether or not the
key length is also used in computing the hash value for this
explanation).  The gperf-generated hash function uses a form of Duff's
device, i.e. a switch statement with fallthrough where the starting
point is determined by the key length. For the key positions
used, the corresponding character of each key of that length or
longer is used in the hash computation.  In general, key position
sets using higher key positions are slightly more efficient since
the high positions are ignored for short keys.

For key positions 2,4,5 a key with length 1 uses no characters from
the key, a key of length 2 uses only the second character, ... a key
of length 5 or longer uses all three key positions.

Key position set 1,2,6 will always use the first character of the key.
For a key of length 5 only positions 1 and 2 of the key are used,
compared to 3 positions used for set 2,4,5.

Comparing the two key position sets, it should be clear that 2,4,5
will be better for keys with length 3 or less, that there is no
advantage of either set for keys of length 4 or of length 6 or greater,
and that 1,2,6 is more efficient only for keys of length 5.  For a C
parser, therefore, I'd expect a somewhat better performance for
key position set 2,4,5 (since it is more efficient for key lengths
1, 2, and 3 characters, and short identifiers are common (considering
keys which are not keywords) and there are a number of 2- and 3- letter
keywords) than for 1,2,6 (which is advantageous only for key length 5).

$ is another case. A key position set of 1,2,6 is likely to be better
than 1,3,$ as the latter will always include the last character of the
key, whereas the former will use fewer key positions for keys of length
4 and 5. Moreover, $ means using key[len-1] which is slightly more
computation than key[1], so even though the same two key positions are used
in either case for a key of length 2, the one which does not use $ will
be slightly more efficient. Likewise for a key of length 6 or greater.

In summary, all else being equal, high key positions are preferable to
low ones (short keys require less computation) and specific key positions
are preferable to $.  That's why (IIRC) the selection of keys proceeds
from highest to lowest (or equivalently, low positions are preferentially
discarded).





reply via email to

[Prev in Thread] Current Thread [Next in Thread]