bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: RFC: modules for generic unordered sets and mappings


From: Ben Pfaff
Subject: Re: RFC: modules for generic unordered sets and mappings
Date: Thu, 01 Jul 2010 20:25:14 -0700
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (gnu/linux)

Paul Eggert <address@hidden> writes:

> I needed three kinds of hash tables. [...]

PSPP has a simple kind of hash table that might suit your
purposes.  Some people like its approach; others don't.

Here it is:
        http://git.savannah.gnu.org/cgit/pspp.git/tree/src/libpspp/hmap.h
        http://git.savannah.gnu.org/cgit/pspp.git/tree/src/libpspp/hmap.c
        http://git.savannah.gnu.org/cgit/pspp.git/tree/tests/libpspp/hmap-test.c
Here's the explanation taken from hmap.h above, to save
dereferencing the URLs:

/* Hash table with separate chaining.

   This header (hmap.h) supplies an "embedded" implementation of
   a hash table that uses linked lists to resolve collisions
   ("separate chaining").  Its companion header (hmapx.h)
   supplies a "external" implementation that is otherwise
   similar.  The two variants are described briefly here.  The
   embedded variant, for which this is the header, is described
   in slightly more detail below.  Each function also has a
   detailed usage comment at its point of definition.  (Many of
   those definitions are inline in this file, because they are so
   simple.  Others are in hmap.c.)

   The "hmap" embedded hash table implementation puts the hash
   table node (which includes the linked list used for resolving
   collisions) within the data structure that the hash table
   contains.  This makes allocation efficient, in space and time,
   because no additional call into an allocator is needed to
   obtain memory for the hash table node.  It also makes it easy
   to find the hash table node associated with a given object.
   However, it's difficult to include a given object in an
   arbitrary number of hash tables.

   The "hmapx" external hash table implementation allocates hash
   table nodes separately from the objects in the hash table.
   Inserting and removing hash table elements requires dynamic
   allocation, so it is normally slower and takes more memory
   than the embedded implementation.  It also requires searching
   the table to find the node associated with a given object.
   However, it's easy to include a given object in an arbitrary
   number of hash tables.  It's also possible to create an
   external hash table without adding a member to the data
   structure that the hash table contains. */

/* Embedded hash table with separate chaining.

   To create an embedded hash table, declare an instance of
   struct hmap, then initialize it with hmap_init():
     struct hmap map;
     hmap_init (&map);
   or, alternatively:
     struct hmap map = HMAP_INITIALIZER (map);
   
   Each node in the hash table, presumably a structure type, must
   include a struct hmap_node member.  Here's an example:
     struct foo
       {
         struct hmap_node node;   // hmap_node member.
         const char *string;      // Another member.
       };
   The hash table functions work with pointers to struct
   hmap_node.  To obtain a pointer to your structure type given a
   pointer to struct hmap_node, use the HMAP_DATA macro.

   Inserting and deleting elements is straightforward.  Use
   hmap_insert() to insert an element and hmap_delete() to delete
   an element, e.g.:
     struct foo my_foo;
     my_foo.string = "My string";
     hmap_insert (&map, &my_foo.node, hsh_hash_string (my_foo.string));
     ...
     hmap_delete (&map, &my_foo.node);
   You must pass the element's hash value as one of
   hmap_insert()'s arguments.  The hash table saves this hash
   value for use later to speed searches and to rehash as the
   hash table grows.

   hmap_insert() does not check whether the newly inserted
   element duplicates an element already in the hash table.  The
   client is responsible for doing so, if this is desirable.

   The hash table does not provide a direct way to search for an
   existing element.  Instead, it provides the means to iterate
   over all the elements in the hash table with a given hash
   value.  It is easy to compose a search function from such a
   building block.  For example:
     const struct foo *
     find_foo (const struct hmap *map, const char *name)
     {
       const struct foo *foo;
       size_t hash;

       hash = hsh_hash_string (name);
       HMAP_FOR_EACH_WITH_HASH (foo, struct foo, node, hash, map)
         if (!strcmp (foo->name, name))
           break;
       return foo;
     }

   Here is how to iterate through the elements currently in the
   hash table:
     struct foo *foo;
     HMAP_FOR_EACH (foo, struct foo, node, &map)
       {
         ...do something with foo...
       }
   */

-- 
"Premature optimization is the root of all evil."
--D. E. Knuth, "Structured Programming with go to Statements"




reply via email to

[Prev in Thread] Current Thread [Next in Thread]