[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Gcl-devel] hashing
From: |
Camm Maguire |
Subject: |
[Gcl-devel] hashing |
Date: |
Thu, 14 Nov 2013 14:05:57 -0500 |
Greetings, and thanks for this informative post!
Henry Baker writes
> The best hashing performance requires deep understanding of the
> virtual memory address translation mechanism (aka translation
> lookaside buffer "TLB"). This is because the translation of
> virtual memory addresses to real memory addresses _already
> uses a builtin hardware hash table_. You might as well take
> advantage of this hardware help when accessing large hash
> tables.
>
> http://en.wikipedia.org/wiki/Translation_lookaside_buffer
>
> http://en.wikipedia.org/wiki/Virtual_memory
>
> Hashing performance is also dependent upon the cache line size.
>
> It is sometimes possible to get better performance by using
> several levels of tables, if you can be assured that the top
> levels are essentially always in the cache and in the TLB.
>
> Also, overflows can be very fast _so long as the overflow
> scan occurs within the same cache line_.
>
> I presume that there are academic papers that address (!) this
> particular subject, but if not, a couple of days worth of
> experiments might be very educational.
Will review these -- thanks! Already learned a lot from the wikipedia
page on hashing.
>
> Common Lisp's SETF mechanism for installing items into a hash
> table has always been at odds with the best possible performance
> for hash tables in which a large % of the items will _never_ be
> accessed -- i.e., a _write-mostly_ hash table. The reason is
> that what you really want is an insert-if-not-already-there
> function which performs only one hashing operation.
>
Can a conforming implementation add an optional argument to gethash:
(defun gethash (key table &optional default write-default) ...)
? This is exactly what I always run into as well. The version in
master allows you to retrieve the bucket and do the same if empty, which
is more tedious, but at least assuredly compliant.
Stated more broadly, is there any point to extending the standard in
this way, since it will still not be portable and therefore of limited
use?
> Also, Common Lisp severely crippled their built-in hash tables
> by not allowing full generality with a user-specified equality
> operator. As a result, I typically use Common Lisp hash tables
> for prototyping, but almost invariably have to write my own routines
> for the highest possible performance.
>
I was also thinking of doing this, like a user defined stream. Its not
hard, but it again won't be 'standard'.
> It would also behoove Common Lisp implementors to allow the
> _inlining_ of hash table lookup & store operations. This would
> remove one more performance penalty when using the CL standard
> hash table operations.
We've peeled away layers of the function call over the years. The
version in master makes heavy use of inlining, so this is easy.
However, inlining can easily get out of hand without some well defined
logic governing when to do it and how to cache the results. In
particular, it is very unlikely that the compiler will be able to
discern the hash table function, in which case inlining would
(currently) write four loops, one of which is selected at runtime. (Its
of course easy to do one loop, but we want performance here and don't
want to have to branch or call inside the loop.) Isn't this too much?
Take care,
--
Camm Maguire address@hidden
==========================================================================
"The earth is but one country, and mankind its citizens." -- Baha'u'llah