chicken-users
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Chicken-users] Good way to code the equivalent to this?


From: Jim Ursetto
Subject: Re: [Chicken-users] Good way to code the equivalent to this?
Date: Sun, 24 Aug 2008 03:46:48 -0500

>>> My attempts all use gigs of memory and run 10x as long.

Let's take a look at why so much memory is being used, using the
following very similar code.

(define a (make-hash-table =))
(let loop ((n 250000))
  (let ((x (random 500000))
        (y (random 500000)))
    (when (> n 0)
      (hash-table-update! a x
                          (lambda (h)
                            (hash-table-set! h y #t)
                            h)
                          (cut make-hash-table eq?))
      (loop (sub1 n)))))

$ csi
#;1> ,r
Memory: heap size is 500000 bytes with 361224 bytes currently in use

We evaluate the code above and recheck the memory usage.  Remember
that half the "in use" memory is dedicated to the stop&copy GC.

#;3> ,r
Memory: heap size is 1025151236 bytes with 776804714 bytes currently in use

So, an entire gigabyte of virtual memory and about 265MB of memory
just for the hash itself (btw, my RSS states 566MB of actual memory in
use):

#;3> (- 776804714 (/ 1025151236 2))  ; heap used at end
264229096
#;4> (- 361224 250000)    ; heap used at beginning
111224
#;5> (- 264229096 111224) ; total mem used
264117872

Now we calculate the overhead actually used by the hash itself.  It
relies on knowledge of Chicken internals.

#;6> (vector-length (##sys#slot a 1))  ; number of buckets
635413
#;7> (hash-table-size a)               ; number of associations
196597
#;8> (* 196597 1280)                   ; overhead in bytes for empty
hash tables (307 buckets)
251644160
#;10> (* 250000 12)                    ; size in bytes of y elements
(12 bytes/pair)
3000000
#;11> (* 250000 12)                    ; size in bytes of x elements
(12 bytes/pair)
3000000
#;14> (length (filter null? (vector->list (##sys#slot a 1))))        ;
# of empty buckets in a
438816
#;15> (- 635413 196597)                ; equal to #14, so each bucket
contains at most 1 value
438816
#;16> (* 438816 8)                     ; overhead in bytes for empty
buckets in a
3510528
#;20> (* 196597 8)                     ; overhead in bytes for full buckets in a
1572776
#;21> (+ 251644160 3510528 3000000 3000000 1572776)  ; calculated memory usage
262727464

Our final calculation is 263MB for the hash, which is close enough for
government work.  Of this, 250MB is spent on the second-level hashes,
which take up a full 1280 bytes when empty because each has a minimum
of 307 buckets. Another 5MB is spent on bucket overhead in the
first-level array, and 6MB is spent on actual (key . value) pairs.

I found you can save a few megs by compacting the first-level array
using (define a (make-hash-table = min-load: 0.99 max-load: 0.99).
This is a complete hack, but the number of buckets dropped from
635,000 to 317,701 (i.e. the next lowest possible value).

However, nothing else I tried worked.  You can't cap the number of
buckets, you can't adjust the permissible load to 1.00 or higher, you
can't reduce the minimum size of a hash table, and trying a different
integer hashing function gave no benefit.  But I am not an expert, so
may be missing something.

In short, Chicken hash tables are not suitable for your application,
or, in my opinion, anything serious right now.  I believe this is
being worked on.  Also keep in mind that perl hashes are highly,
highly optimized as they are Perl's swiss-army data structure.

Meantime, write your own data structure that takes into consideration
the normal use of your sparse arrays.  For example, I made a dumb
change that implements the second level as a plain alist, since
collisions at the first level are rare in this particular case.
Compiled, it runs at 1.2 seconds on my machine vs 0.8 seconds for the
perl script.  The data structure is 13MB because of the savings at the
second level.  Although I couldn't solve your problem, here is the
modified example.

(print "Filling the associative array with 250000 entries.")

(define a (make-hash-table =))

(let loop ((n 250000))
  (let ((x (random 500000))
        (y (random 500000)))
    (when (> n 0)
      (hash-table-update!/default a x
                          (lambda (L)
                            (cons (cons y #t) L))
                          '())
      (loop (sub1 n)))))

(print "Reading from the associative array 10000 times\n")

(printf "Done (hits ~a)"
        (let loop ((n 10000) (hits 0))
          (let ((x (random 500000))
                (y (random 500000)))
            (if (= n 0)
                hits
                (if (and-let* ((h (hash-table-ref/default a x #f)))
                      (alist-ref y h))
                    (loop (sub1 n) (add1 hits))
                    (loop (sub1 n) hits))))))




reply via email to

[Prev in Thread] Current Thread [Next in Thread]