From: Tom Lord
Subject: [Pika-dev] hashq values and weak references (was hackerlab fsplay.[ch] status)
Date: Sat, 8 May 2004 11:31:41 -0700 (PDT)

    > From: Andreas Rottmann <address@hidden>

    > Andreas Rottmann <address@hidden> writes:

    > > Since you seem quite opposed to the memchunk idea, I took another look
    > > at the object data areas, and it seems instead of having the data in a
    > > separate memchunk, we might as well keep it in the hashtree items,
    > > thus avoiding the memchunk issue.

    > At the additional cost that the memory in the object data area
    > must be copied, but I guess that's negligible unless you stick
    > large stuff in there (would we?).

I'm a little confused as to what you're talking about.

I'm assuming that you're working on solving the hashq problem: with
our copying collector, an address-based HASHQ is no good -- the
address of an object can change over time.

Here is how I think we should solve that problem:

Pick some size for "GC-pages".  The GC'ed heap is divided into
GC-pages, each page divided up into objects of a certain size.  

Reserve one of those objects at the start of each page for a hash
bucket.   So it's (initially, I'll change this further down):


        GC page:

        |  hash-bucket |  obj | obj | obj | ... | obj |
                |
                V
              | obj . hashq | obj . hashq | .... |

The hash-bucket only needs to exist if HASHQ has been called for any
of the objects on a page.
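
For concreteness, here's a minimal C sketch of that layout.  All of
the names (gc_page_t, hash_bucket, GC_PAGE_SIZE, and so on) are
invented for illustration -- this is not actual Pika code:

    #include <stddef.h>
    #include <stdint.h>

    #define GC_PAGE_SIZE 4096   /* assumed: pages this size, aligned */

    /* One (obj . hashq) pair in the initial, compact bucket design. */
    struct hash_bucket_entry
    {
      void * obj;               /* address of the hashed object */
      uintptr_t hashq;          /* its stable hashq value */
    };

    struct hash_bucket
    {
      size_t n_entries;
      struct hash_bucket_entry entries[];  /* sorted by obj address */
    };

    /* A GC page: one object slot at the start is reserved for the
       bucket pointer, which stays NULL until HASHQ is first called
       for some object on the page. */
    typedef struct gc_page
    {
      struct hash_bucket * bucket;
      size_t object_size;       /* every object here has this size */
      /* object storage follows, allocated in address order */
    } gc_page_t;

    /* Because pages are aligned, any object finds its page by masking: */
    #define GC_PAGE_OF(obj) \
      ((gc_page_t *)((uintptr_t)(obj) & ~(uintptr_t)(GC_PAGE_SIZE - 1)))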

During copy collection, have _two_ to-space pages for each size of
object:  one for objects with hash values, another for objects without
hash values.   That way, after a copy collection, the number of hash
buckets will be minimized.
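
In code, the copy loop just picks its destination page according to
whether the object being evacuated has a hashq entry.  Continuing the
sketch above (names still hypothetical):

    enum { N_SIZE_CLASSES = 16 };  /* assumed number of object sizes */

    /* Two current to-space allocation pages per size class: */
    static gc_page_t * to_plain[N_SIZE_CLASSES];   /* never HASHQed */
    static gc_page_t * to_hashed[N_SIZE_CLASSES];  /* have hashq values */

    static gc_page_t *
    destination_page (int size_class, int has_hashq)
    {
      /* Segregating hashed from un-hashed objects means that, after
         the collection, only the "hashed" pages need buckets at all. */
      return has_hashq ? to_hashed[size_class] : to_plain[size_class];
    }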

Also note that, after a copy collection, the entries in a hash bucket
are sorted by object address (because we fill up to-space in address
order).  HASHQ can preserve that sorting when adding new hash values
to a bucket.  If we want to make hash buckets as small as possible,
then a lookup in one can be a binary search.
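
A lookup in such a compact, address-sorted bucket is an ordinary
binary search.  A sketch, using the structures from above:

    /* Return OBJ's bucket entry, or NULL if OBJ has no hashq value
       yet.  B->entries is kept sorted by object address. */
    static struct hash_bucket_entry *
    bucket_lookup (struct hash_bucket * b, void * obj)
    {
      size_t lo = 0;
      size_t hi = b->n_entries;

      while (lo < hi)
        {
          size_t mid = lo + (hi - lo) / 2;

          if ((uintptr_t)b->entries[mid].obj < (uintptr_t)obj)
            lo = mid + 1;
          else if ((uintptr_t)obj < (uintptr_t)b->entries[mid].obj)
            hi = mid;
          else
            return &b->entries[mid];
        }
      return 0;
    }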

We can do better than a binary search if we don't mind making hash
buckets larger.  So, let's make it (I'll change this again further
down):

        GC page:

        |  hash-bucket |  obj1 | obj2 | obj3 | ... | objN |
                |
                V
              | hashq1 | hashq2 | hashq3 | .... | hashqN |

where the number of bucket slots equals the number of objects on the
page.   Now HASHQ is, once again, an O(1) operation.
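
With one bucket slot per object slot, the slot index is plain address
arithmetic.  A sketch (OBJECTS_OFFSET, the size of the page header,
is again an invented name):

    /* Hypothetical size of the per-page header before the objects. */
    #define OBJECTS_OFFSET  (2 * sizeof (uintptr_t))

    /* O(1) hashq lookup: an object's bucket slot is its slot number
       on its page.  Returns 0 if OBJ has never been hashed. */
    static uintptr_t
    hashq_value (void * obj)
    {
      gc_page_t * page = GC_PAGE_OF (obj);
      size_t slot;

      if (! page->bucket)       /* nothing on this page hashed yet */
        return 0;

      slot = (size_t)((char *)obj - ((char *)page + OBJECTS_OFFSET))
             / page->object_size;
      return page->bucket->entries[slot].hashq;
    }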

That space consumption is not so bad at all.  Many Scheme
implementations these days add a hash-value slot to every object
(e.g., a cons pair has storage for CAR, CDR, and HASHQ-VALUE, and
usually something else, too).  We'll be using less space than those
implementations and getting very close to the same speed.

Finally, I think we can safely assume that fewer than half of all
objects will ever be hashq'ed.  Plus, on every copy collection, we'll
be compacting the hash buckets onto the pages of hashed objects.
Therefore we can safely double the size of our hash bucket entries
and still come out ahead of other implementations, space-wise: if
fewer than half of all objects carry hash values, two-word entries on
the hashed pages average out to less than one word per object
heap-wide, which is what those implementations pay for every object.

So, let's make it:

        GC page:

        |  hash-bucket |  obj1 | obj2 | obj3 | ... | objN |
                |
                V
              | next-bucket | hashq1 . weak1 | hashq2 . weak2 | .... 

The WEAK slots contain either NIL or a reference to a weak-reference
object.  A weak reference is a limbless vtable object:

        | vtable . obj |
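
In the same sketch-C as above (names invented), the final bucket-slot
and weak-reference shapes would be something like:

    /* Final bucket-slot shape: hashq value plus optional weak ref. */
    struct hash_slot
    {
      uintptr_t hashq;          /* 0 == never HASHQed */
      struct weak_ref * weak;   /* NULL, or this object's weak ref */
    };

    /* A weak reference: a limbless vtable object whose single pointer
       the collector deliberately does not trace. */
    struct weak_ref
    {
      void * vtable;
      void * obj;               /* the weakly-held object, or #dead */
    };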

During copy-collection, when copying a hashed object with a non-nil
WEAK, copy the reference to the weak-reference and also schedule the
weak-reference for copying.   Set the from-space weak-reference link
in the hash bucket to nil.
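
A sketch of that step in the copier; schedule_for_copy stands in for
whatever mechanism the real collector uses to queue an object for
evacuation:

    extern void schedule_for_copy (void * obj);  /* hypothetical */

    /* Called while evacuating a hashed object: SLOT is its from-space
       hash slot, NEW_SLOT its slot on the to-space "hashed" page. */
    static void
    copy_hash_slot (struct hash_slot * slot, struct hash_slot * new_slot)
    {
      new_slot->hashq = slot->hashq;   /* hashq values survive copying */
      new_slot->weak = slot->weak;     /* carry the weak ref along */

      if (slot->weak)
        {
          schedule_for_copy (slot->weak); /* the weak-ref object is
                                             itself live */
          slot->weak = 0;               /* clearing the from-space link
                                           marks this object "survived"
                                           for the final bucket scan */
        }
    }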

To illustrate, suppose we start with six objects just before a
copy-collection:

           will survive   has been   has a weak
           collection?    HASHQed?   reference to
                                     it?


   OBJ1         no        no         no
   OBJ2         no        yes        no
   OBJ3         no        yes        yes
   OBJ4         yes       no         no
   OBJ5         yes       yes        no
   OBJ6         yes       yes        yes

(Note that there need never be any such thing as an object that has a
weak reference but that has not been HASHQed.)

Prior to collection, the heap looks like this (writing pages
vertically instead of horizontally here):


                BEFORE COPYING


        GC-page      hash bucket     weak references
                    (hashq . weak)
        -----------------------------------------------

        OBJ1        ( 0 . nil ) 
        OBJ2        ( HASH2 . nil )
        OBJ3        ( HASH3 .  o--)---> ( vtable . OBJ3 )
        OBJ4        ( 0 . nil )
        OBJ5        ( HASH5 . nil )
        OBJ6        ( HASH6 .  o--)---> ( vtable . OBJ6 )


After copying, the live objects will be spread over (at least)
two pages, like this:

                POST-COPYING, UN-HASHED OBJECTS PAGE


        OBJ4'        n/a


                POST-COPYING, HASHED OBJECTS PAGE

        OBJ5'       ( HASH5 . nil )
        OBJ6'       ( HASH6 . o--)---> ( vtable . OBJ6' )


and from-space is left looking like this:

                POST-COPYING, FROM-SPACE PAGE

        OBJ1        ( 0 . nil ) 
        OBJ2        ( HASH2 . nil )
        OBJ3        ( HASH3 .  o--)---> ( vtable . OBJ3 )
        BH4         ( 0 . nil )
        BH5         ( HASH5 . nil )
        BH6         ( HASH6 . nil )

        ;; "BH<n>" is "broken heart for OBJ<n>"

So there's one last step added to copying -- a scan of all from-space
hash buckets.   After that scan, we get:

                POST-COPYING, POST-HASH-BUCKET-SCAN
                FROM-SPACE PAGE

        OBJ1        ( 0 . nil ) 
        OBJ2        ( HASH2 . nil )
        OBJ3        ( HASH3 .  o--)---> ( vtable . #dead )
        BH4         ( 0 . nil )
        BH5         ( HASH5 . nil )
        BH6         ( HASH6 .  nil )

        ;; "BH<n>" is "broken heart for OBJ<n>"


And, voila, we have pretty inexpensive weak references and
space-efficient O(1) hashq.

-t




