emacs-devel

Re: [EXPERIMENT] Emacs with the SpiderMonkey garbage collector


From: Steve Fink
Subject: Re: [EXPERIMENT] Emacs with the SpiderMonkey garbage collector
Date: Sat, 2 Dec 2017 21:24:50 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:57.0) Gecko/20100101 Thunderbird/57.0

On 12/2/17 4:37 PM, Pip Cet wrote:
> On Fri, Dec 1, 2017 at 5:57 PM, Steve Fink <address@hidden> wrote:
>>
>> The one big thing that the analysis doesn't handle particularly well is
>> internal pointers. For now, it sounds like you're mostly safe from these
>> moving because all your data is hanging off of the JS_GetPrivate
>> indirection, so you wouldn't have any pointers internal to GC things. But
>> you can still run into issues keeping things alive. We mostly have this
>> problem with strings, since small strings are stored inline in the GC
>> things and so it's very easy to end up with a char* that will move. We
>> tend to work around this by either (1) passing around the GC pointer
>> instead of the char*; (2) making functions that accept such a char* also
>> declare that they won't GC, by requiring a reference to an
>> 'AutoRequireCannotGC&' token, which is validated by the static analysis;
>> or (3) forcing the contents to be stored in the malloc heap if they
>> aren't already, and just being careful to keep the owning JSString* in a
>> Rooted<JSString*> somewhere higher on the stack.
>>
>> Collections are also an issue, if you want to index them by GC pointer
>> value.
>
> It seems I have two options: rewrite all hashed collections whenever
> something moves, or make up a hash value and store it in a private
> slot for each object upon creation. My understanding is SpiderMonkey
> does the former for WeakMaps, and those seem to perform okay, so that
> might be the better option long-term, but I haven't given much thought
> to this and the made-up hash value seems easier to implement...

We've done it two ways, gradually shifting almost everything over to the second.

The first is what we called "rekeying": removing and re-inserting anything whose pointer changed (or, more generally, anything for which a pointer forming part of the key changed). We had to do some careful dancing in our hashtable implementation to be certain that rekeying can't cause the table to grow (we have tombstones, so the naive approach accumulates them).

Most things are now switching over to the second approach, which is to key tables off of unique IDs. We have a separate table mapping anything that needs one to a unique ID. But that's still just a space optimization; if all of your objects are going to end up needing a unique ID, then you're paying the extra memory cost for everything anyway, and you may as well store a hash (or unique ID), as you say.

If these tables aren't holding strong references (if being a key in the table shouldn't keep the key alive), then you need to sweep them too, to throw out all of the dead stuff. Even if nothing ever looks up those keys again, hash collisions will have you comparing their dead memory with lookup keys. And if you have code to sweep the table, I guess you can always reuse it during the moving part of the collection to rekey everything that is moving. (And if they *are* holding strong references, they should be traced.)

The embedder API to hook into this can be seen at https://searchfox.org/mozilla-central/source/js/public/GCAPI.h#926

We use GC-aware data structures within SpiderMonkey (GCHashMap, GCHashSet, GCVector) to automate most of this. See, e.g., https://searchfox.org/mozilla-central/source/js/public/GCHashTable.h#28, though even those still require something to call their sweep() methods. The WeakCache at https://searchfox.org/mozilla-central/source/js/public/SweepingAPI.h#54 sets itself up to be swept automatically at the right time.

And like I said, those generally use unique IDs. https://searchfox.org/mozilla-central/source/js/public/GCHashTable.h#105 is the hashtable that rekeys instead. (Note that our hashtable implementation isn't great. It uses double hashing, and probably ought to be replaced with one of the Robin Hood variants, which would change rekeying.)




