texmacs-dev
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Texmacs-dev] experimental garbage collector patch


From: David Allouche
Subject: [Texmacs-dev] experimental garbage collector patch
Date: Thu, 4 Mar 2004 17:20:58 +0100
User-agent: Mutt/1.5.5.1+cvs20040105i

For people which have been wondering what I have been doing for a week,
I have some new stuff: an experimental patch to use Hans Boehm & al.
conservative garbage collector in texmacs as a replacement for the
current pointer counting memory management scheme.

The expect result of the change is to get better performance for
operations which operate on highly localized memory areas, that is the
bulk of the day to do editing operations.

As one can expect, the patch is quite heavywheight but not very much
intrusive. It also includes a number of cleanups and fixes for bugs
which have been made apparent by the use of garbage collection.

The bad news is this patch cause texmacs to use a lot more memory and
perform worse...

The immediate issue is that the partial display invalidation code 
relied on deterministic destruction of objects and this assumption is
broken by the garbage collectior. As as consequence, any modification to
the edit tree cause a full display invalidation.

There are other issues, especially with the "resource" system and many
opportunities to experiment. The full comment of the patch follows.

---- Patch comment ----------------------------------------

This patch allows building texmacs with libgc (Hans Boehm). When GC is
used, all reference counting is disabled.

Currently, GC does not seem to provide a significant gain in time
performance and can increase the memory consumption a lot. However,
there is still a lot of room for tuning and fixing internal problems.

The benchmark used during developement was starting up texmacs and
generating the full documentation. Since GC was expected to yield
performance gains in code paths exhibiting a high locality of memory
access (because of better cache behaviour and lesser gc times) this is
a very unfavorable test. Yet the CPU usage is only very slightly
increased by GC.

Increased memory usage may, or may not, be caused by the leaky nature
of conservative gc. It is natural for GC to require more memory than
pointer counting, because some slack is required to reduce the number
of collections, but it seems that texmacs _does_ leak a significant
(but bounded) amount of memory because of conservative collection.
More empirical evidence would be needed to show the leak stays
bounded.


Currently two issues are clearly identified:

  1. Partial display invalidation is broken.

     The partial display invalidation feature relies on non-trivial
     logic in the phrase_box_rep destructor. This has been worked
     around by sending an invalidate_all when the edit tree is
     changed.

     Since destruction time is (generally) not deterministic with this
     GC, this design cannot work in any sane solution way. Anyway,
     destructors should _only_ be used to release resources, the
     invalid region compution is just an awful plate of spoiled
     spaghetti. Anyway, that's what I think.

  2. The resource system is inefficient.

     The use of the garbage collector has made it possible to release
     all inaccessible resources (save for the weak pointer in the
     "instances" dictionnary). It obviously caused some previously
     unused code paths to be used, since that required a fix in the
     loading of virtual fonts. It also causes many resources to be
     loaded multiple times when generating the full documentation.

     It seems that the good performance of the pointer-counting
     implementation relied on memory leaks...


There are open areas for experimentation:

  -- Responsiveness benchmark

     The "build full manual" benchmark is instructive and helps catch
     a lot of bugs. However it gives not information on the
     interactive behavior of texmacs, which is what users really see.

     In order to assess the performance of GC compared to pointer
     counting, we need benchmarks measuring the interactive reaction
     time in realistic editing and browsing situations.

  -- Remove uneeded indirections

     The whole application use senseless indirection for all objects,
     likely causing important memory locality issues. For example, in
     most cases, it would make sense for objects to include bodies of
     compound objects (arrays, hashmaps) instead of pointers, because
     the compound object cannot have a significantly longer lifetime
     than its including object.

     Elimination of unedded indirection should focus on objects which
     are created in great numbers: basic data structures and
     typesetter objects.

     This cleanup may, or may not, need to rely on GC.

  -- Mutation hints for the collector

     Use GC_change_stubborn and GC_end_stubborn_change when the
     pointers included in an object are changed. That might improve
     the collector performance.

  -- Other design changes

     TeXmacs was designed from the ground up for pointer counting.
     This has some obvious consequences (already pointed out) but it
     may include other designs which were required for pointer
     counting but have a negative impact on GC performance.


When the outstanding issues have been fixed, and if further
experimentation demonstrates better performance of garbage collection,
texmacs could be modified to use a precise GC. Since implementing a
good GC is _really_ a non-trivial issue, a third party library should
be used.

The only precise garbage collector for C++ I have found:
http://sourceforge.net/projects/smieciuch

-- 
                                                            -- ddaa

David Allouche         | GNU TeXmacs -- Writing is a pleasure
Free software engineer |    http://www.texmacs.org
   http://ddaa.net     |    http://savannah.gnu.org/projects/texmacs
   address@hidden  |    address@hidden
TeXmacs is NOT a LaTeX front-end and is unrelated to emacs.




reply via email to

[Prev in Thread] Current Thread [Next in Thread]