emacs-devel

Re: size of emacs executable after unicode merge


From: Dan Nicolaescu
Subject: Re: size of emacs executable after unicode merge
Date: Fri, 31 Oct 2008 08:07:14 -0700 (PDT)

Kenichi Handa <address@hidden> writes:

  > In article <address@hidden>, "Richard M. Stallman" <address@hidden> writes:
  > 
  > >     If I comment the load_charset_map_from_file call in unify_charset the
  > >     data segment size is back to normal.
  > 
  > > Although these are loaded "on demand", perhaps something "demands" them
  > > at build time.
  > 
  > It's not that simple.  This is the strategy of the charset
  > map loading mechanism.  I took that approach expecting that
  > char-tables that are garbage-collected before dumping are
  > not in the dumped file.
  > 
  > (0) At first, Emacs assigns a unique linear character code
  >     space in upper Unicode area (#x110000-) to each big
  >     character set (e.g. GB, JIS, KSC) (*see the note at the
  >     tail).  The decoding of a character of a specific
  >     charset into this area is quite fast (done just by a few
  >     steps of arithmetic calculation).  Encoding is the same
  >     too.
  > 
  > (1) While building Emacs, when unify-charset is called, we
  >     update two char-tables Vchar_unify_table, and
  >     Vchar_unified_charset_table.  The former maps a
  >     character in the above upper area to Unicode area, and
  >     the latter maps the character to charset symbol.
  >     Unify-charset also builds a deunifier char-table for each
  >     character set that maps a character in the Unicode area to
  >     the upper area that is unique to each charset.
  > 
  >     So at this time, the full maps are built.
  > 
  > (2) Just before dumping, clear-charset-maps is called.  This
  >     function sets all char-tables built in (1) (except for
  >     Vchar_unified_charset_table) to nil.  Then set
  >     Vchar_unify_table to Vchar_unified_charset_table, and
  >     set Vchar_unified_charset_table to nil.
  >
  >     Then, garbage-collect is called.  After that, the only
  >     live char-table is Vchar_unify_table, and its contents
  >     are not that big because it maps upper-area characters to
  >     charsets, and each charset has a linear upper area, so
  >     most consecutive characters have the same value.
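
To make the arithmetic in step (0) concrete, here is a minimal sketch of
decoding a two-byte 94x94 charset code into a linear private range and
back.  The struct, the function names and the base value 0x110000 used in
the test are assumptions for illustration only, not the actual charset.c
code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical descriptor for a double-byte charset whose bytes both
   range over [min, max] (0x21..0x7E for a 94x94 set).  */
struct linear_charset {
    uint32_t base;  /* first code of the private range (assumed value) */
    uint8_t min;    /* lowest valid byte */
    uint8_t max;    /* highest valid byte */
};

/* Decode (b1, b2) to a code in the linear range: one multiply, two adds.  */
static uint32_t decode_linear(const struct linear_charset *cs,
                              uint8_t b1, uint8_t b2)
{
    uint32_t width = (uint32_t)(cs->max - cs->min) + 1u;
    assert(b1 >= cs->min && b1 <= cs->max);
    assert(b2 >= cs->min && b2 <= cs->max);
    return cs->base + (uint32_t)(b1 - cs->min) * width
                    + (uint32_t)(b2 - cs->min);
}

/* Encode is the inverse: one division and one remainder.  */
static void encode_linear(const struct linear_charset *cs, uint32_t c,
                          uint8_t *b1, uint8_t *b2)
{
    uint32_t width = (uint32_t)(cs->max - cs->min) + 1u;
    uint32_t off = c - cs->base;
    assert(c >= cs->base && off < width * width);
    *b1 = (uint8_t)(cs->min + off / width);
    *b2 = (uint8_t)(cs->min + off % width);
}
```

Because each charset occupies one contiguous range, a table keyed on these
codes holds the same value for long runs of consecutive characters, which
is why the table left after step (2) is expected to be small.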

For the allocator to be able to release pages back to the system after
their contents have been garbage collected, you have to be sure that
absolutely ALL the data allocated can be garbage collected (and even then
you depend on the quirks of the platform-specific malloc implementation
to do it).

From the description above, it sounds like the data in Vchar_unify_table
is allocated while reading the charset data, and it is not released after
the charset data is.  So the allocator cannot release all the pages...
[note: this speculation is based solely on your description above]
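
A toy model of what I mean, as a sketch only: it assumes a heap of 8
pages with 4 object slots each, filled sequentially, and an allocator
that can only give back empty pages at the top of the heap (brk-style).
Real malloc implementations behave differently in the details, and all
names here are made up for the illustration.

```c
#include <assert.h>

enum { PAGES = 8, SLOTS_PER_PAGE = 4 };

/* Toy heap: live[p] counts the objects on page p that survived GC.  */
struct toy_heap {
    int live[PAGES];
};

/* Fill the heap sequentially, then "collect" everything except every
   keep_every-th object (keep_every == 0 means nothing survives).  */
static void fill_then_collect(struct toy_heap *h, int keep_every)
{
    int total = PAGES * SLOTS_PER_PAGE;
    for (int p = 0; p < PAGES; p++)
        h->live[p] = 0;
    for (int i = 0; i < total; i++)
        if (keep_every > 0 && (i + 1) % keep_every == 0)
            h->live[i / SLOTS_PER_PAGE]++;
}

/* Pages with no live object at all.  */
static int free_pages(const struct toy_heap *h)
{
    int n = 0;
    for (int p = 0; p < PAGES; p++)
        if (h->live[p] == 0)
            n++;
    return n;
}

/* Empty pages at the top of the heap, i.e. the only ones a brk-style
   allocator could hand back to the system.  */
static int releasable_tail_pages(const struct toy_heap *h)
{
    int n = 0;
    for (int p = PAGES - 1; p >= 0 && h->live[p] == 0; p--)
        n++;
    return n;
}
```

With keep_every set to 32, a single object out of 32 survives, 7 of the
8 pages are empty, and still none of them can be returned because the
survivor sits on the top page.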

  > (3) When the dumped Emacs runs, at the time of
  >     decoding/encoding charsets that are unified as above, by
  >     checking if the value of Vchar_unify_table for a
  >     character is symbol or not, Emacs knows whether it has
  >     to load the mapping table again or not.
  > 
  >     So, that way, Emacs loads maps on demand.

So it sounds like your goal is to build Vchar_unify_table, and it is
built from static data in emacs/etc/charsets/*.  In that case, can't the
data in Vchar_unify_table be a C data structure that is built offline
and just compiled into emacs?
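
Something along these lines, as a rough sketch: a sorted array of ranges
emitted by an offline generator, looked up with a binary search.  The
struct layout, the range boundaries and the charset ids below are
invented for the example; a real table would be generated from the map
files and would use whatever representation fits the existing code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One run of consecutive upper-area codes that belong to one charset.  */
struct unify_range {
    uint32_t first;       /* first code of the run */
    uint32_t last;        /* last code of the run, inclusive */
    uint16_t charset_id;  /* hypothetical id, 0 is reserved for "none" */
};

/* Hypothetical generator output: sorted by `first`, non-overlapping.
   Each entry covers 94 * 94 = 0x2284 codes.  */
static const struct unify_range unify_ranges[] = {
    { 0x110000u, 0x112283u, 1 },
    { 0x112284u, 0x114507u, 2 },
    { 0x114508u, 0x11678Bu, 3 },
};

/* Return the charset id for code c, or 0 if c is in no range.  */
static uint16_t lookup_unified_charset(uint32_t c)
{
    size_t lo = 0;
    size_t hi = sizeof unify_ranges / sizeof unify_ranges[0];
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (c < unify_ranges[mid].first)
            hi = mid;
        else if (c > unify_ranges[mid].last)
            lo = mid + 1;
        else
            return unify_ranges[mid].charset_id;
    }
    return 0;
}
```

Being const static data, such a table would normally be placed in the
read-only data of the executable rather than on the Lisp heap, so it
would not depend on what the garbage collector or malloc do before
dumping.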




