[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Gcl-devel] GCL memory allocation and GC problems
From: |
Vadim V. Zhytnikov |
Subject: |
[Gcl-devel] GCL memory allocation and GC problems |
Date: |
Mon, 24 Nov 2003 23:41:56 +0300 |
User-agent: |
Mozilla/5.0 (Windows; U; Windows NT 5.1; ru-RU; rv:1.5) Gecko/20031006 |
Hi!
I've been trying to pin down the reason of GCL
performance problems. The trouble is clearly related to
memory allocation strategy and garbage collection.
Roughly situation stands as follows:
often computation time for some problem with GCL
(I use Maxima as test program) is longer than for other CL
implementations (Clisp, CMUCL). And the bigger the problem
the worse is GCL's performance. More thorough investigation
shows that in such situation GCL spends 90-95% of computation
time by doing garbage collections. Net time for
useful computation = (total time) - (GC time) is quite impressive.
It compares to one of CMUCL (or even better). But 95% of
GC time spoils whole picture. The situation is far from being
satisfactory. So my objective is to propose some improvements
or to locate a bottleneck.
To proceed further I have to say a few words about
GCL memory layout. Forgive me if I'm too verbose.
GCL's memory is divided into three non overlapping
parts - heap area, relocatable blocks area and a hole
between them. Hole is just a free space. It shrinks
when heap or relocatable blocks grow (toward each other).
Memory is allocated by pages.
Each GCL object belongs to one of implementation types.
Implementation types roughly correspond to CL types but
not exactly. For example compiled functions map to several
implementation types. On the other hand implementation type
SPICE is used for internal purpose and has no corresponding
lisp type.
Implementation types are classified according to size of the object.
Take a look at the output of the function (room). Each of the
first eight lines correspond to one implementation type class.
103/211 69.1% 1 CONS RATIO LONG-FLOAT COMPLEX STRUCTURE
1/28 12.7% FIXNUM SHORT-FLOAT CHARACTER RANDOM-STATE READTABLE
49/49 73.3% SYMBOL STREAM
1/2 14.1% PACKAGE
1/38 47.9% ARRAY HASH-TABLE VECTOR BIT-VECTOR PATHNAME
CCLOSURE FAT-STRING
22/32 85.0% STRING
3/27 95.6% CFUN BIGNUM
6/6 86.7% SFUN GFUN CFDATA SPICE NIL
10/194 contiguous (21 blocks)
118 hole
50 9.4% relocatable
186 pages for cells
364 total pages
125151 pages available
5557 pages in heap but not gc'd + pages needed for gc marking
131072 maximum pages
Numbers on the left
AAA/BBB CC% D
are
AAA - number of pages currently allocated for this class
BBB - maximal number of pages for this class
CC% - percent of space really occupied by cells
D - number of GC occurred for this class, empty
if zero.
All objects of one implementation type class share
pages and their allocation and GC is managed together.
Usually objects are stored in heap. Some of them (e.g. CONS
- list nodes) resides entirely in heap. For other objects
(functions, arrays, symbols, etc) heap contains only header.
The body of the object is stored in blocks. There are two
kinds of blocks. Contiguous blocks are stored in heap.
Relocatable blocks are stored in relocatable area.
Contiguous blocks as other objects in heap are newer
moved. Relocatable blocks can be compactified during GC.
When space for cell of some class is exhausted GCL starts
garbage collection. If after GC percent of free cells
is lower than some thresold then GCL allocates some extra pages.
There is a function which controls this process
(si::allocate-growth
<implementation-type>
min_grow
max_grow
percent_grow
percent_free )
If after GC for cells of <implementation-type> percent of free
cells is less than percent_free then total number of pages
is enlarged on percent_grow percents but not less than min_grow
and no more than max_grow.
I'm sorry for such lengthly introduction but I need this
for reference. We come to the point after all.
The default values for these 4 parameters are
1 1000 50 10
This numbers partially explains GCL performance problems.
I think the they are not adequate to modern needs.
First, allocation threshold - 10% is too low. In general if lisp
works with low percent of free cells the frequency
of GC rapidly grows. Second, the maximal number
of pages which can by allocated 1000 seems to be
very small. Such low values for these parameters
makes GCL quite lazy at allocating extra memory.
I think that something like
1 10000 50 30
or even
1 10000 100 50
is more up to date.
Now I'd like to show result of some simple
tests which demonstrates how this machinery
works in practice and also reveals some other
problem.
I'm going to compute ratsimp((x+y+z)^500)$ and see what happens.
On Athlon XP+ 2400 with 512Mb the result for
cmucl 18e - 4 sec
clisp 2.31 - 16 sec
gcl 2.6.1 traditional - 23 sec
gcl 2.6.1 ansi - 27 sec
I use current gcl 2.6.1 cvs and Maxima cvs of September 9 2003
(current Maxima cvs will not build on traditional gcl).
The maxima-init.lisp is
==============================================
#+gcl
(progn
(si::allocate-relocatable-pages 2000 t)
(si::allocate 'cons 2000 t)
(si::allocate 'fixnum 200 t)
#-gmp (si::allocate 'cfun 200 t)
#+gmp (si::allocate 'cfun 1000 t)
(si::allocate 'symbol 100 t)
#+gmp (si::set-gmp-allocate-relocatable t)
;;(setq si::*notify-gbc* t)
)
===============================================
Now let's consider GCL timing more thoroughly.
If I feed
showtime:true;
:lisp (setq si::*notify-gbc* t)
:lisp (si::gbc-time 0)
:lisp (room)
ratsimp((x+y+z)^500)$
:lisp (si::gbc-time)
:lisp (room)
into Maxima I get the following output
=============================================================
GCL (GNU Common Lisp) (2.6.1) Thu Nov 20 20:56:28 MSK 2003
Licensed under GNU Library General Public License
Dedicated to the memory of W. Schelter
Use (help) to get some basic information on how to use GCL.
Maxima 5.9.0.1cvs http://maxima.sourceforge.net
Distributed under the GNU Public License. See the file COPYING.
Dedicated to the memory of William Schelter.
This is a development version of Maxima. The function bug_report()
provides bug reporting information.
(C1)
Evaluation took 0.00 seconds (0.00 elapsed)
(D1) TRUE
(C2)
T
(C2)
0
(C2) 2000/2000 28.2%CONS RATIO LONG-FLOAT COMPLEX STRUCTURE
200/200 0.4% FIXNUM SHORT-FLOAT CHARACTER RANDOM-STATE
READTABLE NIL
135/135 98.1% SYMBOL STREAM
1/2 19.2% PACKAGE
11/38 46.6% ARRAY HASH-TABLE VECTOR BIT-VECTOR PATHNAME
CCLOSURE FAT-STRING
68/72 96.4% STRING
1000/1000 1.1% CFUN BIGNUM
26/28 97.3% SFUN GFUN CFDATA SPICE NIL
4/336 contiguous (4 blocks)
128 hole
2000 0.3% 3 relocatable
3441 pages for cells
5573 total pages
116770 pages available
8729 pages in heap but not gc'd + pages needed for gc marking
131072 maximum pages
(C2)
[GC for 2000 CONS pages..(T=6).GC finished]
[GC for 2000 CONS pages..(T=8).GC finished]
[GC for 2000 RELOCATABLE-BLOCKS pages..(T=10).GC finished]
[GC for 2000 CONS pages..(T=10).GC finished]
make_cons calls grow_linear ...
grow_linear old=2000 delt=1000 new=3000
[GC for 2000 RELOCATABLE-BLOCKS pages..(T=13).GC finished]
[GC for 2000 RELOCATABLE-BLOCKS pages..(T=14).GC finished]
[GC for 2000 RELOCATABLE-BLOCKS pages..(T=14).GC finished]
[GC for 2000 RELOCATABLE-BLOCKS pages..(T=14).GC finished]
[GC for 2000 RELOCATABLE-BLOCKS pages..(T=16).GC finished]
[GC for 2000 RELOCATABLE-BLOCKS pages..(T=16).GC finished]
[GC for 2000 RELOCATABLE-BLOCKS pages..(T=17).GC finished]
[GC for 2000 RELOCATABLE-BLOCKS pages..(T=18).GC finished]
[GC for 2000 RELOCATABLE-BLOCKS pages..(T=19).GC finished]
alloc_relblock calls grow_linear ...
grow_linear old=2000 delt=1000 new=3000
[GC for 3000 RELOCATABLE-BLOCKS pages..(T=20).GC finished]
[GC for 3000 CONS pages..(T=14).GC finished]
make_cons calls grow_linear ...
grow_linear old=3000 delt=1000 new=4000
[GC for 3000 RELOCATABLE-BLOCKS pages..(T=20).GC finished]
[GC for 3000 RELOCATABLE-BLOCKS pages..(T=20).GC finished]
[GC for 3000 RELOCATABLE-BLOCKS pages..(T=22).GC finished]
[GC for 3000 RELOCATABLE-BLOCKS pages..(T=23).GC finished]
[GC for 3000 RELOCATABLE-BLOCKS pages..(T=23).GC finished]
[GC for 3000 RELOCATABLE-BLOCKS pages..(T=24).GC finished]
[GC for 3000 RELOCATABLE-BLOCKS pages..(T=23).GC finished]
[GC for 3000 RELOCATABLE-BLOCKS pages..(T=25).GC finished]
[GC for 3000 RELOCATABLE-BLOCKS pages..(T=26).GC finished]
[GC for 3000 RELOCATABLE-BLOCKS pages..(T=25).GC finished]
alloc_relblock calls grow_linear ...
grow_linear old=3000 delt=1000 new=4000
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=26).GC finished]
[GC for 4000 CONS pages..(T=19).GC finished]
make_cons calls grow_linear ...
grow_linear old=4000 delt=1000 new=5000
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=28).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=28).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=28).GC finished]
[GC for 1000 CFUN pages..(T=20).GC finished]
alloc_objects allocs pages ...
grow_linear old=1000 delt=500 new=1500
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=18).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=23).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=23).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=25).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=25).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=25).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=25).GC finished]
[GC for 5000 CONS pages..(T=20).GC finished]
make_cons calls grow_linear ...
grow_linear old=5000 delt=1000 new=6000
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=25).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=25).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=26).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=26).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=26).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=27).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=27).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=27).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=27).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=28).GC finished]
[GC for 6000 CONS pages..(T=23).GC finished]
make_cons calls grow_linear ...
grow_linear old=6000 delt=1000 new=7000
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=28).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=28).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=28).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=29).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=30).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=25).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=29).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=31).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=34).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=35).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=36).GC finished]
[GC for 7000 CONS pages..(T=30).GC finished]
make_cons calls grow_linear ...
grow_linear old=7000 delt=1000 new=8000
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=37).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=38).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=38).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=39).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=38).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=39).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=40).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=41).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=42).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=42).GC finished]
[GC for 8000 CONS pages..(T=35).GC finished]
make_cons calls grow_linear ...
grow_linear old=8000 delt=1000 new=9000
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=43).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=41).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=43).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=44).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=45).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=44).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=44).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=46).GC finished]
Evaluation took 24.38 seconds (24.50 elapsed)
(C3)
2202
(C3) 8751/9000 99.0% 9CONS RATIO LONG-FLOAT COMPLEX STRUCTURE
200/200 0.4% FIXNUM SHORT-FLOAT CHARACTER RANDOM-STATE
READTABLE NIL
135/135 98.1% SYMBOL STREAM
1/2 19.2% PACKAGE
11/38 45.1% ARRAY HASH-TABLE VECTOR BIT-VECTOR PATHNAME
CCLOSURE FAT-STRING
68/72 81.4% STRING
1000/1500 50.2% 1 CFUN BIGNUM
26/28 97.3% SFUN GFUN CFDATA SPICE NIL
4/336 contiguous (5 blocks)
128 hole
4000 58.3% 74 relocatable
10192 pages for cells
14324 total pages
106019 pages available
10729 pages in heap but not gc'd + pages needed for gc marking
131072 maximum pages
(C3)
===========================================================
First of all take a look at 2202 right after first (C3).
This is total time of GC in 1/100 sec. So 22.02 seconds
of total 24.38 are spent for garbage collections - 90%.
We have 9 GC for CONS cells, 1 for CFUN and 71 for
relocatable blocks.
Let's try to improve situation by adding
:lisp (si::allocate-growth 'cons 1 10000 100 50)
to the input. New result is (I show bottom lines only)
===========================================================
.....
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=45).GC finished]
[GC for 4000 RELOCATABLE-BLOCKS pages..(T=45).GC finished]
Evaluation took 23.00 seconds (23.13 elapsed)
(C3)
2064
(C3) 8751/16000 99.0% 3CONS RATIO LONG-FLOAT COMPLEX STRUCTURE
200/200 0.4% FIXNUM SHORT-FLOAT CHARACTER RANDOM-STATE
READTABLE NIL
135/135 98.1% SYMBOL STREAM
1/2 19.2% PACKAGE
11/38 45.1% ARRAY HASH-TABLE VECTOR BIT-VECTOR PATHNAME
CCLOSURE FAT-STRING
68/72 81.4% STRING
1000/1500 50.2% 1 CFUN BIGNUM
26/28 97.3% SFUN GFUN CFDATA SPICE NIL
4/336 contiguous (5 blocks)
128 hole
4000 58.3% 73 relocatable
10192 pages for cells
14324 total pages
106019 pages available
10729 pages in heap but not gc'd + pages needed for gc marking
131072 maximum pages
(C3)
=============================================================
Now we have only 3 garbage collections for CONS cells.
Very good since GC time for CONS is now 3 times shorter.
Let's try to do something like this with relocatable pages
:lisp (si::allocate-growth 'relocatable 1 10000 200 80)
The result is really strange:
==============================================================
[GC for 6000 RELOCATABLE-BLOCKS pages..(T=45).GC finished]
[GC for 6000 RELOCATABLE-BLOCKS pages..(T=46).GC finished]
Evaluation took 22.43 seconds (22.59 elapsed)
(C3)
2020
(C3) 8745/16000100.0% 3CONS RATIO LONG-FLOAT COMPLEX STRUCTURE
200/200 0.4% FIXNUM SHORT-FLOAT CHARACTER RANDOM-STATE
READTABLE NIL
135/135 98.1% SYMBOL STREAM
1/2 19.2% PACKAGE
11/38 45.1% ARRAY HASH-TABLE VECTOR BIT-VECTOR PATHNAME
CCLOSURE FAT-STRING
68/72 81.4% STRING
1000/1000 50.2% CFUN BIGNUM
26/28 97.3% SFUN GFUN CFDATA SPICE NIL
4/336 contiguous (5 blocks)
77 hole
6000 38.9% 72 relocatable
10186 pages for cells
16267 total pages
102025 pages available
12780 pages in heap but not gc'd + pages needed for gc marking
131072 maximum pages
(C3)
==============================================================
In spite of the fact that more aggressive memory allocation
for relocatable blocks works (compare 6000 final pages with
4000 in previous test) the number and total time of GC for
relocatable blocks remains virtually unchanged!
Let's do another trick. We allocate a large space
(30000 pages) for relocatable blocks at the beginning.
I expect that now number of GC for relocatable blocks
should be much smaller. Nothing of the kind!
==============================================================
[GC for 30000 RELOCATABLE-BLOCKS pages..(T=46).GC finished]
Evaluation took 22.95 seconds (23.05 elapsed)
(C3)
2066
(C3) 8762/16000 98.7% 3CONS RATIO LONG-FLOAT COMPLEX STRUCTURE
200/200 0.4% FIXNUM SHORT-FLOAT CHARACTER RANDOM-STATE
READTABLE NIL
135/135 98.1% SYMBOL STREAM
1/2 19.2% PACKAGE
11/38 45.1% ARRAY HASH-TABLE VECTOR BIT-VECTOR PATHNAME
CCLOSURE FAT-STRING
68/72 81.4% STRING
1000/1500 50.2% 1 CFUN BIGNUM
26/28 97.3% SFUN GFUN CFDATA SPICE NIL
4/336 contiguous (5 blocks)
128 hole
30000 7.8% 72 relocatable
10203 pages for cells
40335 total pages
54008 pages available
36729 pages in heap but not gc'd + pages needed for gc marking
131072 maximum pages
(C3)
=============================================================
Once again number and time of GC for relocatable blocks remains
practically unchanged.
And this is vary alarming. GCL spends 90% of time
doing GC for relocatable blocks and we seems to be unable
to improve the situation. At the moment I simply don't
understand how GC rate can be the same for 2000 and
30000 initial pages. Probably I miss something important
about GC for relocatable blocks. But in any case
I think that situation is far from being perfect.
Any comments and ideas are greatly appreciated.
PS: All transcripts of my tests are attached.
--
Vadim V. Zhytnikov
<address@hidden>
<address@hidden>
test.tar.bz2
Description: BZip2 compressed data
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- [Gcl-devel] GCL memory allocation and GC problems,
Vadim V. Zhytnikov <=