libtool-patches

speed up large library linking


From: Ralf Wildenhues
Subject: speed up large library linking
Date: Mon, 9 May 2005 22:58:01 +0200
User-agent: Mutt/1.5.9i

First off: My laptop was broken last week, then some of our department's
hardware was destroyed, so: no mail reading, thus no patch checks, no
1.5.18 release.  Also I'll most likely have little to no net connection
for a yet unspecified time to come, and will surely miss some mails.
OTOH, that meant time for libjava over the weekend, so here we go, in
reverse logical order:

Results:
--------

Improvements so far for link mode only:
(all timings done on a fast dual Linux machine)

linking libgcj0_convenience (roughly 2450 objects):

  old GCC libtool:
68.79user 42.56system 1:50.63elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+6159792minor)pagefaults 0swaps

  old GCC libtool with -objectlist:
50.99user 38.50system 1:27.78elapsed 101%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5453057minor)pagefaults 0swaps

  HEAD after optimizations:
11.24user 0.98system 0:12.71elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+80210minor)pagefaults 0swaps

  HEAD after optimizations, with -objectlist:
3.99user 3.51system 0:08.55elapsed 87%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+273791minor)pagefaults 0swaps

  same, but dry run (i.e. the libtool overhead):
1.86user 0.76system 0:02.80elapsed 93%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+71211minor)pagefaults 0swaps

libtool overhead: 33 %
libtool overhead improvement: 97 %


linking libgcj.la (composed of some convenience archives, e.g. above):

  old GCC libtool:
57.12user 24.04system 1:21.12elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k

  HEAD after optimizations:
10.86user 5.13system 0:18.16elapsed 88%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+371214minor)pagefaults 0swaps

  same, but dry run:
0.78user 1.11system 0:02.66elapsed 71%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+71053minor)pagefaults 0swaps

libtool overhead: 15 %
libtool overhead improvement: 96 %


linking libgcj.la with reloading forced
(disabled whole_archive, disabled GNU ld script):

  old GCC libtool:
41.09user 8.82system 0:50.25elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+1205606minor)pagefaults 0swaps

  old HEAD with other optimizations above:
33.50user 12.83system 0:50.47elapsed 91%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1major+1340357minor)pagefaults 0swaps

  same, but dry run:
22.93user 7.34system 0:30.39elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+949194minor)pagefaults 0swaps

  HEAD, now with reload optimization:
8.68user 4.20system 0:18.17elapsed 70%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+276950minor)pagefaults 0swaps

  same, but dry run:
1.33user 1.82system 0:03.08elapsed 102%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+110107minor)pagefaults 0swaps

libtool overhead: 17 %
libtool overhead improvement: 91 %
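
The "libtool overhead" percentages above are consistent with dividing the
elapsed time of the dry run by the elapsed time of the corresponding full
link.  That reading is an assumption, not something the figures state
themselves; the helper name overhead_pct is made up for this sketch:

```shell
# overhead_pct DRY_SECONDS FULL_SECONDS
# Print the dry-run time as a percentage of the full link time,
# rounded to the nearest integer.  (Hypothetical helper.)
overhead_pct () {
  awk -v dry="$1" -v full="$2" 'BEGIN { printf "%.0f\n", 100 * dry / full }'
}

overhead_pct 2.80 8.55     # libgcj0_convenience, with -objectlist
overhead_pct 2.66 18.16    # libgcj.la
overhead_pct 3.08 18.17    # libgcj.la, reloading forced
```

With the elapsed times quoted above this prints 33, 15 and 17.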


Discussion:
-----------

While the overhead improvements look nice, they all come from one thing:
reduced algorithmic complexity.  For links with a small number of objects
there will hardly be any improvement, and possibly a small (but constant)
degradation.  Also we are far from Bob's 1% demand, but oh well.  :-)

Immediate consequence for the libjava folks: Use of -objectlist is to be
preferred, with their (ancient!) libtool as well as with HEAD after the
changes below.  I have another optimization idea for -objectlist which
will kill some of the 2.8s left, but it needs more work, and might not
be immediately necessary.


Changes:
--------

My patches break one assumption held in libtool so far:  that --dry-run
will cause no file changes.  I broke it because a dry run is of no value
if it just skips processing all of its arguments.  In order to make up for
the breakage, I chose to create a new temporary directory and store all
files in there.  I'm open to suggestions whether it should remain under
${TMPDIR-/tmp} as it is now, or be stored somewhere below .libs / _libs.
Problem with the latter is that, when we create the directory, we do not
know the output directory just yet.  Also I'm unsure whether it is ok to
just remove the temp dir with a trap on signal 0 -- we might leave some
unwanted leftovers here(?).
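
A minimal sketch of the temporary directory handling described above, not
the code from the patches.  It assumes a mktemp that supports -d (common,
but not specified by POSIX); the "lt" name prefix is made up:

```shell
# Create a private scratch directory below ${TMPDIR-/tmp}.
# Assumption: mktemp -d is available on the build system.
tmpdir=`mktemp -d "${TMPDIR-/tmp}/ltXXXXXX"` || exit 1

# Signal 0 is the shell's exit trap: clean up whenever the shell exits.
trap 'rm -rf "$tmpdir"' 0

# Turn the usual termination signals into a normal exit, so that the
# trap on 0 also runs when the script is interrupted.
trap 'exit 1' 1 2 13 15

# Files created during the run go below $tmpdir.
: > "$tmpdir/objects.list"
```

The second trap is there because some shells do not run the trap on 0 when
they are killed by an untrapped signal.  Nothing helps against SIGKILL, so
leftovers remain possible in that case.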

My patches will require the build system to have working (SUSv3 conforming)
  join, fold, paste, and split
utilities.  Are any problems with these known (and not mentioned in
autoconf.texi)?  Does MinGW provide them?  From a cursory glance I could
not find join and paste -- we might have to keep the old, slow algorithm
for renaming as a special case, or think of a fast one without them (or
convince the MinGW people to include these tools).  :-)
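
One way to decide at run time whether the fast path can be used is to look
for the four utilities on PATH.  This is only a sketch: it checks for
presence, not for SUSv3 conformance, and the function name missing_tools
is hypothetical:

```shell
# missing_tools NAME...
# Print, one per line, each named utility that cannot be found.
missing_tools () {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing=`missing_tools join fold paste split`
if test -n "$missing"; then
  # $missing is left unquoted on purpose, to print the names on one line.
  echo "not found, keeping the slow algorithm:" $missing >&2
fi
```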

It will also require that `tr' works on non-text files, i.e. files with
long lines.  I believe we have relied on this before, and do not know of
any problems here, but I think this is not covered by POSIX.
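
To illustrate the kind of input meant here: the sketch below builds one
line of 5000 space-separated object names (far longer than the 2048 bytes
that POSIX guarantees for LINE_MAX) and lets tr turn it into one name per
line.  The object names and the function name one_per_line are made up:

```shell
# one_per_line: turn a space-separated list on stdin into one item per line.
one_per_line () {
  tr ' ' '\n'
}

# Generate a single very long line, split it, and count non-empty lines.
awk 'BEGIN { for (i = 1; i <= 5000; i++) printf "obj%d.o ", i; print "" }' |
  one_per_line | grep -c .
```

If tr copes with the long line, the count printed is 5000.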

For the time being, I require $ECHO to be builtin.  This has been
implicitly assumed in several places already, but becomes visible only
when command line length is exceeded.  I'm working on a fix, but as of
now I have only a patch to mark all occurrences I could find.

My changes will require you to either not use `\' in path and file names
or have a shell that understands `read -r'.  IOW, if you try to cross
compile from Solaris for Cygwin, force use of bash instead of ksh.
Also, newlines in file names are forbidden (but that is nothing new).
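
A sketch of how `read -r' support might be detected (not the test from the
patches): feed a line containing a backslash through `read -r' in a
subshell and see whether it comes back unchanged.  The variable names and
the probe string are made up:

```shell
# A file name with a backslash in it, as it might appear for Cygwin.
probe='dir\name.o'

# Run the check in a subshell with stderr discarded, so that a shell
# which rejects -r only produces a mismatch and no error message.
result=$( (printf '%s\n' "$probe" |
  { read -r line && printf '%s\n' "$line"; }) 2>/dev/null )

if test "$result" = "$probe"; then
  have_read_r=yes
else
  have_read_r=no
fi
echo "read -r supported: $have_read_r"
```

Without -r, read treats the backslash as an escape character and drops it,
so the comparison fails in that case as well.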


Patches:
--------

- Factor out detection of `read -r' support and POSIX or pre-POSIX `sort'.
- Add FIXMEs to all places which implicitly assume builtin $ECHO.

complexity reductions:
- rewrite argument parsing to use temp files for long argument lists
- rewrite partial linking
- rewrite duplicate object renaming
- rewrite piecewise old archive linking; adjust pdemo test
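
To give an idea of what a complexity reduction can look like for the
duplicate object renaming, here is a sketch that finds clashing object
names with one sort pass instead of comparing every object against every
other.  It is an illustration only, not the algorithm from the patch, and
the function name dup_basenames is hypothetical:

```shell
# dup_basenames: read object paths (one per line) on stdin and print
# every base name that occurs more than once.
dup_basenames () {
  sed 's|.*/||' | sort | uniq -d
}

printf '%s\n' a/foo.o b/foo.o a/bar.o c/baz.o d/baz.o | dup_basenames
```

For the sample input this prints baz.o and foo.o; only objects with those
names would then need renaming before they go into the archive.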

All of this has only been tested on a couple of systems (with pdemo), so
there are bound to be many bugs left in there, and feedback is very much
welcome.  We'll probably also find some oddities in systems' file utilities.
IOW: These patches most likely ought to carry "break frobnozzle" instead of
"fix frobnozzle" as log entries.  :-)

OK to apply them all to HEAD?

Regards,
Ralf

Attachment: speedup-features2.diff
Description: Text document

Attachment: speedup-fixme2.diff
Description: Text document

Attachment: speedup-parseargs2.diff
Description: Text document

Attachment: speedup-reload2.diff
Description: Text document

Attachment: speedup-rename2.diff
Description: Text document

Attachment: speedup-piecewise-oldlibs2.diff
Description: Text document

