bug-make

From: Bjoern Michaelsen
Subject: Re: speeding up GNU make for LibreOffice by factor ~2 (and dependency file parsing by factor ~10)
Date: Thu, 20 Feb 2014 03:53:01 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

Hi Paul,

On Wed, Feb 19, 2014 at 09:20:06AM -0500, Paul Smith wrote:
> Also it's probably worthwhile to hold this discussion on the
> address@hidden mailing list; most of the people interested in GNU make
> development hang out there.
Ok, here we go. ;)

> On Wed, 2014-02-19 at 00:11 +0100, Bjoern Michaelsen wrote:
> > So I tried if putting an cachefile with an index of filenames at the
> > beginning and then just referencing those filenames helps.
> 
> Hi Bjoern; thanks for your work on improving GNU make!  It would be
> helpful to me if you could send along a quick description of exactly
> what this is doing that gives the speedup; I could read the code but it
> will be faster to just read an explanation.  It's not exactly clear from
> the above.

It introduces a new "includedepcache" keyword. The first time GNU make runs
past it, it works just like a normal include, except that it tracks the
dependency relations described in the included file and writes them in a
simplified, easier-to-parse format to the file ${includefile}.cache.

The second time GNU make comes past the includedepcache statement, it checks
for the ${includefile}.cache file. If that file exists and is younger than
${includefile}, it reads the cache file instead of ${includefile}.
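
The freshness check described above can be sketched roughly as follows. This is
only an illustration in Python, not the actual patch (which is C code inside
GNU make); the function name cache_is_usable is hypothetical, and comparing
modification times with a strict "newer than" is an assumption about what
"younger" means here.

```python
import os


def cache_is_usable(includefile: str) -> bool:
    """Return True if includefile + '.cache' exists and is newer than includefile.

    Hypothetical helper: the name and the strict '>' comparison are
    assumptions made for this sketch, not taken from the patch.
    """
    cachefile = includefile + ".cache"
    try:
        return os.path.getmtime(cachefile) > os.path.getmtime(includefile)
    except OSError:
        # No cache yet (or no include file): fall back to a normal include,
        # which would then write the cache for the next run.
        return False
```

If the check fails, make would parse ${includefile} as usual and rewrite the
cache, so a stale cache is never read.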

The format of the cache file is:
<number of filenames>
filename1
filename2
...
<number of dependency relations>
<index of target1><index of dependency1><index of target2><index of dependency2>...

with the indices written in binary. A usual LibreOffice build generates 1.3GB
of dependency files for >8200 object files. As we don't want to open 8200 files
on each make run, we concatenate these files into one per library (and do some
deduplication), which brings us down to ~300MB[1] in standard make syntax.
However, parsing that still takes a lot of time, more than it needs to: when
the dependency file for one library has 800 objects, you have:

long_path_to_object1: long_path_to_header_with_string_types <more dependencies>
long_path_to_object2: long_path_to_header_with_string_types <more dependencies>
...
long_path_to_object800: long_path_to_header_with_string_types <more dependencies>

which is a lot of duplication. Instead of parsing
"long_path_to_header_with_string_types" 800 times and looking it up 800 times
in the strcache, parsing the cachefile does this once at the beginning.
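
To make the idea concrete, here is a small Python sketch of a filename table
followed by index pairs. It is not the format code from the patch: the mail
only says the indices are written in binary, so the 32-bit little-endian
encoding, the counts written as decimal text lines, and all function names
(parse_deps, write_cache, read_cache) are assumptions for illustration. The
toy parser also ignores escapes, line continuations and paths containing ':'.

```python
import struct


def parse_deps(text):
    """Parse 'target: dep dep ...' lines into (target, dep) pairs."""
    pairs = []
    for line in text.splitlines():
        if ":" not in line:
            continue
        target, deps = line.split(":", 1)
        for dep in deps.split():
            pairs.append((target.strip(), dep))
    return pairs


def write_cache(pairs):
    """Serialize pairs as a filename table followed by binary index pairs."""
    index = {}  # filename -> position in the table; each name is stored once
    for target, dep in pairs:
        for name in (target, dep):
            index.setdefault(name, len(index))
    out = [f"{len(index)}\n".encode()]
    out += [name.encode() + b"\n" for name in index]  # dicts keep insertion order
    out.append(f"{len(pairs)}\n".encode())
    for target, dep in pairs:
        # Assumed encoding: two unsigned 32-bit little-endian integers per relation.
        out.append(struct.pack("<II", index[target], index[dep]))
    return b"".join(out)


def read_cache(data):
    """Inverse of write_cache: each filename is decoded exactly once."""
    pos = 0

    def read_line():
        nonlocal pos
        end = data.index(b"\n", pos)
        line = data[pos:end].decode()
        pos = end + 1
        return line

    n_names = int(read_line())
    names = [read_line() for _ in range(n_names)]
    n_pairs = int(read_line())
    pairs = []
    for i in range(n_pairs):
        t, d = struct.unpack_from("<II", data, pos + 8 * i)
        pairs.append((names[t], names[d]))
    return pairs
```

With this layout a header shared by 800 objects appears once in the table and
800 times as a fixed-size integer, so the reader never has to re-tokenize or
re-hash the long path.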

For LibreOffice, parsing the cachefile is ~10 times faster than parsing the
standard make syntax, and dependency parsing is then reduced to a negligible
part of a no-op incremental build. On my machine, using the depcache, it takes:

- 5.8 seconds to parse the ~134KLOC build description, which is very heavy on
              $(eval $(call))
- 0.6 seconds to parse the cachefiles for >8200 targets
- 1.3 seconds to stat all the targets

for a total of 7.7 seconds. With include instead of includedepcache, parsing
the whole 300MB of generated dependencies instead of the cachefiles yields:

- 5.8 seconds to parse the ~134KLOC build description, which is very heavy on
              $(eval $(call))
- 5.9 seconds to parse the generated dependencies in standard make syntax
- 1.3 seconds to stat all the targets

for a total of 13.0 seconds.
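
For reference, the ratios implied by these timings can be recomputed with a
few lines of Python. The numbers are copied from the two lists above; the
dictionary and variable names are only illustrative.

```python
# Timings in seconds, copied from the two lists above.
with_cache = {"parse build description": 5.8, "parse cachefiles": 0.6, "stat targets": 1.3}
with_include = {"parse build description": 5.8, "parse dependencies": 5.9, "stat targets": 1.3}

total_cache = sum(with_cache.values())
total_include = sum(with_include.values())

# Speedup of the whole no-op build, and of the dependency parsing step alone.
overall_speedup = total_include / total_cache
dep_speedup = with_include["parse dependencies"] / with_cache["parse cachefiles"]

print(f"{total_cache:.1f}s vs {total_include:.1f}s: "
      f"{overall_speedup:.1f}x overall, {dep_speedup:.1f}x for dependency parsing")
```

That works out to roughly 1.7x for the whole no-op build and just under 10x
for the dependency parsing step on its own.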

The current implementation for this can be found here:

 
https://gerrit.libreoffice.org/gitweb?p=gnu-make-lo.git;a=shortlog;h=refs/heads/feature/depcache

It has tests, but no further documentation (apart from this mail) yet.

Best,

Bjoern


[1] See
    https://gerrit.libreoffice.org/gitweb?p=core.git;a=blob;f=solenv/bin/concat-deps.c;h=a64723f476d77f88c147545dc8844ac47c44dfb2;hb=22b709e84a7b6d38cab2dd37f2f2b28e0fc9d062
    if you really want to know all the gory details. I doubt that. ;)


