I'm working on completing the migration of our build process for a
rather large software project from Solaris SPARC to Linux x86 and have
run into an issue. This process is using GNU Make 3.81 on a Solaris 9
box and a RedHat AS 4.7 x86_64 system. The major symptom I've noticed
is that the Linux system doesn't really honor the "-j 4" option we
typically build with: it quickly degrades into a single-threaded
build. Items of note include:
- The builds are being passed to a cluster of systems running Sun Grid
Engine, so the "-j 4" option isn't passed at the command line. The
first build command looks for a $NSLOTS environment variable and
changes the MAKEFLAGS as appropriate.
- I am running both builds with a copy of GNU Make that I compiled
from the same source code. I am not using the copy of make that was
included with the RedHat or Solaris systems.
- The makefiles include settings like "override SHELL = /bin/ksh" to
force all shell commands to be interpreted by ksh.
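For readers who want to reproduce the setup, the NSLOTS handling can be sketched roughly as follows. This is only an illustration of the idea, not the actual wrapper used in this build: the function name jobs_flag is hypothetical, and the sketch assumes the flag is placed in MAKEFLAGS before gmake starts.

```shell
#!/bin/sh
# Hypothetical helper: print the -j flag for a given SGE slot count.
# Prints nothing when the value is empty, zero, or not a plain number,
# so the build then runs without any -j option.
jobs_flag() {
    case "$1" in
        ''|*[!0-9]*) return 0 ;;   # unset or non-numeric: no -j flag
        0)           return 0 ;;   # zero slots: no -j flag
    esac
    printf '%s\n' "-j $1"
}

# Assumed usage in a wrapper script (not the poster's actual command):
#   MAKEFLAGS="$(jobs_flag "$NSLOTS")"; export MAKEFLAGS; gmake
```

With NSLOTS=4 this yields "-j 4", which matches the option we would otherwise give on the command line.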
It appears that the Linux system feeds every command through a ksh
process, while the same makefiles on the Solaris system invoke the
command (whether it is a ccpentium, ccarm, or make command) directly.
I determined this by looking at the process hierarchy with the pstree
command. Both examples below were taken from a build with NSLOTS=4
(i.e. -j 4). The Solaris build is running three ccpentium processes
at the time of the snapshot, while the Linux build has spawned only a
single ccpentium command.
Solaris:
====================
ictgrid004:~> sgetree
-+- 00278 sgeadmin 9:36 /soft/gridware-wic/sge/6.0u6/bin/sol-sparc64/sge_execd
|-+- 18341 sgeadmin sge_shepherd-467543 -bg
| \-+- 18342 root /soft/gridware-wic/sge/6.0u6/utilbin/sol-sparc64/rshd -l
| \-+- 18343 swaltner /soft/gridware-wic/sge/6.0u6/utilbin/sol-sparc64/qrsh_
| \-+- 18347 swaltner tcsh -c hostname ; gmake
| \-+- 18353 swaltner /soft/gnu/make/3.81/bin/gmake
| \-+- 26740 swaltner /soft/gnu/make/3.81/bin/gmake Platform/.make App
| |-+- 26757 swaltner /soft/gnu/make/3.81/bin/gmake -C Platform MKLe
| | \-+- 26819 swaltner /soft/gnu/make/3.81/bin/gmake Boot/.make Sys
| | \-+- 26906 swaltner /soft/gnu/make/3.81/bin/gmake -C System MK
| | \-+- 27024 swaltner /soft/gnu/make/3.81/bin/gmake BSP/.make
| | \-+- 10502 swaltner /soft/gnu/make/3.81/bin/gmake -C DQ MK
| | \-+- 10560 swaltner /soft/gnu/make/3.81/bin/gmake DQ MKL
| | \-+- 11323 swaltner ccpentium -c -o dq.o -fmessage-len
| | \--- 11331 swaltner /soft/windriver/gpp/3.4/gnu/3.4.
| \-+- 26788 swaltner /soft/gnu/make/3.81/bin/gmake -C Application M
| \-+- 26853 swaltner /soft/gnu/make/3.81/bin/gmake RAID/.make Deb
| |-+- 06868 swaltner /soft/gnu/make/3.81/bin/gmake -C Debug MKL
| | \-+- 06928 swaltner /soft/gnu/make/3.81/bin/gmake ccvm_dbg/.
| | \-+- 11524 swaltner /soft/gnu/make/3.81/bin/gmake -C safe_
| | \-+- 11585 swaltner /soft/gnu/make/3.81/bin/gmake safe_d
| | \-+- 11639 swaltner ccpentium -c -o safeSymbolDebug.o
| | \--- 11642 swaltner /soft/windriver/gpp/3.4/gnu/3.4.
| |-+- 26909 swaltner /soft/gnu/make/3.81/bin/gmake -C RAID MKLe
| | \-+- 27055 swaltner /soft/gnu/make/3.81/bin/gmake cache/.mak
| | \-+- 08612 swaltner /soft/gnu/make/3.81/bin/gmake -C hid M
| | \-+- 08728 swaltner /soft/gnu/make/3.81/bin/gmake hid MK
| | \-+- 11452 swaltner ccpentium -c -o hidLUDispatch.o -f
| | \--- 11457 swaltner /soft/windriver/gpp/3.4/gnu/3.4.
| \--- 11635 swaltner /soft/gnu/make/3.81/bin/gmake -C MAPI MKLe
====================
Linux:
====================
ictgrid005:~/ccm_wa/symbios/RAIDCore-swaltner_1636/dev_09q4_fc_7091-68.10.00.03> ~/pstree-2.32/pstree 3543
-+= 03543 root /soft/gridware-wic/sge/6.0u6/bin/lx24-amd64/sge_execd
\-+= 21589 root sge_shepherd-467474 -bg
\-+= 21590 root /soft/gridware-wic/sge/6.0u6/utilbin/lx24-amd64/rshd -l
\-+= 21591 swaltner /soft/gridware-wic/sge/6.0u6/utilbin/lx24-amd64/qrsh_starter /var/spool/sgeexecd/ictgrid005/active_jobs/467474.
\-+= 21603 swaltner tcsh -c hostname ; gmake
\-+- 21612 swaltner gmake
\-+- 04707 swaltner /bin/ksh -c gmake Platform/.make Application/.make MKLevel=$(( 0 + 1 )) MKopts='';
\-+- 04708 swaltner gmake Platform/.make Application/.make MKLevel=1 MKopts=
\-+- 04787 swaltner /bin/ksh -c gmake -C Application MKLevel=$(( 1 + 1 ))
\-+- 04788 swaltner gmake -C Application MKLevel=2
\-+- 04868 swaltner /bin/ksh -c gmake RAID/.make Debug/.make MAPI/.make TAPI/.make Spy/.make Stpsim/.make FBDT/.make
\-+- 04870 swaltner gmake RAID/.make Debug/.make MAPI/.make TAPI/.make Spy/.make Stpsim/.make FBDT/.make IT/.make D
\-+- 04947 swaltner /bin/ksh -c gmake -C RAID MKLevel=$(( 3 + 1 ))
\-+- 04948 swaltner gmake -C RAID MKLevel=4
\-+- 05074 swaltner /bin/ksh -c gmake cache/.make iop/.make htd/.make hid/.make icn/.make rtr/.make rpa/.make
\-+- 05075 swaltner gmake cache/.make iop/.make htd/.make hid/.make icn/.make rtr/.make rpa/.make Fibre/.ma
\-+- 11193 swaltner /bin/ksh -c gmake -C vdm MKLevel=$(( 5 + 1 ))
\-+- 11194 swaltner gmake -C vdm MKLevel=6
\-+- 18797 swaltner /bin/ksh -c gmake vdm MKLevel=$(( 6 + 1 )) MKopts='';
\-+- 18798 swaltner gmake vdm MKLevel=7 MKopts=
\-+- 22893 swaltner /bin/ksh -c HOME="" LM_LICENSE_FILE="" ccpentium -c -o vdmRVState.o -fmessage
\-+- 22894 swaltner ccpentium -c -o vdmRVState.o -fmessage-length=0 -O2 -nostdlib -fno-builtin
|--- 22896 swaltner /soft/windriver/gpp/3.4/gnu/3.4.4-vxworks-6.4/x86-linux2/bin/../libexec/g
\--- 22895 root (get_feature)
ictgrid005:~/ccm_wa/symbios/RAIDCore-swaltner_1636/dev_09q4_fc_7091-68.10.00.03>
====================
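A rough way to turn snapshots like these into numbers, so the two platforms can be compared over the course of a build, is to count compiler processes and "ksh -c" wrappers in ps output. This is a sketch under assumptions: the function name count_build_procs is hypothetical, and it expects "PID PPID ARGS" lines on standard input with ccpentium as the compiler name, as in the listings above.

```shell
#!/bin/sh
# Hypothetical helper: count compiler processes and "ksh -c" wrappers.
# Reads lines of the form "PID PPID COMMAND ARGS..." on stdin.
count_build_procs() {
    awk '
        # Third field is the command; match ksh with or without a path.
        $3 ~ /(^|\/)ksh$/ && $4 == "-c" { ksh++ }
        # Match the cross compiler named in the listings.
        $3 ~ /(^|\/)ccpentium$/         { cc++ }
        END { printf "compilers=%d ksh_wrappers=%d\n", cc, ksh }
    '
}

# Assumed usage while a build is running:
#   ps -e -o pid= -o ppid= -o args= | count_build_procs
```

Sampling this every few seconds on both hosts would show how many compilers are really running at once for a given NSLOTS value.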
I believe this behavior is causing the make process to consume tokens
for the parallel build when it shouldn't. The ksh process that
launches the gmake command in each subdirectory appears to hold a
token. Once the build gets deep enough in the source tree, all the
tokens are held by these idle ksh processes, and the build falls back
to a single job. This seems to be confirmed by starting a build with
"-j 8", "-j 16", or higher: given more tokens, make is able to keep
the CPUs busy on this quad-CPU Linux server. That works fine when
there is a single developer on the build system, but it won't work
well for the way we launch builds on these systems through SGE. Once
this issue is resolved, we can deploy the x86 hardware, which will
give us the same build speeds in a box that is 20% of the physical
size and about 10% of the price of the SPARC systems we have been
using.
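One check that might help narrow this down, offered as a sketch rather than a diagnosis: GNU Make 3.81 hands its job tokens to sub-makes through a "--jobserver-fds=" entry in MAKEFLAGS (later releases call it "--jobserver-auth="), so printing MAKEFLAGS at each MKLevel on both platforms would show whether the sub-makes are sharing one token pool. The helper name has_jobserver below is hypothetical and is not part of our makefiles.

```shell
#!/bin/sh
# Hypothetical helper: report whether a MAKEFLAGS value carries
# jobserver information. The option is --jobserver-fds in GNU Make
# 3.81 and --jobserver-auth in later releases.
has_jobserver() {
    # Pad with spaces so the option is matched as a whole word.
    case " $1 " in
        *" --jobserver-fds="*|*" --jobserver-auth="*) echo yes ;;
        *) echo no ;;
    esac
}

# Assumed usage from a script run by a recipe, for debugging only:
#   has_jobserver "$MAKEFLAGS"
```

If a sub-make reports "no" on Linux but "yes" on Solaris, the difference lies in how the sub-make is started rather than in the number of tokens.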
Thanks for any guidance you can provide. I've been fooling with this
for several days without any luck.
Steve