[Bug-gnubg] Benchmarks on server class machines and resulting change req
From: Ingo Macherius
Subject: [Bug-gnubg] Benchmarks on server class machines and resulting change requests
Date: Sun, 2 Aug 2009 17:06:54 +0200
I have benchmarked gnubg on two server machines, with particular focus on
multithreading. Both machines are headless and run Debian 5.x Lenny, kernel
2.6.26-2-amd64 #1 SMP x86_64 GNU/Linux. The hardware is:
box_A: 2xXeon 5130 @ 2GHz (4 physical cores in 2 chips)
box_B: 2xXeon Nocona @ 3GHz (2 physical cores plus 2 HT "cores" in 2 chips)
I found two issues with current gnubg (latest CVS version as of August 1st
2009, compiled with gcc 4.3.2.1 with -march=native and sse2 support):
1) The "calibrate" command output is off by a factor of 1000, i.e. it reports
eval/s values 1000 times too high. This also holds for the figure reported by
the official Debian binary installed via apt-get.
2) The limit of 16 threads is too low. I found that 8 threads per core are
needed to utilize 100% of the CPU power. Interestingly, this holds for the
virtual HT cores as well.
@1: Please check the timer code; the problem seems to be in timer.c. The #ifdef
part for Windows appears to be fine, but all other machines use a faulty
version of the timer. I can't really suggest a solution, but here is some
background info from Wikipedia: http://en.wikipedia.org/wiki/Rdtsc
I would be glad to help fix this one by testing on the aforementioned machines
under 64 bit Linux.
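
For illustration only, here is a minimal sketch of how an eval/s figure could be
derived from a monotonic POSIX clock instead of a raw cycle counter. It assumes
nothing about how timer.c is actually written; the function names
(now_monotonic, elapsed_seconds, evals_per_second) are hypothetical and do not
exist in gnubg. Keeping the elapsed time in seconds throughout leaves no place
for a milliseconds/seconds mix-up of the factor-1000 kind.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Hypothetical helper: read the monotonic clock (unaffected by
 * wall-clock adjustments and by CPU frequency scaling). */
struct timespec now_monotonic(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts;
}

/* Seconds elapsed between two timestamps, as a double. */
double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Evaluations per second; the time argument must be in seconds. */
double evals_per_second(unsigned long evals, double seconds)
{
    assert(seconds > 0.0);
    return (double)evals / seconds;
}
```

A calibration loop would take one timestamp before and one after a known number
of evaluations and pass the difference to evals_per_second.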
@2: I've tested with a custom gnubg binary in which the bug from @1 is fixed
the hard way, by a hardcoded division by 1000, and the thread limit is raised
to 256. While calibrate was running I monitored CPU utilization using the
mpstat command.
box_A peaks at about 202K eval/s with 8 threads per core (32 total). From there
the number stays flat until it starts decreasing again once hundreds of threads
are used. Between 1 and 3 threads I see the expected gain of almost 100% per
thread added. Using 4 threads lowers the throughput compared to 3 threads.
Between 5 and 32 threads throughput rises, at first linearly and then
asymptotically as we get closer to 32 threads. Below 32 threads, mpstat reports
significant idle time for each CPU; at 32, each CPU uses 100% of its cycles.
Very similar behavior is visible on box_B, despite the fact that 2 of its
"cores" are virtual HT cores.
Extrapolating the results suggests gnubg should increase the limit for the
maximum number of threads to 64, maybe even 128 or 256. Rationale: recent
server hardware with dual quad-cores has 8 cores, which should be fully
utilizable only with 64 threads. The suggested 128 anticipates future
improvements. As there seems to be little to no cost to higher values for the
thread limit, this looks like a cheap way to speed up gnubg on server class
machines and quad cores.
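
The arithmetic behind these numbers, as a small sketch. It assumes the figure
of 8 threads per core measured above carries over to other machines, which is
an extrapolation and not something I have tested; the names
threads_to_saturate and limit_is_sufficient are hypothetical.

```c
#include <assert.h>

/* Threads per core needed for 100% utilization, as observed with mpstat
 * on box_A and box_B. Assumed, not verified, for other hardware. */
enum { THREADS_PER_CORE = 8 };

/* Number of threads expected to saturate a machine with `cores` cores. */
unsigned threads_to_saturate(unsigned cores)
{
    assert(cores > 0);
    return cores * THREADS_PER_CORE;
}

/* Nonzero if a compile-time thread limit is large enough for `cores`. */
int limit_is_sufficient(unsigned limit, unsigned cores)
{
    return limit >= threads_to_saturate(cores);
}
```

With the current limit of 16, limit_is_sufficient(16, 4) is already false for
box_A, while a limit of 64 covers a dual quad-core machine and 128 covers 16
cores.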
Cheers,
Ingo