From: Jonathan Kinsey
Subject: Re: [Bug-gnubg] Benchmarks on server class machines and resulting change requests
Date: Tue, 4 Aug 2009 07:40:53 +0000

It's not clear whether you were using the hyper-threaded machine; that might explain the jump from 1 to 2 cores and the smaller jump to 3 and 4 cores. If you were using machine "B", try running the test again with 1, 2, 3 and 4 threads on machine "A". Make sure the cache size is set to maximum.

Jon

Ingo Macherius wrote:
> Christian, I've conducted your suggested experiment (batch eval of saved matches) and can confirm your answer. Calibrate is not a suitable metric to evaluate threading behaviour for gnubg.
>
> The batch experiment analyzed five 7pt matches 4 times each, with full cache cleaning. The time was taken with the unix "time" command. The results are much more like what one would expect:
> - Speed peaked when the number of threads equaled the number of cores
> - Adding more threads than cores slowed down the evaluation (albeit by only a tiny bit)
> - The slowdown grew as more threads were added
>
> The odd finding is that there still are some anomalies, which are:
> - Going from 1 to 2 threads more than doubles the evaluation speed
> - Adding more threads has very little effect, i.e. the gain is not linear in # cores
> - 2, 3 and 4 threads result in speeds very close to each other, much closer than expected
>
> I've attached a ZIP which contains the original OpenOffice 3.1 spreadsheet and a PDF version of the graphs with the experiment details.
>
> Thx a lot for your guidance!
>
> Ingo
>
>> -----Original Message-----
>> From: Christian Anthon [mailto:address@hidden
>> Sent: Monday, August 03, 2009 12:29 PM
>> To: Ingo Macherius
>> Cc: address@hidden
>> Subject: Re: [Bug-gnubg] Benchmarks on server class machines and resulting change requests
>>
>>
>> The calibrate function sucks big time. The threaded calibrate function sucks even more. I'm tempted to call it useless.
>> I believe that you are observing the following: There is some overhead involved in displaying and updating the calibration, and as you increase the number of threads, more and more time is allocated to evaluation and less and less to overhead. If you really want to test the speed of the threading then you should analyse a match or perform a rollout.
>>
>> The original calibration was meant to calibrate certain timing functions against the speed of your computer, so overhead didn't really matter. That is, the function measures the speed of your computer, not the speed of gnubg.
>>
>> Christian.
>>
>> On Sun, Aug 2, 2009 at 5:06 PM, Ingo Macherius wrote:
>>> I have benchmarked gnubg on two server machines, with particular focus on multithreading. Both machines are headless and run Debian 5.x Lenny, Kernel 2.6.26-2-amd64 #1 SMP x86_64 GNU/Linux. The hardware is:
>>> box_A: 2xXeon 5130 @ 2GHz (4 physical cores in 2 chips)
>>> box_B: 2xXeon Nocona @ 3GHz (2 physical cores plus 2 HT "cores" in 2 chips)
>>>
>>> I found two issues with current gnubg (latest CVS version as of August 1st 2009, compiled with gcc 4.3.2.1 with -march=native and sse2 support):
>>>
>>> 1) The "calibrate" command output is off by a factor of 1000, i.e. reports eval/s values 1000 times too high. This holds for the figure reported in the official Debian binary installed via apt-get.
>>>
>>> 2) The limit of 16 threads is too low. I found that to utilize the CPU power to 100%, 8 threads per core are needed. Interestingly this holds for the virtual HT cores as well.
>>>
>>> @1: Please check the timer code, the problem seems to be in timer.c. Obviously the #ifdef part for Windows is fine, but all other machines use a faulty version of the timer.
>>> I can't really suggest a solution, but here is some background info from wikipedia: http://en.wikipedia.org/wiki/Rdtsc
>>> I would help to fix this one by testing on the aforementioned machines under 64 bit Linux.
>>>
>>> @2: I've tested with a custom gnubg binary with the bug at @1 fixed the hard way by hard-coding a division by 1000, and with the thread limit raised to 256. While calibrate was running I monitored CPU utilization using the mpstat command.
>>>
>>> box_A peaks at about 202K eval/s with 8 threads per core (32 total), from where on the number is static until it starts decreasing again when you use hundreds of threads. Between 1 and 3 threads I see the expected gain of almost 100% per thread added. Using 4 threads lowers the throughput compared to 3 threads. Between 5 and 32 threads I see rising throughput which is first linear and becomes asymptotic as we get closer to 32 threads. Below 32 threads, mpstat reports significant idle times for each CPU; at 32 I see each is using 100% of the cycles.
>>>
>>> A very similar behavior is visible on box_B, despite the fact that 2 of its "cores" are virtual HT cores.
>>>
>>> Extrapolating the results suggests gnubg should increase the limit for the number of max. threads to 64, maybe even 128 or 256. Rationale: recent server hardware with dual quadcores has 8 cores, which should be fully utilizable only with 64 threads. The suggested 128 anticipates future improvements. As there seems to be little to no cost with higher values for max. threads, this seems like a cheap way to speed up gnubg on server class machines and quad cores.
>>>
>>> Cheers,
>>> Ingo
>>>
>>> _______________________________________________
>>> Bug-gnubg mailing list
>>> address@hidden http://lists.gnu.org/mailman/listinfo/bug-gnubg
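
For anyone repeating the batch timings discussed in this thread, here is a minimal sketch of the two calculations involved: converting an evaluation count and an elapsed time into eval/s with the time unit made explicit, and turning wall-clock times per thread count into speedup and efficiency figures. The function names and all numbers are hypothetical placeholders, not gnubg code and not measurements from this thread. A unit mismatch (for example seconds treated as milliseconds) is only one way a factor of 1000 could arise; whether that is what happens in timer.c is not established here.

```python
def evals_per_second(n_evals: int, elapsed_ms: float) -> float:
    """Convert an evaluation count and an elapsed time in milliseconds to eval/s."""
    if elapsed_ms <= 0:
        raise ValueError("elapsed_ms must be positive")
    # Convert milliseconds to seconds explicitly so the unit is visible.
    return n_evals / (elapsed_ms / 1000.0)


def speedup_table(wall_seconds: dict[int, float]) -> dict[int, tuple[float, float]]:
    """Map thread count -> (speedup vs. 1 thread, efficiency = speedup / threads).

    wall_seconds maps a thread count to the measured wall-clock time,
    e.g. the "real" figure reported by the unix time command.
    """
    base = wall_seconds[1]  # the single-threaded run is the reference
    return {n: (base / t, base / t / n) for n, t in sorted(wall_seconds.items())}


if __name__ == "__main__":
    # Placeholder inputs for illustration only.
    print(evals_per_second(50_000, 2_000.0))
    for threads, (speedup, efficiency) in speedup_table(
        {1: 100.0, 2: 50.0, 4: 40.0}
    ).items():
        print(f"{threads} threads: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

An efficiency close to 1.0 means the extra threads are paying off almost linearly. Values well below that at 3 and 4 threads would correspond to the flat region described above.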