From: Elias Mårtenson
Subject: Re: [Bug-apl] 80 core performance results
Date: Fri, 22 Aug 2014 18:22:17 +0800
Hi Elias,
yes, actually a lot. I haven't looked through all the files, only those
for 80, 60, and small core counts.
The good news is that all results look plausible now. There are some variations
in the data, of course, but the trend is clear:
The total time for OMP (the rightmost value in the plot, i.e. x == corecount + 10) is consistently
about twice the total time for a hand-crafted fork/sync. The benchmark was made in such a way
that it only shows the fork/join times. Column N ≤ corecount shows the time when the N-th core
started execution of its task.
I have attached a plot for the 80-core result (4 hand-crafted runs in red and 4 OMP runs in green),
along with the gnuplot script that created the plots.
/// Jürgen
On 08/01/2014 03:16 PM, Elias Mårtenson wrote:
Were you able to deduce anything from the test results?
On 11 May 2014 23:02, "Juergen Sauermann" <address@hidden> wrote:
Hi Elias,
thanks, already interesting. If you could loop around the core count:
for ((i=1; $i<=80; ++i)); do
./Parallel $i
./Parallel_OMP $i
done
then I could understand the data better. Also not sure if something
is wrong with the benchmark program. On my new 4-core with OMP I get
fluctuations from:
address@hidden ~/apl-1.3/tools $ ./Parallel_OMP 4
Pass 0: 4 cores/threads, 8229949 cycles total
Pass 1: 4 cores/threads, 8262 cycles total
Pass 2: 4 cores/threads, 4035 cycles total
Pass 3: 4 cores/threads, 4126 cycles total
Pass 4: 4 cores/threads, 4179 cycles total
to:
address@hidden ~/apl-1.3/tools $ ./Parallel_OMP 4
Pass 0: 4 cores/threads, 11368032 cycles total
Pass 1: 4 cores/threads, 4042228 cycles total
Pass 2: 4 cores/threads, 7251419 cycles total
Pass 3: 4 cores/threads, 3846 cycles total
Pass 4: 4 cores/threads, 2725 cycles total
The fluctuations with the manual parallel for are smaller:
Pass 0: 4 cores/threads, 87225 cycles total
Pass 1: 4 cores/threads, 245046 cycles total
Pass 2: 4 cores/threads, 84632 cycles total
Pass 3: 4 cores/threads, 63619 cycles total
Pass 4: 4 cores/threads, 93437 cycles total
but still considerable. The picture so far suggests that OMP fluctuates much
more (in the start-up + sync time) than manual, with the highest OMP start-up time above the
manual one and the lowest far below it. One change on my TODO list is to use futexes instead of
mutexes (like OMP does), which is probably not an issue under Solaris since futexes are Linux-specific.
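
To put a number on the fluctuations shown above, the "Pass" lines can be reduced to a minimum, a maximum, and the max/min ratio. This is only a minimal sketch and not part of the benchmark: the function name summarize_passes is made up for illustration, and it assumes that the cycle count is always the fifth whitespace-separated field, as in the output quoted above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the benchmark): reads benchmark output on
# stdin and prints the spread of the per-pass cycle counts.
summarize_passes() {
    awk '
        # Only lines such as "Pass 0: 4 cores/threads, 8229949 cycles total"
        /^Pass [0-9]+:/ {
            c = $5 + 0                      # force numeric comparison
            if (n == 0 || c < min) min = c
            if (n == 0 || c > max) max = c
            n++
        }
        END {
            if (n > 0 && min > 0)
                printf "passes=%d min=%d max=%d max/min=%.1f\n", n, min, max, max / min
        }
    '
}

# Demo with the first OMP run quoted above; with the real binaries this
# could instead be:  ./Parallel_OMP 4 | summarize_passes
summarize_passes <<'EOF'
Pass 0: 4 cores/threads, 8229949 cycles total
Pass 1: 4 cores/threads, 8262 cycles total
Pass 2: 4 cores/threads, 4035 cycles total
Pass 3: 4 cores/threads, 4126 cycles total
Pass 4: 4 cores/threads, 4179 cycles total
EOF
```

For that OMP run the ratio comes out at roughly 2040, while the manual run quoted above gives a ratio below 4. A single max/min figure is crude and is dominated by the slowest pass, so it should be read as a rough indicator only.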
/// Jürgen
On 05/11/2014 04:23 AM, Elias Mårtenson wrote:
Here are the files that I promised earlier.
Regards,Elias