[Bug-apl] More performance results
From: Juergen Sauermann
Subject: [Bug-apl] More performance results
Date: Wed, 16 Apr 2014 19:56:07 +0200
User-agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130330 Thunderbird/17.0.5
Hi,

I have created a benchmark program that measures the startup (fork) and finish (join) times of OMP. It also compares them with a hand-crafted fork/join.

The manual implementation uses an O(log(P)) algorithm for forking and joining, compared to what is apparently an O(P) algorithm in OMP. It would therefore be very interesting if Elias could run it on his 80-core machine. On my dual-core machine, the difference between the two algorithms should be minor.
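The tree idea behind an O(log(P)) fork/join can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in Parallel.cc: the class `TreePool`, its methods, the helper `tree_depth`, the heap-style numbering (node i wakes nodes 2i+1 and 2i+2) and the flag-polling with `std::this_thread::yield()` are all hypothetical choices. A fork wave reaches the deepest thread after about log2(P) hops and the join wave climbs back the same way, whereas a flat scheme has one master signal the P-1 other threads one after another (79 signals for P = 80, versus a tree depth of 6). Build with `-pthread`.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical sketch: O(log P) tree fork/join over a pool of P threads.
// Node i forks children 2i+1 and 2i+2, does its work, then joins them.
class TreePool {
public:
    explicit TreePool(int p) : P_(p), go_(p), done_(p), hits_(p, 0) {
        for (int i = 0; i < P_; ++i) { go_[i].store(0); done_[i].store(0); }
        for (int i = 1; i < P_; ++i)              // node 0 is the calling thread
            workers_.emplace_back([this, i] { worker_loop(i); });
    }

    ~TreePool() {
        stop_.store(true, std::memory_order_relaxed);
        for (int i = 1; i < P_; ++i)              // wake every worker directly to quit
            go_[i].store(gen_ + 1, std::memory_order_release);
        for (std::thread& t : workers_) t.join();
    }

    // One fork/join pass: returns once all P nodes have finished this pass.
    void run() { node(0, ++gen_); }

    // How many passes node i has executed (valid between calls to run()).
    int hits(int i) const { return hits_[i]; }

private:
    void node(int i, unsigned g) {
        const int l = 2 * i + 1, r = 2 * i + 2;
        // Fork: wake the children first so the wave spreads down the tree.
        if (l < P_) go_[l].store(g, std::memory_order_release);
        if (r < P_) go_[r].store(g, std::memory_order_release);
        ++hits_[i];                               // stand-in for real work
        // Join: wait until both child subtrees have finished pass g.
        if (l < P_) wait_for(done_[l], g);
        if (r < P_) wait_for(done_[r], g);
        done_[i].store(g, std::memory_order_release);
    }

    void worker_loop(int i) {
        for (unsigned g = 1;; ++g) {
            wait_for(go_[i], g);                  // parked until forked for pass g
            if (stop_.load(std::memory_order_relaxed)) return;
            node(i, g);
        }
    }

    static void wait_for(const std::atomic<unsigned>& flag, unsigned g) {
        while (flag.load(std::memory_order_acquire) < g) std::this_thread::yield();
    }

    const int P_;
    std::vector<std::atomic<unsigned>> go_, done_;  // per-node pass counters
    std::vector<int> hits_;
    std::vector<std::thread> workers_;
    std::atomic<bool> stop_{false};
    unsigned gen_ = 0;
};

// Hops from node 0 to the deepest node P-1, i.e. floor(log2(P)) for P >= 1.
int tree_depth(int P) {
    int d = 0;
    while ((1 << (d + 1)) <= P) ++d;
    return d;
}
```

For example, constructing `TreePool pool(8)` and calling `pool.run()` five times leaves `pool.hits(i) == 5` for every node. The threads are created once and then wait on flags, so a pass does not pay thread-creation cost; the trade-off is that idle workers keep polling the CPU.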
The first run of both algorithms seemed to suggest that the hand-crafted version is much faster than OMP:
Pass 0: 2 cores/threads, 15330 cycles total (hand-crafted)
Pass 0: 2 cores/threads, 99197 cycles total (OMP)
But then came a surprise when I ran the benchmark loop several times in a row:
./Parallel 2 (hand-crafted)
Pass 0: 2 cores/threads, 17542 cycles total
Pass 1: 2 cores/threads, 21070 cycles total
Pass 2: 2 cores/threads, 19075 cycles total
Pass 3: 2 cores/threads, 18249 cycles total
Pass 4: 2 cores/threads, 16415 cycles total
./Parallel_OMP 2 (OMP)
Pass 0: 2 cores/threads, 1213632 cycles total
Pass 1: 2 cores/threads, 5831 cycles total
Pass 2: 2 cores/threads, 2434215 cycles total
Pass 3: 2 cores/threads, 5705 cycles total
Pass 4: 2 cores/threads, 5215 cycles total
The details of the OMP case reveal that most of the time is spent on fork (which differs from Elias' earlier results, where join took the most time). It looks a little as if code loading (a shared library?) might be the issue for OMP.
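One way to get a per-pass fork/join breakdown for an OpenMP region is sketched below; it is an assumption, not how Parallel.cc measures it. It uses `std::chrono::steady_clock` nanoseconds as a portable stand-in for CPU cycles, counts "fork" as the time until the last thread has entered an empty `#pragma omp parallel` region, and "join" as the time from the last thread leaving the body until the master continues (this includes the region's implicit barrier). The names `PassTiming`, `time_passes` and `print_passes` are hypothetical. Running several passes in a row is what makes a one-time startup cost in early passes stand out. Build with `-fopenmp`.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical sketch: split the cost of an empty OpenMP parallel region
// into fork and join parts, once per pass, using steady_clock nanoseconds.
using Clock = std::chrono::steady_clock;

struct PassTiming {
    std::int64_t fork_ns;  // master's start -> last thread entering the region
    std::int64_t join_ns;  // last thread leaving the body -> master continuing
};

static std::int64_t ns_since(Clock::time_point t0) {
    return std::chrono::duration_cast<std::chrono::nanoseconds>(Clock::now() - t0).count();
}

// Raise `a` to at least `v` (lock-free maximum).
static void atomic_max(std::atomic<std::int64_t>& a, std::int64_t v) {
    std::int64_t cur = a.load();
    while (cur < v && !a.compare_exchange_weak(cur, v)) {
        // on failure cur holds the current value; retry while it is smaller
    }
}

std::vector<PassTiming> time_passes(int passes) {
    std::vector<PassTiming> out;
    for (int p = 0; p < passes; ++p) {
        std::atomic<std::int64_t> last_entry{0}, last_exit{0};  // shared
        const Clock::time_point t0 = Clock::now();
#pragma omp parallel
        {
            atomic_max(last_entry, ns_since(t0));  // this thread has been forked
            atomic_max(last_exit, ns_since(t0));   // body done, about to join
        }
        const std::int64_t end = ns_since(t0);     // master is past the join
        out.push_back({last_entry.load(), end - last_exit.load()});
    }
    return out;
}

void print_passes(const std::vector<PassTiming>& v) {
    for (std::size_t p = 0; p < v.size(); ++p)
        std::printf("Pass %zu: fork %lld ns, join %lld ns\n", p,
                    static_cast<long long>(v[p].fork_ns),
                    static_cast<long long>(v[p].join_ns));
}
```

Calling `print_passes(time_passes(5))` prints one line per pass. Without `-fopenmp` the pragma is ignored and the region runs on the calling thread only, so both numbers then mostly reflect timer overhead.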
/// Jürgen
Attachments: Parallel.cc (text data), Makefile (text document)