Re: [Freepooma-devel] [PATCH] PathScale EKOPath compiler support
From: Richard Guenther
Subject: Re: [Freepooma-devel] [PATCH] PathScale EKOPath compiler support
Date: Thu, 17 Mar 2005 09:33:14 +0100 (CET)
On Wed, 16 Mar 2005, Bryan O'Sullivan wrote:
> On Wed, 2005-03-16 at 09:22 +0100, Richard Guenther wrote:
>
> > Thanks, I added this to the HEAD and the r2 branch. I'm curious,
> > do you have any numbers for the benchmarks like ABCTest/BlitzLoops and
> > Doof? I don't have pathscale compilers available here, and we don't have
> > amd64 systems here anyway.
>
> I'm starting to run the benchmarks at the moment. I've never seen any
> benchmark numbers for other compilers or systems, though, so I wouldn't
> know which numbers are good or bad. Can you point me at some? The
> more, the better, especially OpenMP and MPI.
Well, all the shipped benchmarks print output like this
(BlitzLoops):
BlitzLoops> ./LINUXgcc/Loop18 --no-diags
     N  C restrict        C  CppTran  PoomaII
   100     1461.20  1428.96   982.61   253.63
   215     1609.77  1609.77  1300.28   432.08
   464     1779.71  1544.34  1396.56   557.08
  1000     1576.69  1576.69  1105.65   715.73
  2154     1656.58  1570.21  1111.69   819.32
  4641     1549.80  1494.28   965.87   786.17
 10000     1521.87  1463.46   917.38   662.33
 21544      441.18   509.31   596.76   460.04
 46415      261.34   268.60   273.99   253.92
100000      267.93   281.90   277.57   263.16
where the ultimate goal is for the CppTran and PoomaII numbers
to be equal to or better than the C and C restrict numbers
(higher is better; these are roughly MFLOPS). Here we assume
that the compilers are already very good at optimizing the C
code, which is usually true.
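One way to read such a table is to divide each C++ column by the
plain C column, so that 1.0 means no abstraction penalty. A minimal
sketch using three rows copied from the Loop18 output above; the
struct and function names are made up for illustration and are not
part of FreePOOMA:

```cpp
#include <cassert>
#include <vector>

// One row of benchmark output. Field names are illustrative only.
struct Row {
    int n;
    double c_restrict, c, cpptran, pooma2;
};

// Ratio of a C++ variant's rate to the plain C rate.
// 1.0 means parity; below 1.0 means the abstraction costs performance.
double relative_to_c(double variant_rate, double c_rate) {
    assert(c_rate > 0.0);
    return variant_rate / c_rate;
}

// Three rows copied from the Loop18 table (gcc-3.4).
std::vector<Row> loop18_rows() {
    return {
        {100,    1461.20, 1428.96,  982.61, 253.63},
        {1000,   1576.69, 1576.69, 1105.65, 715.73},
        {100000,  267.93,  281.90,  277.57, 263.16},
    };
}
```

With these rows, relative_to_c(row.pooma2, row.c) is roughly 0.18
at N = 100 and roughly 0.93 at N = 100000, i.e. the gap nearly
closes once the problem no longer fits in cache.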
Note that the comparison is not fair in all tests as sometimes
the Pooma tests do split loops (which of course you could fuse
theoretically).
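To illustrate why split loops skew the comparison: when each array
statement is evaluated as its own loop, the data is traversed twice
and a temporary is needed, whereas a hand-written C kernel can do
the same work in one pass. The sketch below uses plain std::vector
rather than Pooma arrays, and both function names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Two passes over memory, as two separate array statements would give:
//   t = a + b;  r = t * c;
std::vector<double> split_loops(const std::vector<double>& a,
                                const std::vector<double>& b,
                                const std::vector<double>& c) {
    assert(a.size() == b.size() && b.size() == c.size());
    const std::size_t n = a.size();
    std::vector<double> t(n), r(n);
    for (std::size_t i = 0; i < n; ++i) t[i] = a[i] + b[i];
    for (std::size_t i = 0; i < n; ++i) r[i] = t[i] * c[i];
    return r;
}

// One fused pass: same result, no temporary array, one traversal.
std::vector<double> fused_loop(const std::vector<double>& a,
                               const std::vector<double>& b,
                               const std::vector<double>& c) {
    assert(a.size() == b.size() && b.size() == c.size());
    const std::size_t n = a.size();
    std::vector<double> r(n);
    for (std::size_t i = 0; i < n; ++i) r[i] = (a[i] + b[i]) * c[i];
    return r;
}
```

Both functions compute the same values; the difference is only in
memory traffic, which matters most for the main-memory-sized runs.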
These tests are also good for testing OpenMP performance, but
not MPI performance (though that does not depend much on the
compiler anyway).
Be sure to play with the --sim-params parameter to get a
reasonable range of problem sizes that covers L1/L2 and main
memory. Most interesting are the numbers for problem sizes that
still fit in L2 cache, as this is where the abstraction penalty
shows most clearly.
From a good compiler I expect the C and CppTran performance
numbers to match, and the PoomaII numbers to match at least for
the main-memory-sized problems.
If you're working against CVS HEAD of FreePOOMA, there is some
library-level optimization for unit-stride access; this may
help optimize the innermost loops.
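As a rough idea of what a unit-stride special case can look like
(this is a generic sketch, not FreePOOMA's actual code, and
add_strided is a made-up name): the stride is checked once outside
the loop, so the common contiguous case gets a simple inner loop
that compilers tend to optimize and vectorize more readily.

```cpp
#include <cassert>
#include <cstddef>

// Adds src into dst for n logical elements that lie `stride`
// apart in memory. Hypothetical helper for illustration only.
void add_strided(double* dst, const double* src,
                 std::size_t n, std::size_t stride) {
    assert(stride >= 1);
    if (stride == 1) {
        // Contiguous case: plain indexing, no multiply in the loop body.
        for (std::size_t i = 0; i < n; ++i) dst[i] += src[i];
    } else {
        // General case: index computation depends on the stride.
        for (std::size_t i = 0; i < n; ++i)
            dst[i * stride] += src[i * stride];
    }
}
```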
The numbers above for BlitzLoops are for gcc-3.4, like the
following for ABCTest:
ABCTest> ./LINUXgcc/ABC --no-diags --run-impls 0 1 2 3 --sim-params 10 2 3
   N  C restrict        C  CppTran Bk  PoomaII Bk
  10      953.09   953.10      513.33      105.56
  21     1129.77  1167.20      563.80      190.12
  46     1076.90  1089.12      564.65      225.16
 100     1098.92  1064.97      536.62      245.19
 215      278.22   268.69      261.83      179.41
 464      269.99   274.75      271.01      187.12
1000      264.26   270.93      255.07      163.58
and Doof3d:
Doof3d> ./LINUXgcc/Doof3d --no-diags --sim-params 10 1 2
  N  C restrict       C  CppTran  PoomaII NoOpt  PoomaII Opt
 10      624.94  628.32    82.83         113.32        82.34
 31      638.89  645.34    82.88         154.41        83.11
100      644.08  635.40    82.44         155.17        81.93
Richard.
--
Richard Guenther <richard dot guenther at uni-tuebingen dot de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/