Re: Threads vs. OpenMP in PSPP.
From: John McCabe-Dansted
Subject: Re: Threads vs. OpenMP in PSPP.
Date: Wed, 4 Apr 2007 21:54:25 +0800
On 4/2/07, Ben Pfaff <address@hidden> wrote:
> If we implemented all of these, or even just through #3, I bet
> PSPP would be one of the best things out there for dealing with
> large amounts of data.
Another approach would be to split off the code that takes the majority
of the CPU time and do any or all of the following:
1) Compile that code with -march="" and link in the code for the
appropriate CPU.
2) Create a benchmark and
2a) Tune the compiler optimisations for each algorithm using Acovea.
2b) Pick the compiler that generates the fastest code for that algorithm.
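
Option (1) could be sketched roughly as below. This is only an illustration, not PSPP code: the routine names, the variant table and the idea of selecting by a CPU name string are all assumptions, and the per-CPU functions here are placeholders for what would really be one source file compiled several times with different -march settings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Signature shared by every build of the hot routine (hypothetical). */
typedef double (*sum_fn)(const double *x, size_t n);

/* Baseline build, used when no tuned build matches the CPU. */
static double sum_generic(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Placeholders: in a real build these would be the same source compiled
   into separate objects with different -march values. */
static double sum_pentium4(const double *x, size_t n)
{
    return sum_generic(x, n);
}

static double sum_athlon_xp(const double *x, size_t n)
{
    return sum_generic(x, n);
}

/* Table mapping a CPU name to the build tuned for it (names assumed). */
struct variant {
    const char *cpu;
    sum_fn fn;
};

static const struct variant variants[] = {
    { "pentium4", sum_pentium4 },
    { "athlon-xp", sum_athlon_xp },
};

/* Return the build matching cpu_name, or the generic build if none does. */
static sum_fn select_sum(const char *cpu_name)
{
    assert(cpu_name != NULL);
    for (size_t i = 0; i < sizeof variants / sizeof variants[0]; i++)
        if (strcmp(variants[i].cpu, cpu_name) == 0)
            return variants[i].fn;
    return sum_generic;
}
```

The caller would look up the function pointer once at startup, e.g. select_sum("athlon-xp"), and use it for every later call. How the CPU name is detected is left open here.
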
(2a) can improve overall performance by 22%, although this figure included
-ffast-math, which typically actually increases accuracy as well as
speed, but means that different CPUs might generate slightly different
results.
http://www.coyotegulch.com/products/acovea/aco5k8gcc40.html
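
One reason results can differ under fast-math is that the compiler is then allowed to reorder floating-point additions, and floating-point addition is not associative. A minimal sketch of the effect, with the two orderings written out by hand (the function names are made up for illustration, and the volatile temporaries are only there to force each intermediate to be rounded to double):

```c
/* Sum left to right: (a + b) + c. */
static double sum_left(double a, double b, double c)
{
    volatile double t = a + b;
    volatile double r = t + c;
    return r;
}

/* Sum right to left: a + (b + c). */
static double sum_right(double a, double b, double c)
{
    volatile double t = b + c;
    volatile double r = a + t;
    return r;
}
```

With IEEE doubles, sum_left(1e17, -1e17, 1.0) gives 1.0, while sum_right(1e17, -1e17, 1.0) gives 0.0, because the 1.0 is lost when it is added to -1e17 first. A compiler that is free to pick either order may pick differently for different targets.
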
(2b) In rare cases this can improve performance by up to 6x; see e.g.
http://www.coyotegulch.com/reviews/linux_compilers/index.html
ICC 8.1 performed Monte Carlo integration roughly 6 times faster than
gcc. If ICC can increase performance of some PSPP algorithms by a
similar margin, we may wish to compile those bits using ICC. Better
yet would be to get the same optimisations into gcc so AMD chips can
get a similar performance increase.
Even just picking the version of gcc that produced the fastest code
would boost speed by as much as 10% over any fixed version of gcc. I
don't know how much this reflects a genuine speed boost or just
microtuning to the benchmark. Microtuning to a particular CPU could be
beneficial if we produce a different binary for each CPU.
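
Picking the fastest build by benchmark could look something like the sketch below. It is not how Acovea works (that searches over compiler flags); it only times a set of candidate functions and reports the quickest. All names are hypothetical, and the two kernels stand in for builds of the same routine produced by different compilers or flag sets.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

typedef double (*kernel_fn)(const double *x, size_t n);

/* Candidate 1: plain loop with a single accumulator. */
static double sum_forward(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Candidate 2: two accumulators, handling a trailing odd element. */
static double sum_two_acc(const double *x, size_t n)
{
    double s0 = 0.0, s1 = 0.0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += x[i];
        s1 += x[i + 1];
    }
    if (i < n)
        s0 += x[i];
    return s0 + s1;
}

/* CPU seconds taken by the fastest of `repeats` runs of fn.
   Taking the minimum filters out some scheduling noise. */
static double best_time(kernel_fn fn, const double *x, size_t n, int repeats)
{
    volatile double sink = 0.0;   /* keeps the call from being optimised away */
    double best = -1.0;
    assert(repeats > 0);
    for (int r = 0; r < repeats; r++) {
        clock_t start = clock();
        sink = fn(x, n);
        double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
        if (best < 0.0 || elapsed < best)
            best = elapsed;
    }
    (void)sink;
    return best;
}

/* Index of the candidate with the smallest best-of-repeats time. */
static size_t pick_fastest(const kernel_fn *candidates, size_t count,
                           const double *x, size_t n, int repeats)
{
    size_t winner = 0;
    double winner_time = 0.0;
    assert(count > 0);
    for (size_t i = 0; i < count; i++) {
        double t = best_time(candidates[i], x, n, repeats);
        if (i == 0 || t < winner_time) {
            winner = i;
            winner_time = t;
        }
    }
    return winner;
}
```

On small inputs clock() is too coarse to separate the candidates, so a real benchmark would need data sets large enough to run for a measurable time, and the microtuning caveat above still applies to whatever it reports.
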
If this helps, I have access to Pentium 3, Pentium 4 (Prescott), Sparc,
PowerPC, Duron and Athlon-XP CPUs. We could also have "make acovea" so
each user could choose to spend days of CPU time tuning PSPP to their
particular CPU. If they posted their results it could make
interesting reading.
--
John C. McCabe-Dansted
PhD Student
University of Western Australia