freepooma-devel

[Freepooma-devel] [PATCH] Speed up evaluators


From: Richard Guenther
Subject: [Freepooma-devel] [PATCH] Speed up evaluators
Date: Tue, 30 Nov 2004 13:21:14 +0100 (CET)

Hi!

After analyzing the causes of the performance loss between the C and
PoomaII implementations of the various available benchmarks, I created
the following patch.

There are two main reasons for the performance loss (consider A = B+C):

 - Compilers can never prove that the strides of A, B and C are the
   same, so each operand needs its own index computation; the index
   arithmetic is three times as complicated as in the C case.
 - With Interval<> views we have strides[0] == 1, but neither we nor
   the compiler exploit this fact (and we _always_ generate views for
   expressions).
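As a minimal sketch of both points (plain C++ written for illustration, not code from POOMA or from the patch; `StridedView`, `addStrided` and `addUnit` are hypothetical names), compare a loop in which every operand carries its own runtime stride with the unit-stride loop a C programmer would write:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 1-D view: a base pointer plus a runtime stride, standing in
// for a strided brick view. This is not POOMA's actual data structure.
struct StridedView {
    double*        data;
    std::ptrdiff_t stride;
    double& operator()(std::ptrdiff_t i) const { return data[i * stride]; }
};

// General case: each operand has its own stride, and the compiler cannot
// assume they are equal, so it keeps three separate index computations.
void addStrided(StridedView a, StridedView b, StridedView c, std::ptrdiff_t n) {
    for (std::ptrdiff_t i = 0; i < n; ++i)
        a(i) = b(i) + c(i);
}

// Unit-stride case (as for an Interval<> view): the stride is known to be 1,
// so all three operands share the single index i, exactly as in plain C.
void addUnit(double* a, const double* b, const double* c, std::ptrdiff_t n) {
    for (std::ptrdiff_t i = 0; i < n; ++i)
        a[i] = b[i] + c[i];
}
```

With stride 1 both functions compute the same result; the difference lies only in how much index arithmetic the compiler has to emit for the loop body, and contiguous accesses like those in `addUnit` are typically easier for it to vectorize.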

To address the second reason (keeping in mind that Interval<>
views are the most common), we can create a second variant of
Engine<,,BrickView>, namely Engine<,,BrickViewU>, which is used
for Interval<> (and related) views.  This improves performance
considerably, as it simplifies the innermost loop of the expanders:
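The idea of a separate unit-stride engine can be sketched with tag dispatch. The tags `BrickViewTag`/`BrickViewUTag`, the class `ViewEngine2` and the function `evaluate2` below are hypothetical simplifications assuming a 2-D, first-index-fastest layout; they only mirror the idea of Engine<,,BrickView> vs. Engine<,,BrickViewU> and are not the engines from the patch:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical tags; illustrative stand-ins, not POOMA's real engine tags.
struct BrickViewTag {};
struct BrickViewUTag {};

template <class T, class Tag>
class ViewEngine2;

// General 2-D view: both strides are runtime values.
template <class T>
class ViewEngine2<T, BrickViewTag> {
public:
    ViewEngine2(T* data, std::ptrdiff_t s0, std::ptrdiff_t s1)
        : data_(data), s0_(s0), s1_(s1) {}
    T& operator()(std::ptrdiff_t i, std::ptrdiff_t j) const {
        return data_[i * s0_ + j * s1_];
    }
private:
    T* data_;
    std::ptrdiff_t s0_, s1_;
};

// Unit-stride 2-D view: strides[0] == 1 is encoded in the type, so the
// innermost index i needs no multiplication and contiguity is visible.
template <class T>
class ViewEngine2<T, BrickViewUTag> {
public:
    ViewEngine2(T* data, std::ptrdiff_t s1) : data_(data), s1_(s1) {}
    T& operator()(std::ptrdiff_t i, std::ptrdiff_t j) const {
        return data_[i + j * s1_];
    }
private:
    T* data_;
    std::ptrdiff_t s1_;
};

// Evaluator loop for A = B + C, templated on the engine types; the
// innermost loop runs over index 0 (i).
template <class EA, class EB, class EC>
void evaluate2(const EA& a, const EB& b, const EC& c,
               std::ptrdiff_t n0, std::ptrdiff_t n1) {
    for (std::ptrdiff_t j = 0; j < n1; ++j)
        for (std::ptrdiff_t i = 0; i < n0; ++i)
            a(i, j) = b(i, j) + c(i, j);
}
```

Because the evaluator loop is a template over the engine type, selecting the unit-stride engine for Interval<>-style views changes the generated inner loop without touching the loop code itself.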

before:
./LINUXgcc/ABC --no-diags --sim-params 100 1 2 --samples 3 --run-impls 0, 1, 2, 3

N      C restrict   C         CppTran Bk   PoomaII Bk
100    1057.07      1113.25   591.91       242.01
316    270.39       275.07    278.36       172.35
1000   268.43       270.80    287.02       170.75

after:
./LINUXgcc/ABC --no-diags --sim-params 100 1 2 --samples 3 --run-impls 0, 1, 2, 3

N      C restrict   C         CppTran Bk   PoomaII Bk
100    1076.11      1028.53   584.53       470.38
316    270.70       284.94    280.75       180.59
1000   271.91       280.09    281.26       183.96


That's roughly a 100% improvement (242.01 to 470.38) for the
cache-dominated case, N = 100.  Further improvements need to address
the CppTran vs. C losses, as PoomaII can never be better than CppTran
in this test.
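The size of the improvement can be checked directly from the PoomaII (Bk) column of the before and after tables. A small C++ sketch (the `Row` type and the `improvementPercent` function are illustrative names only):

```cpp
#include <cassert>

// PoomaII (Bk) column from the "before" and "after" benchmark tables.
struct Row { int n; double before; double after; };

constexpr Row kPoomaII[] = {
    {100,  242.01, 470.38},
    {316,  172.35, 180.59},
    {1000, 170.75, 183.96},
};

// Relative improvement as a percentage: (after / before - 1) * 100.
double improvementPercent(const Row& r) {
    return (r.after / r.before - 1.0) * 100.0;
}
```

This gives about +94% for N = 100, and roughly +5% and +8% for N = 316 and N = 1000.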

The patch does not cause any testsuite regressions, but I may have
missed updating some traits.  The patch is queued for post-r2 (and for
personal use).

Richard.

--
Richard Guenther <richard dot guenther at uni-tuebingen dot de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/

Attachment: patch-brickviewu
Description: Text document

