freepooma-devel

Patch: Speed Pooma Evaluations


From: Jeffrey Oldham
Subject: Patch: Speed Pooma Evaluations
Date: Mon, 17 Dec 2001 16:50:38 -0800
User-agent: Mutt/1.2.5i

1. This patch permits the KCC optimizer to move most accesses other than
those to the array data itself out of the inner loop.  Because the
loops now use local variables rather than references, the optimizer can
determine that the inner loops' assignments do not change the local
objects' data members, e.g., strides_m.  Thus, these values need not be
reloaded inside the inner loops, so the code is similar to hand-coded C
loops.  In effect, the compiler can determine which data members are
loop-invariant and which are not.  Example execution times (in seconds)
for Doof2d on Linux/KCC with N=1000:
 
                  C      Brick   FieldBrick   stencil Brick
before change     6.20   9.89    13.29        7.29
after change      6.21   7.48     7.44        7.16
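
As a rough illustration of the technique in item 1, here is a minimal
sketch.  It is not the patch itself: ToyEngine, makeEngine, and both
assign functions are hypothetical stand-ins, and only the member name
strides_m is taken from the description above.  The sketch assumes an
engine whose copy is shallow (the data pointer is shared, the strides
are duplicated).

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an engine: a non-owning data pointer plus
// small non-pointer members, so a copy is cheap and shallow.
struct ToyEngine {
    double* data_m;
    int strides_m[2];

    double& operator()(int i, int j) const {
        return data_m[i * strides_m[0] + j * strides_m[1]];
    }
};

// Hypothetical helper: view a vector as a 2-D array with first extent n0.
ToyEngine makeEngine(std::vector<double>& storage, int n0) {
    assert(n0 > 0);
    ToyEngine e = {storage.data(), {1, n0}};
    return e;
}

// Before: the loop works through references.  As far as the compiler can
// tell, a store through lhs.data_m might modify a strides_m member, so
// the strides may be reloaded on every iteration.
void assignViaReferences(ToyEngine& lhs, const ToyEngine& rhs,
                         int n0, int n1) {
    for (int j = 0; j < n1; ++j)
        for (int i = 0; i < n0; ++i)
            lhs(i, j) = rhs(i, j);
}

// After: the loop works on local copies.  The locals' addresses never
// escape, so the optimizer can prove their strides are loop-invariant.
// The copies share the original data, so the result is the same.
void assignViaLocalCopies(const ToyEngine& lhs, const ToyEngine& rhs,
                          int n0, int n1) {
    ToyEngine localLhs = lhs;
    ToyEngine localRhs = rhs;
    for (int j = 0; j < n1; ++j)
        for (int i = 0; i < n0; ++i)
            localLhs(i, j) = localRhs(i, j);
}
```

Both functions write the same values; whether the second one is
actually faster depends on the compiler and its alias analysis.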
 
2. The idea can be implemented in at least two different ways.
Stephen Smith suggested the implementation used in the attached patch.
It relies on two assumptions about the containers and engines:
1) copies are cheap and shallow, and
2) a copy duplicates all non-pointer data members.

3. In Mark Mitchell's suggested implementation, container and engine
data members that are invariant during loop iterations are explicitly
stored in LoopInvariant_t structures.  These constant structures are
constructed before the loop and passed to the reads and writes within
the loop.  These operations use the constant structures rather than
the containers' and engines' data members.  Thus, the optimizer can
determine that the uses of the constant data members can be hoisted
out of the loops.  Although this implementation can deliver cleaner
code, since we humans may be able to work out better code than the
optimizer does, it requires much more programmer time and code.  We
can always implement it later if needed.  A patch for part of the work
is attached.
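
A minimal sketch of how the explicit approach in item 3 might look
follows.  Only the name LoopInvariant_t comes from the description
above; SketchEngine, loopInvariant(), read(), write(), and the other
names are assumptions made for illustration and are not taken from
Mark Mitchell's patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical engine that exposes its loop-invariant members explicitly.
struct SketchEngine {
    // Everything that stays fixed while a loop runs, gathered in one place.
    struct LoopInvariant_t {
        double* data;
        int stride0;
        int stride1;
    };

    double* data_m;
    int strides_m[2];

    // Built once, before the loop.
    LoopInvariant_t loopInvariant() const {
        LoopInvariant_t li = {data_m, strides_m[0], strides_m[1]};
        return li;
    }

    // Reads and writes use only the constant structure, never the
    // engine's own data members.
    static double read(const LoopInvariant_t& li, int i, int j) {
        return li.data[i * li.stride0 + j * li.stride1];
    }
    static double& write(const LoopInvariant_t& li, int i, int j) {
        return li.data[i * li.stride0 + j * li.stride1];
    }
};

// Hypothetical helper: view a vector as a 2-D array with first extent n0.
SketchEngine makeSketchEngine(std::vector<double>& storage, int n0) {
    assert(n0 > 0);
    SketchEngine e = {storage.data(), {1, n0}};
    return e;
}

void assignWithInvariants(const SketchEngine& lhs, const SketchEngine& rhs,
                          int n0, int n1) {
    // Constant local structures: the optimizer can see that nothing in
    // the loop body modifies them.
    const SketchEngine::LoopInvariant_t l = lhs.loopInvariant();
    const SketchEngine::LoopInvariant_t r = rhs.loopInvariant();
    for (int j = 0; j < n1; ++j)
        for (int i = 0; i < n0; ++i)
            SketchEngine::write(l, i, j) = SketchEngine::read(r, i, j);
}
```

The extra cost mentioned above shows up even in this toy: every
container and engine would need its own invariant structure and
matching read and write operations.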

4. Two other sets of loops could be sped up using a similar technique
but were not.

a. Evaluator/LoopApply.h uses a function object.  Since we do not know
   whether we can copy the object, much less copy it back into the
   original, I do not know how to transform the loops.

b. Engine/RemoteEngine.h's EngineBlockSerialize could be modified but
   I could not find any user code to confirm the transformation's
   correctness.
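
To make the difficulty in item a concrete, here is a hedged sketch of
what the local-copy transformation would demand of a function object.
SumOp and both apply functions are hypothetical and are not code from
Evaluator/LoopApply.h.

```cpp
#include <cassert>

// Hypothetical stateful function object: it accumulates into itself.
struct SumOp {
    double total;
    void operator()(double x) { total += x; }
};

// Untransformed loop: the caller's object is updated directly.
template <class Op>
void applyViaReference(Op& op, const double* data, int n) {
    for (int i = 0; i < n; ++i)
        op(data[i]);
}

// Transformed loop: valid only if Op can be copied AND copied back.
template <class Op>
void applyViaLocalCopy(Op& op, const double* data, int n) {
    assert(n >= 0);
    Op local = op;            // requires a copy constructor
    for (int i = 0; i < n; ++i)
        local(data[i]);
    op = local;               // requires assignment; if this step is
                              // skipped, the accumulated state is lost
}
```

For an arbitrary user-supplied function object neither requirement is
guaranteed, which is why the generic loops were left alone.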

Thanks to Mark Mitchell for finding the idea and creating the
technique.  Thanks to Stephen Smith for finding the slicker
implementation.

2001-11-02  Jeffrey D. Oldham  <address@hidden>

        * InlineEvaluator.h
        (KernelEvaluator<InlineKernelTag>::evaluate() for Dim=1..7):
        Use local variables for the left-hand side and the right-hand
        side.  This permits the KCC optimizer to move loop-invariant code
        out of the innermost loop, significantly reducing running times.
        * ReductionEvaluator.h
        (ReductionEvaluator<InlineKernelTag>::evaluate() for Dim=1..7):
        Use local variables for the expression and the accumulator
        variable.  This permits the KCC optimizer to move loop-invariant
        code out of the innermost loop, significantly reducing running
        times.

Tested on       Linux/KCC by compiling Doof2d and running all the
                array regression tests.  Only the inner loops of
                Doof2d and src/Evaluator/tests/ReductionTest1 were
                investigated.  (Doof2d compiled 17Dec using LINUXgcc
                --opt.)
Approved by     Stephen Smith
Applied to      mainline

Thanks,
Jeffrey D. Oldham
address@hidden

Attachment: LoopInvariant.02Nov.15.8.patch
Description: Text document

