Plan for Reducing Pooma's Running Time

freepooma-devel

From:	Jeffrey Oldham
Subject:	Plan for Reducing Pooma's Running Time
Date:	Tue, 14 Aug 2001 20:46:44 -0700

We soon will have several geographically disparate people working on
reducing Pooma's running time.  Having a task list would be useful to
ensure we reach our goal of producing fast code.  Below, I outline a
strategy and then list the tasks.

Goal:

Make Pooma code run faster than corresponding Fortran or C code.


Measurements:

Measurement is our most important task.  We should revise Pooma code
only if the revision, in practice, reduces running time.

To permit comparisons between executions of different programs, we can
compute the abstraction ratio for using data-parallel or Pooma2
abstractions.  The abstraction ratio is the ratio of a program's
running time to the corresponding C program's running time.  We want
this ratio to be at most one.
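As a minimal sketch of the definition above, the ratio can be computed
from two measured running times.  The function names here are
hypothetical helpers for illustration, not part of Pooma.

```cpp
#include <cassert>

// Hypothetical helper (not part of Pooma): the abstraction ratio is the
// program's running time divided by the corresponding C program's
// running time.  Both times must be in the same unit.
double abstractionRatio(double program_seconds, double c_seconds)
{
  assert(c_seconds > 0.0);
  return program_seconds / c_seconds;
}

// True if the ratio meets the stated goal of being at most one.
bool meetsGoal(double ratio)
{
  return ratio <= 1.0;
}
```

For example, a Pooma program taking 3.0 seconds against a C program
taking 2.0 seconds has an abstraction ratio of 1.5, which misses the
goal.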

We need to resolve timing granularity problems and which time
measurements to make, e.g., wall-clock or CPU time.
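One possible way to approach both questions, sketched under the
assumption that a standard C++ library with <chrono> is available:
record wall-clock and CPU time together, and repeat the kernel enough
times that the total exceeds the clock's granularity.  The names Timing
and timeKernel are hypothetical, not existing Pooma utilities.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>

// Hypothetical result type: average seconds per repetition.
struct Timing {
  double wall_seconds;  // elapsed real time
  double cpu_seconds;   // processor time charged to this process
};

// Runs kernel() the given number of times and reports the average
// wall-clock and CPU time per repetition.  Repeating the kernel works
// around coarse timer granularity for short kernels.
template <class Kernel>
Timing timeKernel(Kernel kernel, int repetitions)
{
  assert(repetitions > 0);
  const std::chrono::steady_clock::time_point wall_start =
    std::chrono::steady_clock::now();
  const std::clock_t cpu_start = std::clock();
  for (int i = 0; i < repetitions; ++i)
    kernel();
  const std::clock_t cpu_end = std::clock();
  const std::chrono::steady_clock::time_point wall_end =
    std::chrono::steady_clock::now();

  Timing t;
  t.wall_seconds =
    std::chrono::duration<double>(wall_end - wall_start).count() / repetitions;
  t.cpu_seconds =
    static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC / repetitions;
  return t;
}
```

Reporting both numbers lets us see when they disagree, e.g., on a
loaded machine where wall-clock time grows but CPU time does not.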


Infrastructure:

We should establish daily builds and benchmark runs to check that
running times do not increase while we try to reduce running times.
Running times on both Irix and Linux are desirable.  We'll use QMTest
to perform the testing.
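The check itself could be as simple as the sketch below.  The function
name and the tolerance parameter are assumptions for illustration; this
is not QMTest's interface, only the comparison a test might perform.

```cpp
#include <cassert>

// Hypothetical check: reports a regression when the current running
// time exceeds the baseline by more than the given fractional
// tolerance (e.g., 0.05 allows 5% timing noise).
bool isRegression(double baseline_seconds, double current_seconds,
                  double tolerance)
{
  assert(baseline_seconds > 0.0);
  assert(tolerance >= 0.0);
  return current_seconds > baseline_seconds * (1.0 + tolerance);
}
```

Some tolerance is probably needed because timings vary from run to
run; the right value would have to come from our measurements.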

Question: Should we post these reports on a daily basis?

We should use KCC.  Some preliminary performance measurements indicate
that KCC's and gcc's performances differ.  Tools to profile the code include
Linux's gprof (instructions available in info pages and
http://sources.redhat.com/binutils/docs-2.10/gprof.html) and Irix's
ssrun(1) and prof(1).

Question: Are there other profiling tools?


Work:

Scott Haney suggests first speeding Array execution since (New)Fields
use Arrays.  A good initial step would be checking that the KCC
optimizer produces code similar to C code without Pooma abstraction
overheads.  First, we can compare C-style arrays with Brick Array
engines on uniprocessor machines.  Then, we work with multipatch Array
engines, trying to reduce the overhead of having multiple patches.
Trying various patch sizes on a uniprocessor machine will demonstrate
the overhead of having multipatches.  We'll defer threaded and
multi-processor execution to later.
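To illustrate the first comparison, here is a sketch of the same
stencil written once with a C-style array and once through a thin array
abstraction.  Array2D is a hypothetical stand-in, not Pooma's Brick
engine, and the 5-point average is only an example kernel.  If the
optimizer removes the abstraction overhead, the two loops should
compile to similar code and run in similar time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an array abstraction (NOT Pooma's Brick
// engine): an n-by-n array of doubles with (i, j) indexing.
class Array2D {
public:
  Array2D(std::size_t n, double value) : n_(n), data_(n * n, value) {}
  double& operator()(std::size_t i, std::size_t j)
  { return data_[i * n_ + j]; }
  double operator()(std::size_t i, std::size_t j) const
  { return data_[i * n_ + j]; }
  std::size_t size() const { return n_; }
private:
  std::size_t n_;
  std::vector<double> data_;
};

// C-style version: 5-point average over the interior of an n-by-n
// row-major array.  Boundary entries of out are left untouched.
void stencilC(const double* in, double* out, std::size_t n)
{
  for (std::size_t i = 1; i + 1 < n; ++i)
    for (std::size_t j = 1; j + 1 < n; ++j)
      out[i * n + j] = 0.2 * (in[i * n + j]
                              + in[(i - 1) * n + j] + in[(i + 1) * n + j]
                              + in[i * n + j - 1] + in[i * n + j + 1]);
}

// The same computation expressed through the abstraction.
void stencilAbstract(const Array2D& in, Array2D& out)
{
  assert(out.size() == in.size());
  const std::size_t n = in.size();
  for (std::size_t i = 1; i + 1 < n; ++i)
    for (std::size_t j = 1; j + 1 < n; ++j)
      out(i, j) = 0.2 * (in(i, j)
                         + in(i - 1, j) + in(i + 1, j)
                         + in(i, j - 1) + in(i, j + 1));
}
```

Timing both functions on identical input, and inspecting the generated
assembly, would show whether the abstraction costs anything.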

Stephen will soon post a list of Array benchmarks, what they test, and
what they do not test.  We can write additional programs to fill any
deficiencies in our testing.  Each researcher can then work on
speeding a particular benchmark's execution.

Work on the NewField should be delayed until Stephen Smith and I merge
our work into the mainline.  Currently, there is one benchmark program,
benchmarks/Doof2d, that uses NewField.h.  We also will have the Lee et
al. stratigraphic flow code.  Are these sufficient for testing?  If
not, should we write more test cases?  Will we want to finish the
Caramana et al. hydrodynamics program?

Question: Who besides Jeffrey has access to a multi-processor computer
with more than a handful of processors?

Question: Do we need to check for memory leaks?  Bluemountain has
purify, which should reveal leaks.  Perhaps we can modify the QMTest
scripts to ease checking.

Procedure for Modifying Pooma Code:

Even though we'll probably work on a separate development branch, we
need to ensure that the Pooma code compiles at all times to permit
multiple programmers to work on the same code.  Before committing a
code change,

1. Make sure the Pooma library compiles with the change.  Also check
   that associated executables still run.
2. Obtain patch approval from at least one other person.
3. Commit the patch.
4. Send email to address@hidden, listing
  a. the changes and an explanation,
  b. the test platform, and
  c. the patch approver.


To Do List:

o Complete this list.
o Add this list to the Pooma CVS tree for easy sharing and modification.
o Describe the existing benchmarks.   Stephen
o Determine what execution tasks are not covered by existing code.      Stephen
o Determine interesting benchmarks using Arrays.
    Stephen recommends starting with benchmarks/Doof2dUMP.      Gaby?
o Establish nightly Pooma builds for Linux and Irix, producing summary
  reports.  Jeffrey
o Ensure Pooma compiles with the threads package.       Jim?

Thanks,
Jeffrey D. Oldham
address@hidden
