
Re: [pooma-dev] Sparse Engine


From: John Hall
Subject: Re: [pooma-dev] Sparse Engine
Date: Fri, 26 Sep 2003 03:33:01 -0600

Richard:
The idea is to get rid of the loop over materials found in previous iterations of our code. We need only compute cell-average quantities for the mixed cells and then perform a single data-parallel computation over the one mixed-material field, which does the work for all materials at once. So we do a little unstructured work to compute the cell-average quantities, followed by a data-parallel computation. Some other advantages accrue to this unstructured approach, such as letting us store information that would normally be too expensive to store, but we won't go into that here.

The complication is that we want to (in the grand tradition of POOMA) hide the underlying complexity of our storage scheme and make things appear beautiful and logically simple. A good analogy is a storage scheme for a symmetric matrix that stores only the upper triangle but allows you to access any index into the array, internally mapping the indices to the correct storage location.

In our example, a positive entry in the index array marks a pure cell and is simply the material ID of the material contained in that cell. A negative entry has traditionally been an index into an unstructured linked list of the mixed cells' data. We can then access this data and compute a cell-average value, which we store in that cell of the multi-material field before performing our data-parallel operations on that field.
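To make the decoding concrete, here is a minimal C++ sketch of that scheme. All names (MixedCellEntry, cellAverage, and so on) are hypothetical illustrations, not POOMA interfaces, and the linked list is represented as indices into a flat vector:

#include <vector>

// Illustrative entry in the unstructured mixed-cell storage.
struct MixedCellEntry {
    int    material;        // material ID of this component
    double volumeFraction;  // fraction of the cell it occupies
    double value;           // per-material value (e.g. density)
    int    next;            // next entry in this cell's list, -1 at end
};

// index[c] > 0 : pure cell; the entry is the material ID itself.
// index[c] < 0 : mixed cell; -index[c] - 1 heads its linked list.
double cellAverage(int c,
                   const std::vector<int>& index,
                   const std::vector<double>& pureValue,
                   const std::vector<MixedCellEntry>& mixed)
{
    if (index[c] > 0)
        return pureValue[c];            // pure cell: value already stored
    double avg = 0.0;
    for (int e = -index[c] - 1; e != -1; e = mixed[e].next)
        avg += mixed[e].volumeFraction * mixed[e].value;
    return avg;                         // volume-weighted cell average
}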

We occasionally need to gather all of the pure and mixed material values of a single material so that we can do a single-material calculation like an EOS evaluation; that is why we want the work array (which we compress/deallocate when we are not using it). So the various views of the data that we would take are the multi-material cell-average view, the gathered single-material view, and the overall complicated-storage-scheme view. To get the kind of performance the old code achieves, we will also need to introduce windowing and activity flags. Basically, we are attempting to throw away any unnecessary computations and minimize the data we are pushing through cache.
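As a rough illustration of that gathered single-material view, the sketch below (again with hypothetical names, and repeating the illustrative MixedCellEntry type so it stands alone) collects every pure and mixed value of one material into a compact work array, recording where each value came from so results can be scattered back after the single-material calculation:

#include <cstddef>
#include <vector>

struct MixedCellEntry {
    int    material;
    double volumeFraction;
    double value;
    int    next;   // -1 terminates a cell's list
};

// Gather all values of material m. origin[i] records the source of
// work[i]: a cell number c for a pure cell, or ~e (bitwise NOT of the
// list-entry index, always negative) for a mixed component, so a
// scatter pass can invert it unambiguously.
void gatherMaterial(int m,
                    const std::vector<int>& index,
                    const std::vector<double>& pureValue,
                    const std::vector<MixedCellEntry>& mixed,
                    std::vector<double>& work,
                    std::vector<int>& origin)
{
    work.clear();
    origin.clear();
    for (std::size_t c = 0; c < index.size(); ++c) {
        if (index[c] == m) {                       // pure cell of m
            work.push_back(pureValue[c]);
            origin.push_back(static_cast<int>(c));
        } else if (index[c] < 0) {                 // mixed cell
            for (int e = -index[c] - 1; e != -1; e = mixed[e].next)
                if (mixed[e].material == m) {
                    work.push_back(mixed[e].value);
                    origin.push_back(~e);
                }
        }
    }
}

Because the work array is only alive between the gather and the scatter back, it can be compressed or deallocated the rest of the time, as described above.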

The sparse tile layout doesn't have the concept of indirect addressing using an index field; it is simply intended for block-AMR-type meshes. If we do AMR, it would probably be a completely unstructured problem in which any cell can be refined, rather than a block-structured one. Unfortunately, this again introduces the possibility of all-to-all communications (slow) to find your neighbors, etc.

We have also been dealing with the issue of how best to do masking. I am beginning to think that we need another sparse storage idea, so that we end up with something equivalent to a block where() in which the data is collected into lists by a test and the computation is done only over that collection, which gets progressively smaller as the number of tests increases. Currently, when using a mask, you end up traversing all of the data, perhaps even doing the computation everywhere, and then simply throwing away the result where the mask is not set (either by a conditional or by multiplying by 0.0). Building the list for extremely sparse data can be a huge win. Like I said, the old version of this algorithm runs 20 times faster than the data-parallel version, which is only possible by simply doing less work.
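A toy version of the two strategies, assuming plain flat arrays rather than POOMA fields and using sqrt as a stand-in computation, might look like this:

#include <cmath>
#include <cstddef>
#include <vector>

// Masked form: visits every element, testing the mask each time, even
// when almost none of the elements participate.
void maskedSqrt(const std::vector<double>& in,
                const std::vector<bool>& mask,
                std::vector<double>& out)
{
    for (std::size_t i = 0; i < in.size(); ++i)
        if (mask[i]) out[i] = std::sqrt(in[i]);
}

// List form: apply the test once to build an index list, then compute
// only over that collection; cost scales with the list, not the field.
std::vector<std::size_t> buildList(const std::vector<bool>& mask)
{
    std::vector<std::size_t> list;
    for (std::size_t i = 0; i < mask.size(); ++i)
        if (mask[i]) list.push_back(i);
    return list;
}

void listSqrt(const std::vector<double>& in,
              const std::vector<std::size_t>& list,
              std::vector<double>& out)
{
    for (std::size_t i : list)
        out[i] = std::sqrt(in[i]);
}

Subsequent tests can then filter the list itself, so the collection shrinks as conditions accumulate.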

We would also like to have exterior guard cells on a box containing many very small, logically distinct but shared-memory patches, with no guard cells between the patches within the box. Then we could perhaps achieve some reasonable compression, and our computations should approach AMR storage schemes without the undesired Gibbs phenomenon that AMR suffers from poor impedance matching across T-joints.

I should note that we are aware we must avoid certain types of dynamically allocated data structures, because the guard-cell copy scheme might move only the pointer to the data and not the actual data. We are taking this into account.

Hope this helps,
John Hall


On Friday, September 26, 2003, at 02:07 AM, Richard Guenther wrote:

Ok, still trying to understand - this is something like (statically)
specifying which cells participate in computation? Like you would
have a usual brick engine in conjunction with a bitfield specifying
a mask and using this in the evaluator loop (of course this would be
less memory efficient)? So this would be a cheap way to do this
compared to using the sparse tile layout?

Thanks,

Richard.

On Fri, 26 Sep 2003, John H. Hall wrote:

Richard:
OK. Here goes. The basic idea is that we have a hierarchical field
structure (built using hierarchical engines similar to the current
multi-material field abstraction) which has a collection of 1-D
dynamicFields (for the sparse unstructured storage), a shared Index
(n-D) integer Array (or Field), and a single (n-D) scalar, vector or
tensor field which holds either the data for a pure cell or a
cell-average value for a mixed-material cell. As the problem evolves
the material interfaces migrate and so the actual position of the
unstructured cells changes. However, all the indirect indexing is still
local to the processor (except for normal guard cell communications).
So this is much simpler than a real unstructured problem with
all-to-all communications. In the general case, the sparse dynamic
fields are only used to compute the cell-average quantities before a
data-parallel computation across the single multi-material or
cell-average field is performed. We would also like to take some views
of the field in which all of the data for a particular material is
gathered/scattered to/from a single sparse dynamic work Array that is
shared in this hierarchical structure.

Field<Mesh_t, Real, MixedCellEngine> would look like this:

 ___________________
|___________________| Single-material Gather/Scatter 1-D Dynamic Work
                      Array (both mixed and pure cells)
 ______
|______| mat A (1-D Dynamic Array/Field)
 _______
|_______| mat B (1-D Dynamic Array/Field)
 ______
|______| mat C (1-D Dynamic Array/Field)
 _____________________________________________________________________
|_____________________________________________________________________|
  Cell Average Quantities (n-D)
 _____________________________________________________________________
|_____________________________________________________________________|
  Integer Index Array (n-D)

A single Index Array is shared by all sparse Fields (e.g. Density,
Pressure, etc.). It does double duty, providing the material index for
a pure cell and an offset into a collection tracking the unstructured
mixed-cell data for a mixed cell.
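In skeletal C++ (purely illustrative; none of these names are POOMA types, and the n-D fields are flattened to 1-D vectors for brevity), that layout might be:

#include <vector>

// Illustrative aggregate behind Field<Mesh_t, Real, MixedCellEngine>.
template <class T>
struct MixedCellStorage {
    std::vector<std::vector<T>> perMaterial; // one 1-D dynamic field
                                             // per material
    std::vector<T>   work;         // gather/scatter work array;
                                   // compressed/freed when not in use
    std::vector<T>   cellAverage;  // pure value or cell average (n-D)
    std::vector<int> index;        // shared: >0 material ID,
                                   // <0 mixed-cell offset

    // View used for the data-parallel pass over cell averages.
    std::vector<T>&       averages()       { return cellAverage; }
    const std::vector<T>& averages() const { return cellAverage; }
};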

Multi-Patch should still work, although the guard-cell communications
might be slightly more complicated.

The number of cells which are indirectly addressed is very small (< 5%
of the total), so even using compressible brick we are wasting a lot of
memory bandwidth and performing numerous extraneous computations. A
comparison code using this structure runs 20 times faster than the
equivalent data-parallel POOMA R1 computation for the single-processor
serial case. We believe we can match that performance by building an
engine that encapsulates the sparse nature of the problem and by making
more use of the new engines POOMA R2 provides (stencil, etc.).

Again, most of the computations are performed on the Cell-Average
Quantities, so we just take a view (operator[], perhaps?) that returns
that single field.
John and Jean

--
Richard Guenther <richard dot guenther at uni-tuebingen dot de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/
