From: John Hall
Subject: Re: [pooma-dev] Sparse Engine
Date: Fri, 26 Sep 2003 03:33:01 -0600
Richard:

The idea is to get rid of the loop over materials found in previous iterations of our code. In fact, we simply need to compute cell-average quantities for the mixed cells and then perform a single data-parallel computation over the single mixed-material field, which does the work for all materials at once. So we do a little unstructured work to compute the cell-average quantities, and then we do a data-parallel computation. There are some other advantages that accrue to this unstructured approach, which would allow us to store some information that would normally be too expensive to store, but we won't go into that here.
The complication is that we want to (in the grand tradition of POOMA) hide the underlying complexity of our storage scheme and make things appear beautiful and logically simple. A good example might be a storage scheme for a symmetric matrix that only stores the upper triangle, but that allows you to access any index into the array and internally maps the indices to the correct storage location.
In our example, the index array entry is positive for a pure cell and is simply the material ID for the material contained in that cell. If the index array contains a negative value, then it has traditionally been an index into an unstructured linked list of the mixed cells' data. We can then access this data and compute a cell-average value, which we store in that cell of the multi-material field, and then we perform our data-parallel operations on that multi-material field.
We occasionally need to gather all of the pure and mixed material values of a single material so that we can do a single-material calculation like an EOS evaluation; that is why we want the work array (which we compress/deallocate when we are not using it). So the various views of the data we would take are the multi-material cell-average view, the gathered single-material view, and the overall complicated-storage-scheme view. To get the kind of performance the old code has, we will also need to introduce windowing and activity flags. Basically, we are attempting to throw away any unnecessary computations and minimize the data we are pushing through cache.
The sparse tile layout doesn't have the concept of indirect addressing using an index field. It is simply intended for block-AMR type meshes. If we do AMR it would probably be a completely unstructured problem in which any cell can be refined, rather than a block type. Unfortunately, this again introduces the possibility of all-to-all communications (slow) to find your neighbors, etc.
We have also been dealing with the issue of how best to do masking. I am beginning to think that we need another sparse-storage idea, so that we end up with something equivalent to a block-where in which the data is collected into lists using a test and the computation is done only over that collection, which gets progressively smaller as the number of tests increases. Currently, when using a mask, you end up traversing all of the data, maybe even doing the computation everywhere, and then simply throwing away the result where the mask is not set (either by a conditional or by multiplying by 0.0). Building the list for extremely sparse data can be a huge win. Like I said, the old version of this algorithm is running 20 times faster than the data-parallel version. This is only possible by simply doing less work.
We would also like to have exterior guards on a box containing many very small, logically distinct but shared-memory patches, with no guard cells within the box. Then we could maybe achieve some reasonable compression, and our computations should approach AMR storage schemes without the undesired Gibbs phenomenon that AMR suffers due to poor impedance matching across T-joints.
I should note that we are aware of the issue of not using certain types of dynamically allocated data structures because the guard cell copy scheme might only move the pointer to the data and not the actual data. We are taking this into account.
Hope this helps,
John Hall

On Friday, September 26, 2003, at 02:07 AM, Richard Guenther wrote:
Ok, still trying to understand - this is something like (statically) specifying which cells participate in the computation? Like you would have a usual brick engine in conjunction with a bitfield specifying a mask, and using this in the evaluator loop (of course this would be less memory efficient)? So this would be a cheap way to do this compared to using the sparse tile layout?

Thanks, Richard.

On Fri, 26 Sep 2003, John H. Hall wrote:

Richard:

OK. Here goes. The basic idea is that we have a hierarchical field structure (built using hierarchical engines similar to the current multi-material field abstraction) which has a collection of 1-D dynamicFields (for the sparse unstructured storage), a shared (n-D) integer Index Array (or Field), and a single (n-D) scalar, vector, or tensor field which holds either the data for a pure cell or a cell-average value for a mixed-material cell's data. As the problem evolves, the material interfaces migrate, and so the actual position of the unstructured cells changes. However, all the indirect indexing is still local to the processor (except for normal guard-cell communications), so this is much simpler than a real unstructured problem with all-to-all communications. In the general case, the sparse dynamic fields are only used to compute the cell-average quantities before a data-parallel computation across the single multi-material or cell-average field is performed. We would also like to take some views of the field in which all of the data for a particular material is gathered/scattered to/from a single sparse dynamic work Array that is shared in this hierarchical structure.
Field<Mesh_t, Real, MixedCellEngine> would look like this:

 ___________________
|___________________|  Single-material Gather/Scatter 1-D Dynamic Work Array
                       (both mixed and pure cells)
 ______
|______|  mat A (1-D Dynamic Array/Field)
 ______
|______|  mat B (1-D Dynamic Array/Field)
 ______
|______|  mat C (1-D Dynamic Array/Field)
 _____________________________________________
|_____________________________________________|  Cell Average Quantities (n-D)
 _____________________________________________
|_____________________________________________|  Integer Index Array (n-D)

A single Index Array is shared by all Sparse Fields (e.g. Density, Pressure, etc.). It shares duty between providing the material index for a pure cell and an offset into a collection tracking the unstructured mixed-cell data for a mixed cell. Multi-Patch should still work, although the guard-cell communications might be slightly more complicated. The number of cells which are indirectly addressed is very small (< 5% of the total), so even using compressible brick we are wasting a lot of memory bandwidth and performing numerous extraneous computations. A comparison code using this structure is running 20 times faster than the equivalent data-parallel POOMA R1 computation for the single-processor serial case. We believe we can match that performance by building an engine that encapsulates the sparse nature reflected in the problem and by making more use of the new engines POOMA R2 provides (stencil, etc.). Again, most of the computations are performed on the Cell-Average Quantities, so we just take a view, operator[]?, that returns that single field.

John and Jean

--
Richard Guenther <richard dot guenther at uni-tuebingen dot de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/