Re: [Swarm-Modelling] ABMs on Graphical Processor Units


From: Russell Standish
Subject: Re: [Swarm-Modelling] ABMs on Graphical Processor Units
Date: Sat, 29 Dec 2007 13:33:12 +1100
User-agent: Mutt/1.4.2.3i

On Fri, Dec 28, 2007 at 05:47:08PM -0700, Marcus G. Daniels wrote:
> There's a lot of machinery in OpenMPI that gets pulled in no matter what, 
> owing in part to multiple abstraction layers, including a component 
> model.  Perhaps MPICH would be easier to strip down, but even with 
> static linkage it was clear to me it wasn't going to be < 64k (and say 
> another 64k for heap), which is basically what you'd want in order to 
> keep it resident in the local store.  (Keep in mind you want some local 
> store left over to do real work, and there is only 256KB per SPU.)  It is 
> possible, using the latest GCC, to build a library for the Cell SPU into 
> overlays, and have callers automatically tickle the overlay they need. 
> When a different overlay is needed, that means pulling it over the DMA. 
> (Not much different in cost from evicting something from L2 cache, but 
> nonetheless a cost that only experienced programmers even recognize.)
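
For concreteness, a single pull over the DMA looks roughly like this on
the SPU side - a minimal sketch assuming the SDK's spu_mfcio.h
intrinsics, with an illustrative buffer size and a caller-supplied
effective address:

    #include <spu_mfcio.h>
    #include <stdint.h>

    /* 16KB staging buffer in local store, 128-byte aligned for DMA. */
    static char buf[16384] __attribute__((aligned(128)));

    /* Pull 'size' bytes (subject to the usual DMA size/alignment rules)
       from main memory at effective address 'ea' into local store,
       blocking until the transfer completes. */
    void pull_from_main_memory(uint64_t ea, uint32_t size)
    {
        const uint32_t tag = 0;             /* DMA tag group 0 */
        mfc_get(buf, ea, size, tag, 0, 0);  /* queue the DMA get */
        mfc_write_tag_mask(1 << tag);       /* select tag group 0 */
        mfc_read_tag_status_all();          /* stall until it arrives */
    }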

It sounds to me that the Cell is basically equivalent to a 9-core
conventional SMP setup, with "local memory" playing essentially the same
role as the L2 cache. 256KB is a little on the measly side these days, as
even commodity IA can stump up a couple of MB of cache, but it is certainly
workable, as previous generations of IA only had this much cache. Then it
shouldn't matter whether MPI is kept resident in local store - only the
heavily used bits of the program will matter, and typically MPI is at
its most successful when communication costs are much less than
computation costs.
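
As a toy illustration of that last point, the sketch below (plain MPI C,
with made-up loop counts) does a large amount of local arithmetic per
step and only one tiny MPI_Allreduce, so communication is a vanishing
fraction of each step:

    #include <mpi.h>
    #include <stdio.h>

    #define N 1000000   /* local work per rank per step (illustrative) */

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double local = 0.0, global = 0.0;
        for (int step = 0; step < 100; step++) {
            /* Heavy local computation: O(N) operations per step... */
            for (int i = 0; i < N; i++)
                local += (i % 7) * 1e-9;

            /* ...followed by one 8-byte reduction: the communication
               cost is negligible next to the computation. */
            MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                          MPI_COMM_WORLD);
        }

        if (rank == 0)
            printf("global sum = %g\n", global);
        MPI_Finalize();
        return 0;
    }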

The main complication appears to be the difference in
architecture between the front-end processor and the back-end
processors (did you call these the PPU and SPUs?). That is not an issue
for MPI jobs, which can be heterogeneous; alternatively, you can dedicate
the front-end processor to doing O/S-related work and leave the
computation to the back-end CPUs (this is mostly how it was done on
the CM-5, which also had different front-end and back-end CPUs).
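
One way to express that division of labour in MPI itself is to split
MPI_COMM_WORLD by role - say, rank 0 on the front end doing coordination
and I/O, the rest doing the computation. A rough sketch (the role
assignment is of course just illustrative):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Split COMM_WORLD: rank 0 acts as the front-end coordinator,
           everyone else joins a compute-only communicator. */
        MPI_Comm compute;
        MPI_Comm_split(MPI_COMM_WORLD, rank == 0 ? 0 : 1, rank, &compute);

        if (rank == 0) {
            /* Front end: I/O, bookkeeping, collecting results. */
            printf("coordinator managing %d workers\n", size - 1);
        } else {
            /* Back end: the actual simulation work, using 'compute'
               for worker-to-worker collectives. */
            int workers;
            MPI_Comm_size(compute, &workers);
        }

        MPI_Comm_free(&compute);
        MPI_Finalize();
        return 0;
    }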

> 
> The problem of keeping any serial processor busy is one of keeping 
> calculations close to their memory (or close to other blocking 
> operations, like I/O).  The reality is that if we don't do that, or fail 
> to tolerate latency with built-in parallelism, then we're wasting compute 
> cycles anyway.  DDR will never be as fast as a register.  And we can't 
> just wave our hands and make all problems inherently parallel.  I suppose 
> one could wish that SPUs would each have 24MB local stores like a 
> high-end Itanium.  By my calculations that would be about 12 billion 
> transistors.

Of course - that is 90% of the art of creating HPC applications.
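
The textbook way to tolerate that latency on an SPU is to double-buffer
the DMA transfers, so the next chunk is in flight while the current one
is being processed - something like the sketch below, again assuming the
SDK's spu_mfcio.h intrinsics and an illustrative chunk size:

    #include <spu_mfcio.h>
    #include <stdint.h>

    #define CHUNK 4096   /* bytes per DMA chunk (illustrative) */

    static char buf[2][CHUNK] __attribute__((aligned(128)));

    /* Placeholder for the real work done on a chunk in local store. */
    static void process(char *data, uint32_t n) { (void)data; (void)n; }

    /* Stream 'nchunks' chunks from effective address 'ea', overlapping
       the DMA for chunk i+1 with the computation on chunk i. */
    void stream_and_process(uint64_t ea, uint32_t nchunks)
    {
        uint32_t cur = 0;
        mfc_get(buf[cur], ea, CHUNK, cur, 0, 0);   /* prefetch first chunk */

        for (uint32_t i = 0; i < nchunks; i++) {
            uint32_t next = cur ^ 1;
            if (i + 1 < nchunks)                   /* start next transfer */
                mfc_get(buf[next], ea + (uint64_t)(i + 1) * CHUNK,
                        CHUNK, next, 0, 0);

            mfc_write_tag_mask(1 << cur);          /* wait only for 'cur' */
            mfc_read_tag_status_all();
            process(buf[cur], CHUNK);              /* compute while 'next' flies */
            cur = next;
        }
    }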

> 
> As a datapoint, Sony's distributed [protein] Folding@home PS/3 network 
> hit a petaflop a few months ago.  They started from the standard Gromacs 
> codebase and kept reworking and optimizing it.  It soon overshadowed 
> the PC Folding@home network.  
> http://fah-web.stanford.edu/cgi-bin/main.py?qtype=osstats
> 
> Anyway, my point is not to push the Cell, but to say that GPUs, Cell 
> processors, vector units, and microprocessors all have tradeoffs.  None 
> of them gives you parallelism unless it can be proven from the code or 
> exists as an obvious part of the algorithm.  
> 
> Marcus
> 
> 
> _______________________________________________
> Modelling mailing list
> address@hidden
> http://www.swarm.org/mailman/listinfo/modelling

-- 

----------------------------------------------------------------------------
A/Prof Russell Standish                  Phone 0425 253119 (mobile)
Mathematics                              
UNSW SYDNEY 2052                         address@hidden
Australia                                http://www.hpcoders.com.au
----------------------------------------------------------------------------

