
Re: [Gomp-discuss] Plan ... comments wanted!


From: Diego Novillo
Subject: Re: [Gomp-discuss] Plan ... comments wanted!
Date: Wed, 29 Jan 2003 19:35:07 -0500
User-agent: Mutt/1.4i

On Thu, 30 Jan 2003, Steven Bosscher wrote:

> On Thu 30-01-2003, at 00:23, Diego Novillo wrote:
> > On Thu, 30 Jan 2003, Steven Bosscher wrote:
> > 
> > > On Wed 29-01-2003, at 15:50, Diego Novillo wrote:
> > > > You really want to work on GIMPLE.  That's the language over
> > > > which GCC will do tree optimizations.
> > > ---- 8< ----
> > > > As far as the optimizations go, almost everything will be
> > > > analyzed and optimized at the GIMPLE level.
> > > 
> > > So where should the OpenMP stuff be translated to libcalls and who knows
> > > what else?  During GENERIC->GIMPLE, or during GIMPLE->RTL?
> > > 
> > GIMPLE->RTL.  GENERIC->GIMPLE will probably be just a
> > simplification of the expressions much like we simplify the
> > original parse trees.
> 
> I'm still not convinced that is possible. Now you're the CS PhD here, so
> I must be misunderstanding something... Hope you can explain then.
> 
> For OpenMP we need to keep track of where variables are, because most
> directives can explicitly specify what should happen with a given
> variable. With all the cool SSA optimizations, loop normalization, etc,
> how can we make sure we still have all the information we need when we
> parallelize the code?
> 
By adding smarts to the SSA optimizers.  If I may be so bold as
to plug my own work, that's what I spent a few years doing
back in the late '90s.  It's insanely boring, but you're welcome
to browse http://people.redhat.com/dnovillo/Papers/ for details.

> For example, consider a small modification to the snippet from the Intel
> article (s/k/k+1/).  Not a very bright piece of code, but for the sake of
> argument:
> 
> #define N 10000 
> void
> ploop(void) 
> {
>   int k, x[N], y[N], z[N]; 
>   #pragma omp parallel for private(k) shared(x,y,z) 
>   for (k=1;  k<=N; k++) {
>     x[k-1] = x[k-1] * y[k-1] + workunit(z[k-1]);
>   }
> }
> 
> Now, if we first do GIMPLE+Optimizations, the for-loop will be
> normalized.  Then all the [k-1]s would be replaced with the index
> variable for the normalized loop with PRE/CCP.  Et voila, k is dead and
> is probably eliminated by DCE(?).
> 
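
To make Steven's concern concrete, here is a hand-written sketch, not
actual GCC output, of roughly what the loop could look like once it has
been normalized and the [k-1] accesses rewritten in terms of a fresh
zero-based induction variable; the function and variable names are made
up for illustration:

#define N 10000
extern int workunit (int);

/* Sketch only: after normalization the loop runs on a new zero-based
   index and the original 'k' no longer appears, so nothing in the
   remaining IL carries the variable that private(k) talks about.  */
void
ploop_after_normalization (void)
{
  int i, x[N], y[N], z[N];

  for (i = 0; i < N; i++)
    x[i] = x[i] * y[i] + workunit (z[i]);
}
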
Nope.  We annotate GIMPLE and then build a concurrent SSA form
to analyze the parallel sections.  The GIMPLE trees that
mark parallel regions in the code need to be analyzed and
optimized using different techniques than the traditional
sequential SSA transformations.
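
As a rough illustration of what "annotate" could mean here (every name
below is hypothetical; this is not claimed to be an existing GCC data
structure), the idea is simply that the data-sharing clauses travel with
the region, so the concurrent SSA passes can consult them before
transforming anything inside it:

/* Hypothetical sketch: a region node that keeps the #pragma's
   data-sharing clauses next to the statements it governs.  */
struct omp_clause_info
{
  const char **private_vars;   /* e.g. { "k" }           */
  const char **shared_vars;    /* e.g. { "x", "y", "z" } */
};

struct omp_parallel_region
{
  struct omp_clause_info clauses;   /* copied from the #pragma      */
  void *gimple_body;                /* statements inside the region */
};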

> We would have to update the OpenMP information to something like
> "private(normalized_loop_index)".  IMO optimizers shouldn't have to do
> that.
> 
Yes, they do.  That #pragma you added to the code gives the
optimizers hints about the semantics of k, x, y and z.  In this
case, the optimizer knows that there will be many copies of 'k',
so it can do whatever it wants with it.  Code generation will
store 'k' in TLS (thread-local storage).  Things are different
with x, y and z: those need to be left alone (unless they are
protected in synchronization regions, which concurrent SSA
optimizers should be able to handle).
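
A hand-written sketch of roughly what code generation could produce for
the example above; the outlining scheme and all names are hypothetical,
chosen only to show 'k' getting a per-thread copy while x, y and z are
passed in shared:

extern int workunit (int);

/* Shared data: every thread sees the same arrays, passed by pointer.  */
struct ploop_shared
{
  int *x, *y, *z;
};

/* Loop body outlined into a function each thread runs over its own
   iteration range.  'k' is a local variable, so each thread gets its
   own copy -- the effect the private(k) clause asks for.  */
static void
ploop_worker (struct ploop_shared *s, int lo, int hi)
{
  int k;

  for (k = lo; k <= hi; k++)
    s->x[k-1] = s->x[k-1] * s->y[k-1] + workunit (s->z[k-1]);
}

The missing piece is the runtime call that forks the threads and hands
each one a [lo, hi] chunk of 1..N; what that call looks like is exactly
the GIMPLE->RTL lowering question above.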

> Maybe that's why Intel handles OpenMP directives *before* high-level
> optimizations?
> 
They probably don't have concurrency-aware optimizers.  We
should.


Diego.



