Re: [Gomp-discuss] Plan ... comments wanted!


From: Steven Bosscher
Subject: Re: [Gomp-discuss] Plan ... comments wanted!
Date: 30 Jan 2003 09:29:22 +0100

On Thu 30-01-2003, at 01:35, Diego Novillo wrote:
> On Thu, 30 Jan 2003, Steven Bosscher wrote:
---- 8< ----
> > We would have to update the OpenMP information to something like
> > "private(normalized_loop_index)".  IMO optimizers shouldn't have to do
> > that.
> > 
> Yes, they do.  That #pragma you added to the code is giving the
> optimizers hints about the semantics of k, x, y and z.  In this
> case, the optimizer knows that there will be many copies of 'k',
> so it can do whatever it wants with it.  The code generation will
> store 'k' in TLS (thread local storage).  Things are different
> with x, y and z.  Those need to be left alone (unless they are
> protected in synch regions, which concurrent SSA optimizers
> should be able to handle).
> 
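
To make that concrete, here is a minimal sketch of the kind of loop we
are talking about (the function name, element type and bound are made
up by me, not taken from Diego's mail):

  /* 'k' is private: every thread gets its own copy, so the optimizers
     may do what they want with it (it can live in TLS or on the
     thread's stack).  x, y and z are shared and have to be left alone
     unless the accesses are protected by synchronization.  Without
     OpenMP support the pragma is ignored and the loop runs serially. */
  #define N 1024

  void vadd(double *x, double *y, double *z)
  {
      int k;

  #pragma omp parallel for private(k) shared(x, y, z)
      for (k = 0; k < N; k++)
          x[k] = y[k] + z[k];
  }
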
> > Maybe that's why Intel handles OpenMP directives *before* high-level
> > optimizations?
> > 
> They probably don't have concurrency-aware optimizers.  We
> should.

Isn't that maybe a bit too much of a good thing?  How much benefit do
you expect from that, compared to the amount of extra work involved?

I looked at the OdinMP (http://vvv.it.kth.se/labs/cs/odinmp/) results. 
That is a "C to C-with-pthreads" compiler that takes OpenMP C source
code and parallelizes it.  It's a really simple compiler, no
optimizations at all.  So it's really just a preprocessor.
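
Just to illustrate the idea (this is my own rough sketch of the general
pattern, not OdinMP's actual output): such a preprocessor basically
outlines the loop body into a thread function and splits the iteration
space over a fixed number of pthreads, something like

  #include <pthread.h>

  #define N        1024
  #define NTHREADS 4

  static double x[N], y[N], z[N];

  struct chunk { int lo, hi; };

  /* Outlined body of the former "#pragma omp parallel for" loop;
     each thread works on its own [lo, hi) slice, so 'k' is per-thread. */
  static void *loop_body(void *arg)
  {
      struct chunk *c = arg;
      int k;

      for (k = c->lo; k < c->hi; k++)
          x[k] = y[k] + z[k];
      return NULL;
  }

  void parallel_loop(void)
  {
      pthread_t tid[NTHREADS];
      struct chunk c[NTHREADS];
      int t;

      /* Static schedule: split the iterations evenly over the threads. */
      for (t = 0; t < NTHREADS; t++) {
          c[t].lo = t * N / NTHREADS;
          c[t].hi = (t + 1) * N / NTHREADS;
          pthread_create(&tid[t], NULL, loop_body, &c[t]);
      }
      /* Joining everything plays the role of the implicit barrier at
         the end of the parallel region. */
      for (t = 0; t < NTHREADS; t++)
          pthread_join(tid[t], NULL);
  }

(Link with -lpthread.)  A real tool of course has to handle the
private/shared/reduction clauses, scheduling and so on, but the basic
pattern is that simple.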

And the results look very good!

The generated code was tested on an SGI Origin with up to 8 CPUs.  They
compared:
- cc-omp: SGI CC with OpenMP support enabled
- cc-odinmp: SGI CC with OdinMP-ized source code
- gcc-odinmp: GCC with OdinMP-ized source code

In all cases (1-8 CPUs), gcc-odinmp achieved about the same speedup
as cc-omp did (sometimes a bit better).  The parallel region overhead
is about twice as high for gcc-odinmp (i.e. preprocessed) as it was
for cc-omp (OpenMP in the compiler), but it would probably not be too
hard to improve that.  It's really not as important as efficient
parallelization anyway.  The report describes the code transformations
OdinMP does in some detail.

Results like these suggest that parallelizing *before* we lower GENERIC
to GIMPLE might work well for us, too.

Greetz
Steven