
Re: [Gomp-discuss] Somethings to think about ....


From: Lars Segerlund
Subject: Re: [Gomp-discuss] Somethings to think about ....
Date: Mon, 10 Mar 2003 15:53:24 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.2.1) Gecko/20021226 Debian/1.2.1-9


Steven Bosscher wrote:
Hi Lars, all,


On Mon 10-03-2003, at 13:44, Lars Segerlund wrote:

I have also been looking at the Linux support for SMP and NUMA (which was added recently). Linux does support affinity and NUMA in the latest kernels; however, to take maximum advantage of this it would be quite reasonable to do a native port to Linux (using clone() instead of any thread library) and implement only the synchronization elements needed by OpenMP.


I think our goal should be to make GCC concurrency-aware and use it for
OpenMP with threads as a first application.  Everything else
(autoparallelization, NUMA-awareness, grid computing, the construction
of HAL, what else?) is beyond the scope of this project.

If you want to make stuff linux-specific, you'd have to stuff it down
the throats of the GCC community with force and violence to make them
accept your contribution and like it.


I didn't want to make stuff Linux-specific; I wanted to keep open the possibility of adding machine-specific implementations later for efficiency. I do realize that it might have sounded 'bad'.

Besides, load balancing, scheduler affinity and other low-level SMP/NUMA
stuff is the kind of thing a kernel is responsible for.  OpenMP is
not designed for such purposes.

(You can distribute tasks over clusters of CPUs with HPF2 (DISTRIBUTE,
BLOCK, etc.), that gives you some control of how your job will run on a
NUMA machine.  But it's not very portable and as a developer you need to
know all the ins and outs of the machine you're targeting.  I suppose
this explains why I've seen only a few HPF applications that use this
feature...)
Still, the first thing to do is to get OpenMP running with a threading library, and perhaps (if SMP-safe) a semaphore library.

As for the tasks ahead, I think it's not too hard to use the framework in the paper to target the GENERIC trees (which is the most reasonable form to target, IMHO). The algorithms for a rather good implementation all seem to be there, and the nice part is that if we extend the pragma handling and add a -fgomp flag to gcc, we should be able to leave most of the regular stuff in place.


I would prefer -fopenmp :-)


 Why do you come up with all the good names :-) ... I do agree.



I do, however, have a question: I know gcc supports barriers, but to what extent and in what context?


What do you mean by "GCC supports barriers"?

The only barriers I know of in GCC are BARRIER insns in RTL.  In that
context, a BARRIER is basically just a marker for the end of a code
block (e.g. after an unconditional jump_insn).  In other words, it
states: "Control flow ends before this".  It is used, among other
things, for code alignment (i.e. the insn following the BARRIER can
be aligned).

This has nothing to do with the barriers they talk about in the OpenMP
specs; those synchronize threads.


This I am aware of; however, I did not know what they were used for in gcc, thus the question about them. I also wasn't aware that they live at the RTL level (which makes them uninteresting for us).



As far as I understand it, gcc supports barriers that prevent sections of code from being handled together (thus enforcing separate optimization). I'm still looking, but does anybody know if this is correct?


Well, barriers really are just markers for places where there is one and
only one out-edge in the control flow graph.  That does not necessarily
imply that all the optimizers stop there.

For example, after expanding trees to RTL, you'll see that the dump file
is littered with BARRIERs all over, but after some basic flow graph
optimizations (jump!), most of them are gone.  And if the single edge
before a barrier is a back-edge, the loop optimizers use it to
identify loops.  And isn't crossjumping all barriers?  In these cases,
barriers _allow_ the compiler to identify optimization opportunities!


Thanks, I have looked at this a bit now, and while I don't claim to understand it fully, I see what they do. I'm still desperately looking for a mechanism already in place to restrict optimizer scope, but I figure we will have to make the optimizers parallel-aware instead.



I thought that we might as well start documenting what we want to do with gcc, the trees and what we have to modify.


Do you have a plan we can discuss on some mailing list?

Greetz
Steven


 As for a plan, do you mean something concrete?

I would then suggest that we investigate what is needed to enhance GENERIC enough to support the form in Diego's paper, since there is a set of algorithms to support this.

I was more thinking that we could have a discussion about what the plan should be :-) ... since we don't have a plan yet.

 Basically, I think along these lines:

1. The library is trivial to do, and a stub may well be enough to let other areas of work progress.

2. The tree modifications are not that hard, but they have to be carefully planned in order to be efficient and extensible. Still, they have to be done.

3. The algorithms used for the concurrency can be tested on the trees once these are done, without a proper front end and back end; this might even be a very nice way to get some proper testing done. If this phase is basically bug-free, I think a lot of later work is spared.

4. At this point it should be about time to figure out how to interface with gcc in the most 'non-intrusive' manner. As I understand it, there are two routes: the first is to make gcc ignore (remove) the parallel parts of the tree when -fopenmp is not given; the second is to enable the extra 'concurrency-aware' code when -fopenmp is given (or replace parts of gcc with concurrency-aware code). (I don't know if my meaning gets through, but it's basically a tightly knit implementation vs. a loosely knit one.)

5. When this is done, it would be reasonable to start on the code generation. (I haven't given this any thought yet.)

6. Front-end work making gcc take advantage of the parallel trees should be the last thing needed to get the compiler working, and at that point we should have a working implementation.

So I should think we would need a specification of what to do with the trees and what we need to represent; from there we only have to code a lot. ( :-D ).

 / Lars Segerlund.




