octave-maintainers

Re: Using OpenMP in Octave


From: David Bateman
Subject: Re: Using OpenMP in Octave
Date: Fri, 02 Apr 2010 22:55:42 +0200
User-agent: Mozilla-Thunderbird 2.0.0.22 (X11/20090706)

Jaroslav Hajek wrote:
I thought you would :) What if I won't like your constants, figuring
out better working ones for me? A commit war?
Not a chance; you've committed much more than me recently... You clearly have more time for fun stuff like Octave than me :'-(

I need to be able to turn off the multithreading at runtime and I
absolutely insist on that, so there needs to be a branch anyway.
The solution I offered earlier was to encapsulate the decision into a
routine that will accept loop dimensions (number of cycles and the
"size" of one cycle) and decide whether it's worth parallelizing.
A universal switch point may be a good start. But if you expect
people to play with the parallel code, I don't see why you don't want
a tunable parameter for every operation and every mapper, if it can be
done efficiently.
Mainly that I think it might be inefficient to have a set of tunable parameters. Frankly, the more I think about it, the best solution is probably to just say that the cases where the multithreaded code is slower than the single-threaded code are going to be the cases where speed isn't an issue, and to have a really minimalist test plus a flag to turn off multithreading completely. Yes, I think it's a good idea to be able to turn it off, and not just with a flag when Octave is launched, as Matlab currently implements it.
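For what it's worth, the "minimalist test plus a flag" could look something like the sketch below. The names octave_mt_enabled, mt_threshold and should_parallelize are all invented for illustration and aren't anything in the tree:

```cpp
#include <cstddef>

// Hypothetical knobs, not actual Octave internals: a single runtime
// on/off switch plus one universal switch point, instead of a tunable
// parameter per operation and per mapper.
static bool octave_mt_enabled = true;
static const std::size_t mt_threshold = 50000;

// Decide whether a loop of n cycles, each costing roughly cycle_size
// elemental operations, is worth parallelizing.
bool
should_parallelize (std::size_t n, std::size_t cycle_size = 1)
{
  return octave_mt_enabled && n * cycle_size >= mt_threshold;
}
```

An operation would then guard its "#pragma omp parallel for" with a call to should_parallelize, and a user-visible function could flip octave_mt_enabled at runtime.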

<cut proposed exception handling OpenMP code>
Yes. Besides, I don't think this works.
I think you mixed together the interrupts thrown from Fortran that are
handled by setjmp/longjmp and regular C++ exceptions (which are the
case here). Here, one needs to employ try/catch. But see below, I
don't think this is necessary.
Ok, I didn't test the proposed code but threw it out for discussion. There are also a few typos and uninitialized values. I'd thought the setjmp/longjmp stuff might be needed for mapper functions written in Fortran, but in fact I don't think so, as the setjmp/longjmp exceptions will be rethrown as standard C++ exceptions within each OpenMP thread. I'll work up something else instead.


This question of the exception will occur even more as
we try to parallelize at a coarser grain and you probably have similar
issues in your parcellfun implementation already.


Not really. Remember that parcellfun does not use multithreading.
But what happens when one of your forked processes unexpectedly dies?

Are you sure that MKL and ACML both use OpenMP for multithreading? What
about other vendors? In any case, wrapping all of this in a function to set
the number of threads in Octave should be easier. Note that Matlab
deprecated the maxNumCompThreads function

http://www.mathworks.com/access/helpdesk/help/techdoc/ref/maxnumcompthreads.html
http://www.mathworks.com/access/helpdesk/help/techdoc/rn/bry1ecg-1.html#br1ask3-1

in recent versions of matlab exactly because libraries like BLAS and FFTW
are multithreaded. So we'd be going against the decision taken by matlab
because they consider they can't control the number of threads of these
libraries.


Unlike Matlab, Octave is open, so users can use BLAS of their choice.

LD_PRELOAD=/usr/lib/myfavblas.so matlab

works fine, or at least it used to when I was last able to test it. You might have to LD_PRELOAD the LAPACK as well if there are some unexpected dependencies between the LAPACK and BLAS chosen by Matlab. I used to do things like this all the time when benchmarking Octave and Matlab to ensure a fair comparison, independent of the external dependencies.

No, they're not always multithreaded by OpenMP. For instance, GotoBLAS
is not. So what? It may be useful anyway, but we can simply say "this
function is equivalent to omp_set_num_threads, so it will only
influence libraries based on OpenMP".
Frankly, I have no problem one way or the other about this; I was just pointing out that Matlab deprecated this functionality, so we'd be making the opposite choice. In general I like to keep Octave and Matlab compatible, but in this case, as you say, "so what". Its inclusion in Octave won't affect compatibility with Matlab, as it's probably not a function I'd think to use within a script or function, and even if I did, wrapping it in an "if exist('OCTAVE_VERSION')" isn't a big issue.
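A sketch of what the thread-count wrapper might look like; the name octave_set_num_threads is made up for illustration, and, as you say, it would only influence OpenMP-based libraries:

```cpp
#include <stdexcept>

#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical wrapper around omp_set_num_threads. It affects only
// OpenMP-based code (e.g. OpenMP builds of MKL or ACML), not libraries
// such as GotoBLAS that manage their own threads. Without OpenMP
// support compiled in, it degrades to a no-op.
void
octave_set_num_threads (int n)
{
  if (n < 1)
    throw std::invalid_argument ("number of threads must be positive");

#ifdef _OPENMP
  omp_set_num_threads (n);
#endif
}
```
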

I think stuff like this has a far bigger impact than parallelizing
plus and minus. Besides, Octave's got no JIT, so even if you
parallelize plus, a+b+c will still be slower than it could be if it
was done by a single serial loop.
Not that it really matters; as I already said these operations are
seldom a bottleneck, and if they are, rewriting to C++ is probably the
way to go.

Yes, but as you said yourself, it's a matter of communication. If someone does
a benchmark of a+b, stupid as that may be, I'd like Octave to be just as fast
as Matlab. Kai's results showed that his quad-core AMD improved the performance
of the binary operators with multithreading. JIT is another issue; yes, the
cases where it makes the most difference aren't the ones that have the
largest impact on well-written code. Though there are a lot of idiots out
there writing bad code, benchmarking Octave against Matlab, and then
badmouthing Octave. So yes, JIT would be good to have as well :-)


Maybe you place too much weight on idiots. Chasing Matlab in functionality is
useless; we can never win. There will always be idiots like that. Who
cares?
No, it's a case of evangelizing Octave. Do a search for "octave" on the Matlab newsgroup and see how many users there describe Octave as slow. I've even worked with people in the past who wouldn't use Octave because their code was twice as slow in Octave, even though we only had 10 Matlab licenses for the whole group and I was running multiple instances of Octave on many machines and getting results faster in any case. With an attitude of "who cares", I'm sorry, but Mathworks has already won and we'd only continue to develop Octave for ourselves.

That being said, the person who has contributed most to the speed of Octave recently is you. I agree that making the code that "matters" faster is more important than the gimmicks.

If it's isolated and disabled by default, I don't see why it shouldn't go into
3.4.0 to allow people to start playing with this code. There seem to be
quite a few people regularly building and testing Octave now, and letting
them have this functionality in an experimental form can only be a good
thing, as long as, of course, it does not put the 3.4.0 release at risk.


OK, I see you're determined :)
Hey, given what I'm working on now in my real life, doing OpenMP code is about the best thing I can do for Octave.


So, let's merge the elemental operations patch first. Even though they
scale poorly, it appears you can still get them, say, 2x faster with 4
cores. If you have 4 free cores, why not... besides, certain cases
scale well. For instance, integer division is apparently
computationally intensive enough to scale as well as the mappers.

We do not need to do the ugly magic proposed above; the logical
operations on reals are actually the only loops that can throw, and I
never liked this solution anyway. I think this can be solved by other
means, by just checking for the NaNs prior to the operation. We shall
indicate in mx-inlines.cc that loops must not throw exceptions, and
declare them as such.
Yes, but a "Ctrl-C" or an out-of-memory condition can cause an exception anywhere, so I'd expect we need to be careful in any case. If we declare certain functions as not throwing exceptions, that would prevent the use of "Ctrl-C" in those functions. I suppose in most cases the memory is allocated before the loop, so the "out-of-memory" exception can probably be avoided. So the only effect of your solution is disabling Ctrl-C in the OpenMP loops in the elementary functions.

Then we can address the mappers. This is going to be more difficult as
I don't think we can currently universally rely on the mappers to
never throw. Further, unlike the fast loops, the mappers can take a
long time and so are supposed to be breakable.
If we have a solution for the exceptions from the mapper functions, and it's not too harsh on performance, why not use it for the elementary functions as well? If it's costly, sure, it's better to avoid it.
I think this is one of the biggest drawbacks of the OpenMP attempts.
Previously, making computations breakable required just putting
octave_quit() somewhere. Now, it should be done carefully and inside
parallel sections we simply need to resort to something more
complicated.
Maybe there's a better, exception-safe method of C++ parallelism out
there? What about boost::threads? (I don't know, I'd have to check it
out).

Reading

http://bisqwit.iki.fi/story/howto/openmp/

it seems that we can't use "#pragma omp parallel for" if we want an exception to result in all the threads exiting immediately. The effect might be that the exception is treated in the iteration that threw it, but the other maps have to finish before going any further. Writing the parallel for loop manually in OpenMP is possible, though, and we could check whether an exception had been thrown and exit all of the threads early. This would change the structure of your code, however, and probably make it slower. Are these threads likely to live long enough to make the overhead worthwhile?
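Roughly, a manually structured loop with a shared early-exit flag might look like the sketch below. The names map_with_early_exit and checked_sqrt are invented for illustration, and the unsynchronized flag accesses are a deliberate simplification:

```cpp
#include <cmath>
#include <cstddef>

// Illustrative sketch: each iteration checks a shared flag so that all
// threads stop doing real work soon after one of them reports an error.
// The flag accesses are unsynchronized, which is tolerable for a
// monotonic bool in a sketch but strictly ought to be atomic.
bool
map_with_early_exit (double *out, const double *in, std::size_t n,
                     double (*fcn) (double, bool&))
{
  bool failed = false;

#pragma omp parallel for shared(failed)
  for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t> (n); i++)
    {
      if (failed)
        continue;  // skip remaining work; the iteration space still completes

      bool err = false;
      double val = fcn (in[i], err);
      if (err)
        failed = true;
      else
        out[i] = val;
    }

  return ! failed;
}

// Example mapper that flags an error instead of throwing.
double
checked_sqrt (double x, bool& err)
{
  if (x < 0.0)
    {
      err = true;
      return 0.0;
    }
  return std::sqrt (x);
}
```
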

Otherwise we need something other than OpenMP, like what is described here:

http://www.developertutorials.com/tutorials/miscellaneous/c-concurrency-in-action-9-12-17/page1.html

I'd suggest going with the simple solution using "#pragma omp parallel for" and try/catch first, and then seeing if it causes any issues.
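Something like the following is the shape I have in mind; treat it as a sketch only (parallel_map and reciprocal are invented for illustration, and std::exception_ptr postdates this discussion, so an older tree would need its own capture mechanism):

```cpp
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <vector>

// Sketch of the "simple solution": wrap each iteration's body in
// try/catch, remember the first exception thrown, and rethrow it only
// after the parallel region has joined, so the remaining iterations
// still run to completion.
void
parallel_map (std::vector<double>& out, const std::vector<double>& in,
              double (*fcn) (double))
{
  const std::ptrdiff_t n = static_cast<std::ptrdiff_t> (in.size ());
  out.resize (in.size ());
  std::exception_ptr eptr = nullptr;

#pragma omp parallel for
  for (std::ptrdiff_t i = 0; i < n; i++)
    {
      try
        {
          out[i] = fcn (in[i]);
        }
      catch (...)
        {
          // Keep only the first exception; the critical section
          // protects the shared exception slot.
#pragma omp critical
          if (! eptr)
            eptr = std::current_exception ();
        }
    }

  if (eptr)
    std::rethrow_exception (eptr);
}

// Example mapper that can throw.
double
reciprocal (double x)
{
  if (x == 0.0)
    throw std::domain_error ("division by zero");
  return 1.0 / x;
}
```
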

D.


