octave-maintainers

Re: Using OpenMP in Octave


From: David Bateman
Subject: Re: Using OpenMP in Octave
Date: Fri, 02 Apr 2010 22:55:42 +0200
User-agent: Mozilla-Thunderbird 2.0.0.22 (X11/20090706)

Jaroslav Hajek wrote:
I thought you would :) What if I won't like your constants, figuring
out better working ones for me? A commit war?
Not a chance; you've committed much more than me recently... You clearly have more time for fun stuff like Octave than me :'-(

I need to be able to turn off the multithreading at runtime and I
absolutely insist on that, so there needs to be a branch anyway.
The solution I offered earlier was to encapsulate the decision into a
routine that will accept loop dimensions (number of cycles and the
"size" of one cycle) and decide whether it's worth parallelizing.
A universal switch point may be a good start. But if you expect
people to play with the parallel code, I don't see why you don't want
a tunable parameter for every operation and every mapper, if it can be
done efficiently.
Mainly that I think it might be inefficient to have a set of tunable parameters. Frankly, the more I think about it, the best solution is probably to just say that the cases where the multithreaded code is slower than the single-threaded code are going to be the cases where speed isn't an issue, and to have a really minimalist test plus a flag to turn off multithreading completely. Yes, I think it's a good idea to be able to turn it off, and not just with a flag when Octave is launched, as Matlab currently implements it.
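For what it's worth, the "minimalist test plus a flag" could look something like the sketch below. The names octave_mt_enabled, mt_threshold and should_parallelize are all invented for illustration and aren't anything in the tree:

```cpp
#include <cstddef>

// Hypothetical knobs, not actual Octave internals: a single runtime
// on/off switch plus one universal switch point, instead of a tunable
// parameter per operation and per mapper.
static bool octave_mt_enabled = true;
static const std::size_t mt_threshold = 50000;

// Decide whether a loop of n cycles, each costing roughly cycle_size
// elemental operations, is worth parallelizing.
bool
should_parallelize (std::size_t n, std::size_t cycle_size = 1)
{
  return octave_mt_enabled && n * cycle_size >= mt_threshold;
}
```

An operation would then guard its "#pragma omp parallel for" with a call to should_parallelize, and a user-visible function could flip octave_mt_enabled at runtime.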

<cut proposed exception handling OpenMP code>
Yes. Besides, I don't think this works.
I think you mixed together the interrupts thrown from Fortran that are
handled by setjmp/longjmp and regular C++ exceptions (which are the
case here). Here, one needs to employ try/catch. But see below, I
don't think this is necessary.
Ok, I didn't test the proposed code but threw it out for discussion. There are also a few typos and uninitialized values. I'd thought the setjmp/longjmp stuff might be needed for mapper functions written in Fortran, but in fact I don't think so, as the setjmp/longjmp exceptions will be rethrown as standard C++ exceptions within each OpenMP thread. I'll work up something else instead.


This question of the exception will occur even more as
we try to parallelize at a coarser grain and you probably have similar
issues in your parcellfun implementation already.


Not really. Remember that parcellfun does not use multithreading.
But what happens when one of your forked processes unexpectedly dies?

Are you sure that MKL and ACML both use OpenMP for multithreading? What
about other vendors? In any case, wrapping all of this in a function to set
the number of threads in Octave should be easier. Note that Matlab
deprecated the maxNumCompThreads function

http://www.mathworks.com/access/helpdesk/help/techdoc/ref/maxnumcompthreads.html
http://www.mathworks.com/access/helpdesk/help/techdoc/rn/bry1ecg-1.html#br1ask3-1

in recent versions of matlab exactly because libraries like BLAS and FFTW
are multithreaded. So we'd be going against the decision taken by matlab
because they consider they can't control the number of threads of these
libraries.


Unlike Matlab, Octave is open, so users can use BLAS of their choice.

LD_PRELOAD=/usr/lib/myfavblas.so matlab

works fine, or at least it used to when I was last able to test it. You might have to LD_PRELOAD the LAPACK as well if there are some unexpected dependencies between the LAPACK and BLAS chosen by Matlab. I used to do things like this all the time when benchmarking Octave and Matlab to ensure a fair comparison, independent of the external dependencies.

No, they're not always multithreaded by OpenMP. For instance, GotoBLAS
is not. So what? It may be useful anyway, but we can simply say "this
function is equivalent to omp_set_num_threads, so it will only
influence libraries based on OpenMP".
Frankly, I have no problem one way or the other about this; I was just pointing out that Matlab deprecated this functionality, so we'd be making the opposite choice. In general I like to keep Octave and Matlab compatible, but in this case, as you say, "so what". Its inclusion in Octave won't affect compatibility with Matlab, as it's probably not a function I'd think to use within a script or function, and even if I did, wrapping it in an "if exist('OCTAVE_VERSION')" isn't a big issue.
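A sketch of what the thread-count wrapper might look like; the name octave_set_num_threads is made up for illustration, and, as you say, it would only influence OpenMP-based libraries:

```cpp
#include <stdexcept>

#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical wrapper around omp_set_num_threads. It affects only
// OpenMP-based code (e.g. OpenMP builds of MKL or ACML), not libraries
// such as GotoBLAS that manage their own threads. Without OpenMP
// support compiled in, it degrades to a no-op.
void
octave_set_num_threads (int n)
{
  if (n < 1)
    throw std::invalid_argument ("number of threads must be positive");

#ifdef _OPENMP
  omp_set_num_threads (n);
#endif
}
```
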

I think stuff like this has a far bigger impact than parallelizing
plus and minus. Besides, Octave's got no JIT, so even if you
parallelize plus, a+b+c will still be slower than it could be if it
was done by a single serial loop.
Not that it really matters; as I already said these operations are
seldom a bottleneck, and if they are, rewriting to C++ is probably the
way to go.

Yes, but as you said yourself, it's a matter of communication. If someone does
a benchmark of a+b, stupid as that may be, I'd like Octave to be just as fast
as Matlab. Kai's results showed that his quad-core AMD improved the performance
of the binary operators with multithreading. JIT is another issue; yes, the
cases where it makes the most difference aren't the ones that have the
largest impact on well-written code. Though there are a lot of idiots out
there writing bad code, benchmarking Octave against Matlab, and then
badmouthing Octave. So yes, JIT would be good to have as well :-)


Maybe you place too much weight on idiots. Chasing Matlab in functionality is
useless; we can never win. There will always be idiots like that. Who
cares?
No, it's a case of evangelizing Octave. Do a search for "octave" on the Matlab newsgroup and see how many users there describe Octave as slow. I've even worked with people in the past who wouldn't use Octave because their code was twice as slow in Octave, even though we only had 10 Matlab licenses for the whole group and I was running multiple instances of Octave on many machines and getting results faster in any case. With an attitude of "who cares", I'm sorry, but Mathworks has already won and we'd only continue to develop Octave for ourselves.

That being said, the person who has contributed most to the speed of Octave recently is you. I agree that making the code that "matters" faster is more important than the gimmicks.

If it's isolated and disabled by default, I don't see why it shouldn't go into
3.4.0 to allow people to start playing with this code. There seem to be
quite a few people regularly building and testing Octave now, and letting
them have this functionality in an experimental form can only be a good
thing, as long as, of course, it does not put the 3.4.0 release at risk.


OK, I see you're determined :)
Hey, given what I'm working on now in my real life, doing OpenMP code is about the best thing I can do for Octave.


So, let's merge the elemental operations patch first. Even though they
scale poorly, it appears you can still get them, say, 2x faster with 4
cores. If you have 4 free cores, why not... besides, certain cases
scale well. For instance, integer division is apparently
computationally intensive enough to scale as well as the mappers.

We do not need to do the ugly magic proposed above; the logical
operations on reals are actually the only loops that can throw, and I
never liked this solution anyway. I think this can be solved by other
means, by just checking for the NaNs prior to the operation. We shall
indicate in mx-inlines.cc that loops must not throw exceptions, and
declare them as such.
Yes, but a "Ctrl-C" or an out-of-memory condition can cause an exception anywhere, so I'd expect we need to be careful in any case. If we declare certain functions as not throwing exceptions, that would prevent the use of "Ctrl-C" in those functions. I suppose in most cases the memory is allocated before the loop, so the "out-of-memory" exception can probably be avoided. So the only effect of your solution is disabling Ctrl-C in the OpenMP loops in the elementary functions.

Then we can address the mappers. This is going to be more difficult as
I don't think we can currently universally rely on the mappers to
never throw. Further, unlike the fast loops, the mappers can take a
long time and so are supposed to be breakable.
If we have a solution for the exceptions from the mapper functions, and it's not too harsh on performance, why not use it for the elementary functions as well? If it's costly, sure, it's better to avoid it.
I think this is one of the biggest drawbacks of the OpenMP attempts.
Previously, making computations breakable required just putting
octave_quit() somewhere. Now, it should be done carefully and inside
parallel sections we simply need to resort to something more
complicated.
Maybe there's a better, exception-safe method of C++ parallelism out
there? What about boost::threads? (I don't know, I'd have to check it
out).

Reading

http://bisqwit.iki.fi/story/howto/openmp/

it seems that we can't use "#pragma omp parallel for" if we want an exception to result in all the threads exiting immediately. The effect might be that the exception is treated in the iteration that threw it, but the other maps have to finish before going any further. Writing the parallel for loop manually in OpenMP is possible, though, and we could check whether an exception had been thrown and exit all of the threads early. This would change the structure of your code, however, and probably make it slower. Are these threads likely to live long enough to make the overhead worthwhile?
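Roughly, a manually structured loop with a shared early-exit flag might look like the sketch below. The names map_with_early_exit and checked_sqrt are invented for illustration, and the unsynchronized flag accesses are a deliberate simplification:

```cpp
#include <cmath>
#include <cstddef>

// Illustrative sketch: each iteration checks a shared flag so that all
// threads stop doing real work soon after one of them reports an error.
// The flag accesses are unsynchronized, which is tolerable for a
// monotonic bool in a sketch but strictly ought to be atomic.
bool
map_with_early_exit (double *out, const double *in, std::size_t n,
                     double (*fcn) (double, bool&))
{
  bool failed = false;

#pragma omp parallel for shared(failed)
  for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t> (n); i++)
    {
      if (failed)
        continue;  // skip remaining work; the iteration space still completes

      bool err = false;
      double val = fcn (in[i], err);
      if (err)
        failed = true;
      else
        out[i] = val;
    }

  return ! failed;
}

// Example mapper that flags an error instead of throwing.
double
checked_sqrt (double x, bool& err)
{
  if (x < 0.0)
    {
      err = true;
      return 0.0;
    }
  return std::sqrt (x);
}
```
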

Otherwise we need something other than OpenMP, like what is described here:

http://www.developertutorials.com/tutorials/miscellaneous/c-concurrency-in-action-9-12-17/page1.html

I'd suggest going with the simple solution using "#pragma omp parallel for" and try/catch first, and then seeing if it causes any issues.
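Something like the following is the shape I have in mind; treat it as a sketch only (parallel_map and reciprocal are invented for illustration, and std::exception_ptr postdates this discussion, so an older tree would need its own capture mechanism):

```cpp
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <vector>

// Sketch of the "simple solution": wrap each iteration's body in
// try/catch, remember the first exception thrown, and rethrow it only
// after the parallel region has joined, so the remaining iterations
// still run to completion.
void
parallel_map (std::vector<double>& out, const std::vector<double>& in,
              double (*fcn) (double))
{
  const std::ptrdiff_t n = static_cast<std::ptrdiff_t> (in.size ());
  out.resize (in.size ());
  std::exception_ptr eptr = nullptr;

#pragma omp parallel for
  for (std::ptrdiff_t i = 0; i < n; i++)
    {
      try
        {
          out[i] = fcn (in[i]);
        }
      catch (...)
        {
          // Keep only the first exception; the critical section
          // protects the shared exception slot.
#pragma omp critical
          if (! eptr)
            eptr = std::current_exception ();
        }
    }

  if (eptr)
    std::rethrow_exception (eptr);
}

// Example mapper that can throw.
double
reciprocal (double x)
{
  if (x == 0.0)
    throw std::domain_error ("division by zero");
  return 1.0 / x;
}
```
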

D.


