octave-maintainers

Re: Using OpenMP in Octave


From: Michael D Godfrey
Subject: Re: Using OpenMP in Octave
Date: Mon, 29 Mar 2010 09:59:01 -0700
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.8) Gecko/20100301 Fedora/3.0.3-1.fc12 Thunderbird/3.0.3

On 03/29/2010 02:50 AM, Jaroslav Hajek wrote:
Unfortunately, it confirms what I anticipated: the elementary
operations scale poorly. Memory bandwidth is probably the real limit
here. The mappers involve more work per cycle and hence scale much
better.

This is why I think we should not hurry to multithread the
elementary operations and reductions like sum(). I know Matlab does
it, but I think it's mostly fancy stuff to convince customers that new
versions add significant value.
Elementary operations are seldom a bottleneck; add Amdahl's law to
their poor scaling and the result is going to be very little music for
lots of money.

When I read about Matlab getting parallelized stuff like sum(), I was
a little surprised. 50 million numbers get summed in 0.07 seconds on
my computer; generating them in some non-trivial way typically takes
at least 50 times that long, often much more. In that case,
multithreaded sum is absolutely marginal, even if it scaled perfectly.

One area where multithreading really helps is the complicated mappers,
as shown by the second part of the benchmark.
Still, I think we should carefully consider how to best provide parallelism.
For instance, I would be happy with explicit parallelism, something
like pararrayfun from the OctaveForge package, so that I could write:

pararrayfun (3, @erf, x, "ChunksPerProc", 100); # parallelize on 3 threads, splitting the array into 300 chunks

Note that if I were about to parallelize a larger section of code that
uses erf, I could do

erf = @(x) pararrayfun (3, @erf, x, "ChunksPerProc", 100); # use parallel erf for the rest of the code

If we really insist that the builtin functions must support
parallelism, I'd say they must fulfill at least the following:

1. an easy way of temporarily disabling it must exist (for high-level
parallel constructs like parcellfun, it should be done automatically)
2. the tuning constants should be customizable.

For instance, I can imagine something like

mt_size_limit ("sin", 1000); # parallelize sin for arrays with > 1000 elements
mt_size_limit ("erfinv", 500); # parallelize erfinv for arrays with > 500 elements

We have no chance of determining the best constant for all machines, so
I think users should be allowed to determine their own.

--
RNDr. Jaroslav Hajek, PhD
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU) Prague, Czech Republic
url: www.highegg.matfyz.cz
Very good summary!

While thinking about this, it would be good to include an interface
to OpenCL or even (license permitting) CUDA.  Someone brought
this up recently in another thread.

Michael

