octave-maintainers
Re: More efficient MEX or MEX-like interface


From: Jaroslav Hajek
Subject: Re: More efficient MEX or MEX-like interface
Date: Tue, 30 Sep 2008 19:54:09 +0200

On Tue, Sep 30, 2008 at 4:39 PM, Fredrik Lingvall <address@hidden> wrote:
> David Bateman wrote:
>>
>>>
>>
>> This is exactly my point... If matlab had to reform a C99 complex matrix
>> and then call zGEMM rather than use four calls to dGEMM and do the additions
>> then it would be slower as the creation and copying of the data to a C99
>> complex array comes at a cost. In fact xGEMM performs the operation
>>
>>  C = alpha * A * B + beta * C
>>
>> and so you might for example call dGEMM once passing the two imaginary
>> parts as A, B and have beta of zero, then have a second call with that
>> result as C, beta of -1 and A and B being the real parts, thus giving
>> you the real part of the complex matrix multiply with two calls to dGEMM on
>> the real and imaginary parts of the matrix. The same goes for the
>> calculation of the imaginary part of the matrix multiply. The underlying
>> code of zGEMM has to do something similar in any case, it just does the four
>> multiplications element by element instead, so there is no surprise it is
>> much the same speed as what matlab does.
>>
>> The absence of cGEMM from the symbols of the numeric is a pretty good
>> indication that the above is exactly what mathworks does as I see no reason
>> to handle matrix multiplies different between double and single precision
>> values.
>>
> OK. I just guessed that it would be more efficient  to call zGEMM once
> (where the real and imag parts are closer in memory) than calling dGEMM four
> times (which  should generate more cache misses).  This seems not to be the
> case though (at least not for the size of the matrices that I used).
>

The number of FLOPs is the same, so that would only hold if ZGEMM reached a
significantly better fraction of peak performance than DGEMM, which it
normally doesn't.
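
To make the four-call scheme concrete, here is a minimal sketch in plain
Python. It is only an illustration under stated assumptions: `gemm` is a
hypothetical stand-in that mimics the xGEMM contract C = alpha*A*B + beta*C
on lists of rows (it is not the BLAS routine and says nothing about speed),
and `split_complex_matmul` is an illustrative name for a complex product
computed from separately stored real and imaginary parts. Whether Matlab
actually does it this way is a guess, as noted above.

```python
def gemm(alpha, A, B, beta, C):
    """Return alpha*A*B + beta*C for real matrices stored as lists of rows.

    Hypothetical stand-in for dGEMM; it returns a new matrix instead of
    overwriting C in place.
    """
    n, k, m = len(A), len(B), len(B[0])
    return [
        [alpha * sum(A[i][p] * B[p][j] for p in range(k)) + beta * C[i][j]
         for j in range(m)]
        for i in range(n)
    ]


def split_complex_matmul(Ar, Ai, Br, Bi):
    """Compute (Ar + i*Ai)*(Br + i*Bi) with four real GEMM calls."""
    n, m = len(Ar), len(Br[0])
    zero = [[0.0] * m for _ in range(n)]

    # Real part: Ar*Br - Ai*Bi
    Cr = gemm(1.0, Ai, Bi, 0.0, zero)   # Cr = Ai*Bi
    Cr = gemm(1.0, Ar, Br, -1.0, Cr)    # Cr = Ar*Br - Cr

    # Imaginary part: Ar*Bi + Ai*Br
    Ci = gemm(1.0, Ar, Bi, 0.0, zero)   # Ci = Ar*Bi
    Ci = gemm(1.0, Ai, Br, 1.0, Ci)     # Ci = Ai*Br + Ci
    return Cr, Ci


def complex_matmul(A, B):
    """Reference product on interleaved (native complex) entries."""
    k = len(B)
    return [[sum(A[i][p] * B[p][j] for p in range(k))
             for j in range(len(B[0]))] for i in range(len(A))]


# Small example with integer-valued entries, so the comparison is exact.
A = [[1 + 2j, 3 - 1j], [0 + 1j, 2 + 0j]]
B = [[2 - 1j, 0 + 1j], [1 + 1j, 4 + 0j]]

Ar = [[z.real for z in row] for row in A]
Ai = [[z.imag for z in row] for row in A]
Br = [[z.real for z in row] for row in B]
Bi = [[z.imag for z in row] for row in B]

Cr, Ci = split_complex_matmul(Ar, Ai, Br, Bi)
ref = complex_matmul(A, B)
```

Both routes perform the same four real multiplications per pair of entries;
only the grouping differs, which is why the FLOP count matches.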

The Matlab layout is only problematic when you can't split the
operation at the highest level, i.e. for factorizations and
factorization updates. For instance, I guess that Matlab's complex
qrupdate is, for moderate matrix sizes, significantly slower than
Octave's.

I guess the main reason MathWorks chose this layout was that C had no
complex type in the past.


> /F
>
>



-- 
RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

