octave-maintainers

Re: conv2 performance


From: Robert T. Short
Subject: Re: conv2 performance
Date: Mon, 01 Mar 2010 10:28:42 -0800
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.4) Gecko/20091017 SeaMonkey/2.0

John Swensen wrote:
On Mar 1, 2010, at 10:22 AM, Robert T. Short wrote:

John Swensen wrote:
On Feb 28, 2010, at 10:53 AM, John W. Eaton wrote:

Maybe there is still room for improvement here.  I would happily use a
free software library with a GPL-compatible license to implement this
function, but I don't know whether one is available.

jwe


I have recently been doing a lot of 2D convolutions.  I think the fastest 
method should not involve loops of any kind.  As convolution in the time domain 
(or spatial domain when considering images) is multiplication in the frequency 
domain, the fastest method is to take the FFT of both image and kernel, 
dot-multiply them, then take the inverse FFT.  Since the FFT method is usually 
provided by FFTW, this should be optimized and quite fast.  Of course, there 
has to be some padding that takes place to make sure both 'images' are the same 
size.  I was using Matlab for this computation and the speed improvement of the 
FFT method over the Matlab-provided conv2 was considerable (100 seconds versus 
2 seconds; I was convolving a 2048x2048 image with a 256x256 kernel).
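A rough sketch of what that could look like in Octave (the function name and structure here are hypothetical, not the MatlabCentral script):

## Sketch of FFT-based 2-D linear convolution.  Both arrays are
## zero-padded to the full linear-convolution size, so the circular
## convolution implied by the DFT coincides with the linear one.
function y = fftconv2_sketch (a, b)
  [ra, ca] = size (a);
  [rb, cb] = size (b);
  ry = ra + rb - 1;   # rows of the full linear convolution
  cy = ca + cb - 1;   # columns of the full linear convolution
  ## fft2 (x, m, n) zero-pads x to m-by-n before transforming.
  y = real (ifft2 (fft2 (a, ry, cy) .* fft2 (b, ry, cy)));
endfunction

For a 2048x2048 image and a 256x256 kernel this should give the same result as conv2 (a, b) up to round-off, assuming real inputs.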

I think the method is formally called the overlap-add method 
(http://en.wikipedia.org/wiki/Overlap-add_method).  I used a script from 
MatlabCentral (no flaming please, as I already saw the discussion that has been 
going on for a week or two).  This is the method used for many GPGPU 
implementations.  There is an in-depth description of the best way to do the 
padding in an NVidia white paper that can be found at
http://developer.download.nvidia.com/compute/cuda/sdk/website/projects/convolutionFFT2D/doc/convolutionFFT2D.pdf
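In 1-D the overlap-add idea looks roughly like the following (a hypothetical sketch, not the GPU code from the white paper; in practice the FFT length would be rounded up to a size FFTW handles efficiently):

## Sketch of 1-D overlap-add convolution.  The input is processed in
## blocks of length L; each block is convolved with the kernel via an
## FFT of length L+M-1 and the overlapping tails are summed.
function y = olaconv_sketch (x, h, L)
  x = x(:).';  h = h(:).';          # work with row vectors
  M = numel (h);
  N = L + M - 1;                    # FFT length: full linear convolution per block
  H = fft (h, N);                   # kernel spectrum, reused for every block
  y = zeros (1, numel (x) + M - 1);
  for k = 1:L:numel (x)
    blk = x(k : min (k + L - 1, numel (x)));   # next block of the input
    n   = numel (blk) + M - 1;                 # valid output samples for this block
    yk  = real (ifft (fft (blk, N) .* H));     # block convolution via the DFT
    y(k : k + n - 1) += yk(1:n);               # overlap and add the tails
  endfor
endfunction

With a suitable block size, olaconv_sketch (x, h, 1024) should agree with conv (x, h) up to round-off.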

John



Certainly that is a faster way, especially for large convolutions, but I don't 
think conv or conv2 should do it this way.  The straight approach has important 
uses as well.  Overlap/add and overlap/save can also be used when the 
convolution is over multiple blocks and that would be a useful library function 
as well, but again I think that should be separate from conv and conv2.


Bob

I don't see why it should necessarily be separate, rather than simply a conditional choice based on the size of the convolutions.  Shouldn't we try to implement something that is fast for both small and large convolutions, without the user having to download an extra package?  One is already implemented and the other would take a little bit of work.

John

Sorry I sent that just to John instead of the list.

Personally, I am not in favor of complicated argument lists to decide which algorithm a function should use. I know MATLAB uses this a lot, but I think it obscures the basic simplicity of the function. Look back through the history of conv and conv2 - for such simple functions, you would think that stability would have been achieved long ago, but just a few months ago I submitted a change to conv. Adding more stuff inside will only make it worse.

I feel that conv and conv2 should be MATLAB compatible both in form and function - don't add other stuff. Create separate functions for DFT-based convolutions (fftconv?). It would be worth adding overlap-add and overlap-save functions as well (I might even have some 1-D functions around). I don't know whether this stuff should go in the core either.

I agree with Michael about the MATLAB engineer's analysis being a bit shallow, but I have seen similar analyses. I don't know the real answer though.

BTW, the padding is not just to make the sequences the same size, but (normally) to maintain linear rather than circular convolution. For large images and long impulse responses, this can get pretty yucky.
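A tiny illustration of that point (hypothetical 1-D example with real vectors):

a = [1 2 3 4];  b = [1 1 1];
circ = real (ifft (fft (a) .* fft (b, numel (a))));   # length-4 DFTs: circular, the tail wraps around
lin  = real (ifft (fft (a, 6) .* fft (b, 6)));        # padded to 4+3-1 = 6: matches conv (a, b)

For the 2048x2048 image and 256x256 kernel mentioned earlier, the padded arrays would be 2303x2303.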

Bob


