octave-maintainers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: fftfilt patch


From: Daniel J Sebald
Subject: Re: fftfilt patch
Date: Fri, 05 Apr 2013 17:46:13 -0500
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.24) Gecko/20111108 Fedora/3.1.16-1.fc14 Thunderbird/3.1.16

On 04/05/2013 12:50 PM, Daniel J Sebald wrote:

There is something else about this routine I don't like, which I will
post later after running some simple tests.

After trying this fftfilt() routine a bit, what I don't like about it is if the user isn't completely familiar with the routine the results could be misleading to the point where the conclusion is that the routine is very bad. It has to do with choosing an overlap-and-add FFT length that is too short compared to the length of filter impulse response b. If the user does such a thing, the algorithm becomes absurdly slow. (See the chart in Figure 2 of http://en.wikipedia.org/wiki/Overlap%E2%80%93add_method... point being there are a number of variables here and unless one really knows what is going on, he's likely to be confused.)

This fact is alluded to in the comment near the top of fftfilt().

"
  ## If N is not specified explicitly, we do not use the overlap-add
  ## method at all because loops are really slow.  Otherwise, we only
  ## ensure that the number of points in the FFT is the smallest power
  ## of two larger than N and length(b).  This could result in length
  ## one blocks, but if the user knows better ...
"

Only ensuring that the number of points is the smallest leads to a block length of 1, yes. And if it does (an easy mistake, user has a length 128 filter, so they choose N=128), the algorithm advances one sample per iteration and basically comes to a halt. First, I don't like when algorithms change the settings on the user:

    n = 2 ^ nextpow2 (max ([n, l_b]));

especially without notification of doing so. Second, it would be nice if the algorithm would give some kind of hint to the user that what he or she is doing is wrong.

I've put a patch here

https://savannah.gnu.org/patch/index.php?7999

with the mods I'd like. The changes don't modify the FFT length. Instead the mods give an error if N is too short and a warning if N is so short it is extremely slow. The user is allowed to use a non power of two algorithm, say N=817, if he or she would like. Doing so wouldn't be as efficient as a power of two FFT, but the loss from that choice isn't that great and I've noted in the help that a power of two FFT is more efficient so there is no warning for that.

There is also a short script file at the patch report to compare techniques. The first input is the length of b. The second input is the length N of the FFT. Try things like the following (I'll write a remark after each):

octave:10> fftfiltbenchmark (64, 64)
filter cpu (s): 0.111983
warning: fftfilt: N is small relative to length of filter, may run slowly
fftfilt cpu (s): 117.42
maximum discrepancy: 1.06581e-14
[Wow, that was slow. The block length L is one meaning that there is only a single element of the computation free of the wrap-around effect of circular convolution. The FFT,FFT,mult,IFFT has to be done for every sample, and we know that the looping is somewhat slow.]

octave:11> fftfiltbenchmark (64, 96)
filter cpu (s): 0.111982
warning: fftfilt: N is small relative to length of filter, may run slowly
fftfilt cpu (s): 3.72343
maximum discrepancy: 2.4869e-14
[Well, that's much better than the previous, but still much worse than just doing normal convolution.]

octave:12> fftfiltbenchmark (64, 128)
filter cpu (s): 0.110984
fftfilt cpu (s): 2.04869
maximum discrepancy: 3.90799e-14
[Better still, not much compared to previous choice of N. No more warning message about potential slowness. At least it isn't dreaded slow.]

octave:13> fftfiltbenchmark (64, 256)
filter cpu (s): 0.110983
fftfilt cpu (s): 0.888865
maximum discrepancy: 3.90799e-14
[Again getting better, but at this trend it doesn't seem like the fftfilt algorithm is worth much.]

octave:14> fftfiltbenchmark (64, 512)
filter cpu (s): 0.110983
fftfilt cpu (s): 0.535919
maximum discrepancy: 4.61853e-14
[Same trend.]

octave:15> fftfiltbenchmark (64, 1024)
filter cpu (s): 0.110983
fftfilt cpu (s): 0.396939
maximum discrepancy: 5.32907e-14
[Not doing much.]

octave:16> fftfiltbenchmark (96, 1024)
filter cpu (s): 0.167975
fftfilt cpu (s): 0.406937
maximum discrepancy: 6.75016e-14
[OK, lets increase the filter length then. Now too much increase in the fftfilt cpu consumption. The reason is that 1024-96 isn't too much different from 1024-64. But the filter convolution is trending upward a bit.]

octave:17> fftfiltbenchmark (128, 1024)
filter cpu (s): 0.217967
fftfilt cpu (s): 0.45893
maximum discrepancy: 9.23706e-14
[Trending upward on the filter cpu consumption still.]

octave:18> fftfiltbenchmark (256, 1024)
filter cpu (s): 0.419936
fftfilt cpu (s): 0.468929
maximum discrepancy: 2.27374e-13
[Same trend.]

octave:19> fftfiltbenchmark (512, 1024)
filter cpu (s): 3.41548
fftfilt cpu (s): 0.651901
maximum discrepancy: 4.83169e-13
[OK, now normal convolution is starting to lose out in a big way.]

octave:21> fftfiltbenchmark (512, 1023)
filter cpu (s): 3.41648
fftfilt cpu (s): 0.809876
maximum discrepancy: 3.97904e-13
[Here there is a change to a non power of two FFT. You can see there is maybe a 20% hit. I don't think that is worth giving a warning about, so I just made mention of it in the documentation.]

octave:22> fftfiltbenchmark (512, 1025)
filter cpu (s): 3.41848
fftfilt cpu (s): 0.823875
maximum discrepancy: 3.69482e-13
[Again, non power of two but on the other side of 1024.]


I changed the internal fftfiltbenchmark routine to use "ones" instead of "rand".

octave:23> fftfiltbenchmark (512, 1024)
filter cpu (s): 3.42048
fftfilt cpu (s): 0.697894
maximum discrepancy: 0
[You can see that there is a cost of about 8% for that real/imaginary and rounding cleanup at the end, but it does make the discrepancy 0.]

Without going into great analysis of the algorithm, my conclusion is that the for-loop does factor in a bit (for example, fftfiltbenchmark (64, 128) result could probably be a little better), but if N is chosen right it is tolerable. The rounding and real/imaginary clean up isn't too costly, but I'd still prefer an option to remove that (not part of the patch).

Dan


reply via email to

[Prev in Thread] Current Thread [Next in Thread]