[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: fftfilt patch
From: |
Daniel J Sebald |
Subject: |
Re: fftfilt patch |
Date: |
Fri, 05 Apr 2013 17:46:13 -0500 |
User-agent: |
Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.24) Gecko/20111108 Fedora/3.1.16-1.fc14 Thunderbird/3.1.16 |
On 04/05/2013 12:50 PM, Daniel J Sebald wrote:
There is something else about this routine I don't like, which I will
post later after running some simple tests.
After trying this fftfilt() routine a bit, what I don't like about it is
if the user isn't completely familiar with the routine the results could
be misleading to the point where the conclusion is that the routine is
very bad. It has to do with choosing an overlap-and-add FFT length that
is too short compared to the length of filter impulse response b. If
the user does such a thing, the algorithm becomes absurdly slow. (See
the chart in Figure 2 of
http://en.wikipedia.org/wiki/Overlap%E2%80%93add_method... point being
there are a number of variables here and unless one really knows what is
going on, he's likely to be confused.)
This fact is alluded to in the comment near the top of fftfilt().
"
## If N is not specified explicitly, we do not use the overlap-add
## method at all because loops are really slow. Otherwise, we only
## ensure that the number of points in the FFT is the smallest power
## of two larger than N and length(b). This could result in length
## one blocks, but if the user knows better ...
"
Only ensuring that the number of points is the smallest leads to a block
length of 1, yes. And if it does (an easy mistake, user has a length
128 filter, so they choose N=128), the algorithm advances one sample per
iteration and basically comes to a halt. First, I don't like when
algorithms change the settings on the user:
n = 2 ^ nextpow2 (max ([n, l_b]));
especially without notification of doing so. Second, it would be nice
if the algorithm would give some kind of hint to the user that what he
or she is doing is wrong.
I've put a patch here
https://savannah.gnu.org/patch/index.php?7999
with the mods I'd like. The changes don't modify the FFT length.
Instead the mods give an error if N is too short and a warning if N is
so short it is extremely slow. The user is allowed to use a non power
of two algorithm, say N=817, if he or she would like. Doing so wouldn't
be as efficient as a power of two FFT, but the loss from that choice
isn't that great and I've noted in the help that a power of two FFT is
more efficient so there is no warning for that.
There is also a short script file at the patch report to compare
techniques. The first input is the length of b. The second input is
the length N of the FFT. Try things like the following (I'll write a
remark after each):
octave:10> fftfiltbenchmark (64, 64)
filter cpu (s): 0.111983
warning: fftfilt: N is small relative to length of filter, may run slowly
fftfilt cpu (s): 117.42
maximum discrepancy: 1.06581e-14
[Wow, that was slow. The block length L is one meaning that there is
only a single element of the computation free of the wrap-around effect
of circular convolution. The FFT,FFT,mult,IFFT has to be done for every
sample, and we know that the looping is somewhat slow.]
octave:11> fftfiltbenchmark (64, 96)
filter cpu (s): 0.111982
warning: fftfilt: N is small relative to length of filter, may run slowly
fftfilt cpu (s): 3.72343
maximum discrepancy: 2.4869e-14
[Well, that's much better than the previous, but still much worse than
just doing normal convolution.]
octave:12> fftfiltbenchmark (64, 128)
filter cpu (s): 0.110984
fftfilt cpu (s): 2.04869
maximum discrepancy: 3.90799e-14
[Better still, not much compared to previous choice of N. No more
warning message about potential slowness. At least it isn't dreaded slow.]
octave:13> fftfiltbenchmark (64, 256)
filter cpu (s): 0.110983
fftfilt cpu (s): 0.888865
maximum discrepancy: 3.90799e-14
[Again getting better, but at this trend it doesn't seem like the
fftfilt algorithm is worth much.]
octave:14> fftfiltbenchmark (64, 512)
filter cpu (s): 0.110983
fftfilt cpu (s): 0.535919
maximum discrepancy: 4.61853e-14
[Same trend.]
octave:15> fftfiltbenchmark (64, 1024)
filter cpu (s): 0.110983
fftfilt cpu (s): 0.396939
maximum discrepancy: 5.32907e-14
[Not doing much.]
octave:16> fftfiltbenchmark (96, 1024)
filter cpu (s): 0.167975
fftfilt cpu (s): 0.406937
maximum discrepancy: 6.75016e-14
[OK, lets increase the filter length then. Now too much increase in the
fftfilt cpu consumption. The reason is that 1024-96 isn't too much
different from 1024-64. But the filter convolution is trending upward a
bit.]
octave:17> fftfiltbenchmark (128, 1024)
filter cpu (s): 0.217967
fftfilt cpu (s): 0.45893
maximum discrepancy: 9.23706e-14
[Trending upward on the filter cpu consumption still.]
octave:18> fftfiltbenchmark (256, 1024)
filter cpu (s): 0.419936
fftfilt cpu (s): 0.468929
maximum discrepancy: 2.27374e-13
[Same trend.]
octave:19> fftfiltbenchmark (512, 1024)
filter cpu (s): 3.41548
fftfilt cpu (s): 0.651901
maximum discrepancy: 4.83169e-13
[OK, now normal convolution is starting to lose out in a big way.]
octave:21> fftfiltbenchmark (512, 1023)
filter cpu (s): 3.41648
fftfilt cpu (s): 0.809876
maximum discrepancy: 3.97904e-13
[Here there is a change to a non power of two FFT. You can see there is
maybe a 20% hit. I don't think that is worth giving a warning about, so
I just made mention of it in the documentation.]
octave:22> fftfiltbenchmark (512, 1025)
filter cpu (s): 3.41848
fftfilt cpu (s): 0.823875
maximum discrepancy: 3.69482e-13
[Again, non power of two but on the other side of 1024.]
I changed the internal fftfiltbenchmark routine to use "ones" instead of
"rand".
octave:23> fftfiltbenchmark (512, 1024)
filter cpu (s): 3.42048
fftfilt cpu (s): 0.697894
maximum discrepancy: 0
[You can see that there is a cost of about 8% for that real/imaginary
and rounding cleanup at the end, but it does make the discrepancy 0.]
Without going into great analysis of the algorithm, my conclusion is
that the for-loop does factor in a bit (for example, fftfiltbenchmark
(64, 128) result could probably be a little better), but if N is chosen
right it is tolerable. The rounding and real/imaginary clean up isn't
too costly, but I'd still prefer an option to remove that (not part of
the patch).
Dan