Hi,
For a while now (and very slowly, BTW) I've been trying to optimize gr-isdbt's performance. Unsurprisingly, the most serious bottleneck is the Viterbi decoder. So far we've used gr-dvbt's implementation (now part of gr-dtv), which keeps up in real time on an i7 but consumes 100% of a processor on an i5, so after a while samples from the hardware get dropped.
It is my understanding that this particular implementation is based on code written by Phil Karn in 1995. I've looked around and found that Phil Karn kept updating his original code until 2007, and his Viterbi decoder in particular until 2004. It can be found at
http://www.ka9q.net/code/fec/.
I was wondering whether it would be worthwhile (in terms of speed) to try to adapt the newer code to the Viterbi decoder in gr-dtv. In particular, I found the following comment in viterbi27_sse2.c:
/* This code is turned off because it's slower than my hand-crafted assembler in sse2bfly27.s. But it does work. */
and then a function very similar to d_viterbi_butterfly_sse2 follows.
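For anyone not familiar with what such a butterfly routine computes, here is a minimal scalar sketch of one add-compare-select step for a rate-1/2, K=7 decoder. It is only an illustration, not code taken from gr-dtv or from Karn's library: the names acs_butterfly and butterfly_result are made up, and it assumes the two branches entering each new state have complementary branch metrics (bm and bm_max - bm), which holds when both generator polynomials tap the first and last shift-register bits. The SSE2 versions do this same work for many butterflies at once.

```c
#include <stdint.h>

/* Hypothetical illustration; names and layout are not from gr-dtv or Karn's code. */
typedef struct {
    uint32_t metric[2];   /* survivor path metrics for new states 2i and 2i+1 */
    uint8_t  decision[2]; /* 0: survivor came from old state i, 1: from i + 32 */
} butterfly_result;

/*
 * One add-compare-select butterfly.
 *   old_lo : path metric of old state i
 *   old_hi : path metric of old state i + 32
 *   bm     : branch metric for the transition i -> 2i (lower is better)
 *   bm_max : largest possible branch metric, so the complementary
 *            branch has metric bm_max - bm (assumption stated above)
 */
static butterfly_result acs_butterfly(uint32_t old_lo, uint32_t old_hi,
                                      uint32_t bm, uint32_t bm_max)
{
    uint32_t bm_c = bm_max - bm;

    /* Add: the four candidate path metrics. */
    uint32_t m0 = old_lo + bm;    /* i      -> 2i     */
    uint32_t m1 = old_hi + bm_c;  /* i + 32 -> 2i     */
    uint32_t m2 = old_lo + bm_c;  /* i      -> 2i + 1 */
    uint32_t m3 = old_hi + bm;    /* i + 32 -> 2i + 1 */

    butterfly_result r;

    /* Compare and select: keep the smaller metric, record where it came from. */
    r.decision[0] = (uint8_t)(m1 < m0);
    r.metric[0]   = r.decision[0] ? m1 : m0;
    r.decision[1] = (uint8_t)(m3 < m2);
    r.metric[1]   = r.decision[1] ? m3 : m2;
    return r;
}
```

A full decoder would run 32 of these per received symbol pair and store the decision bits for traceback, which is why this inner loop dominates the CPU time.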