A quick run @ 2 samp/sym gave:

Old recv...
xcorr offset: 8
num bits compared: 499987
num bit errors: 1275
BER: 0.00255006630172

Old recv plus...
xcorr offset: 16
num bits compared: 499979
num bit errors: 1270
BER: 0.00254010668448

One bit dd recv...
xcorr offset: 17
num bits compared: 499979
num bit errors: 547
BER: 0.00109404594993

Two bit dd recv...
xcorr offset: 17
num bits compared: 499979
num bit errors: 50
BER: 0.000100004200176
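For anyone who wants to reproduce this kind of figure, here is a minimal sketch of how an offset / bit-error count like the one above could be computed. It is not the actual test script: the function name align_and_count, the max_offset parameter, and the brute-force search are assumptions made for illustration, and it uses only the standard library.

```python
def align_and_count(tx_bits, rx_bits, max_offset=32):
    """Return (offset, bits_compared, bit_errors, ber).

    Hypothetical helper: tries each candidate delay up to max_offset and
    keeps the one where the received bits correlate best with the sent bits.
    """
    best_offset, best_score = 0, float("-inf")
    for k in range(max_offset + 1):
        n = min(len(tx_bits), len(rx_bits) - k)
        if n <= 0:
            break
        # Correlate in +/-1 form: agreement adds 1, disagreement subtracts 1.
        score = sum(1 if tx_bits[i] == rx_bits[i + k] else -1
                    for i in range(n)) / n
        if score > best_score:
            best_offset, best_score = k, score

    n = min(len(tx_bits), len(rx_bits) - best_offset)
    if n <= 0:
        raise ValueError("no overlapping bits to compare")
    # Count mismatches over the overlapping region at the chosen delay.
    errors = sum(tx_bits[i] != rx_bits[i + best_offset] for i in range(n))
    return best_offset, n, errors, errors / n
```

The BER is simply bit errors divided by bits compared, e.g. 1275 / 499987 for the first run above.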
So we see that at lower samp/sym, adding the pre-clock recovery filter helps less (not surprising, since the signal takes up a larger fraction of the BW). You still get good gains from the changes made to the differential detector, however. The lower the xmit BT, the more improvement you get from switching to a two-bit differential detector, as it opens the eye diagram by a comparatively larger amount.
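To make the one-bit vs. two-bit distinction concrete, here is a rough sketch of both detectors on ideal, noiseless MSK sampled once per symbol. This is the textbook form only, not the code from my branch: the function names and the threshold parameter are made up for illustration, and real GMSK with a low BT needs filtering, timing recovery, and a biased threshold that this toy omits.

```python
import cmath
import math

def msk_symbols(bits):
    """Ideal MSK, one sample per symbol: phase steps +/- pi/2 per bit."""
    phase = 0.0
    samples = [cmath.exp(1j * phase)]
    for b in bits:
        phase += (math.pi / 2) * (1 if b else -1)
        samples.append(cmath.exp(1j * phase))
    return samples  # len(bits) + 1 samples

def one_bit_dd(samples):
    # Im{ r[n] * conj(r[n-1]) } = sin(phase change over one bit period),
    # so its sign recovers the bit directly.
    return [1 if (samples[n] * samples[n - 1].conjugate()).imag > 0 else 0
            for n in range(1, len(samples))]

def two_bit_dd(samples, threshold=0.0):
    # Re{ r[n] * conj(r[n-2]) } = cos(phase change over two bit periods).
    # Two equal bits give +/- pi (cos = -1); two different bits give 0
    # (cos = +1). The output is therefore the XOR of adjacent bits, so the
    # transmitter must precode or the receiver must decode accordingly.
    # threshold is a hypothetical stand-in for the bias applied with GMSK.
    return [1 if (samples[n] * samples[n - 2].conjugate()).real > threshold
            else 0
            for n in range(2, len(samples))]
```

Note that the two-bit detector decides between cos = -1 and cos = +1, while the one-bit detector decides on the sign of a sine; how much each eye closes under Gaussian filtering is what the BER numbers above reflect.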
I have yet to tweak filter design or threshold bias amount, so performance can potentially get a little better.
I will tidy up the code and try to submit to a branch ASAP.
Is it a no-no to submit any code that depends on 3rd party modules (scipy, pylab)? I would make the core demod not require these, of course, but some of my post-analysis / debug code uses these 2 modules.