From: Marcus Müller
Subject: Re: [Discuss-gnuradio] On the convolutional code performance of gr-ieee802-11
Date: Tue, 15 Sep 2015 10:44:34 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0

Hi Jeon,

speed depends on your hardware and the implementation of the decoder.
As a rule of thumb: the more "generally applicable" a decoder is, the slower it gets.
Jan wrote a set of highly SIMD-optimized decoders, but these cover (pretty common) special cases, so they don't handle everything gr-trellis does, let alone the even more general application range of the IT++ decoders. I'd assume you could get a significant speed boost by replacing the IT++ implementation with your own, highly specialized decoder if you know what you're doing; but honestly, implementing, let alone optimizing, a decoder is a non-trivial task, and one should definitely not start a project like gr-ieee802-11 by writing one's own decoder when there's an existing, usable one out there (even if IT++ can be a pain to use).

Generally, I'd frown upon using a VM to benchmark decoders: good decoders may make substantial use of advanced SIMD instructions, but those might not be enabled in your virtualizer. Furthermore, if you want to do real-world gr-ieee802-11 usage, *don't* work in a VM unless you're very knowledgeable about how to configure VMs; latency and CPU overhead are critical, so the default "NAT" network configuration will not work well for network-attached USRPs, and USB3 support in VMs ranges from bad to horrible, so a B2x0 isn't really the best device to use in a VM, either.

Run "volk-config-info --avail-machines" and check whether the output contains:
generic_orc;sse2_64_mmx_orc;sse3_64_orc;ssse3_64_orc;sse4_1_64_orc;sse4_2_64_orc;avx_64_mmx_orc;

If that's the case, your VMware setup does allow AVX/SSE4 inside your VM.
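
If you'd rather check from inside a program: VOLK also exposes that information at runtime. A minimal sketch (assuming you link against libvolk, e.g. via pkg-config):

    // Prints the available VOLK machines and the one actually selected.
    // Build with: g++ volk_check.cpp -o volk_check $(pkg-config --cflags --libs volk)
    #include <cstdio>
    #include <volk/volk.h>

    int main()
    {
        volk_list_machines();   // semicolon-separated list of available machines
        std::printf("machine in use: %s\n", volk_get_machine());
        std::printf("alignment:      %zu bytes\n", volk_get_alignment());
        return 0;
    }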

Best regards,
Marcus

On 15.09.2015 09:47, Jeon wrote:
I've measured the time taken by convolutional decoding in gr-ieee802-11. The module uses the Punctured_Convolutional_Code class from the IT++ library (http://itpp.sourceforge.net/4.3.0/classitpp_1_1Punctured__Convolutional__Code.html).
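
For context, setting up that class looks roughly like this (a simplified sketch based on the IT++ API, not the exact gr-ieee802-11 code; the K=7, 0133/0171 generators are the standard 802.11 ones, and the all-ones puncture matrix is just the rate-1/2 case):

    // Simplified setup of the 802.11 K=7 convolutional code in IT++.
    // Rate 3/4 would use a puncture pattern like "1 1 0;1 0 1" instead.
    #include <itpp/itcomm.h>

    void setup_wifi_code(itpp::Punctured_Convolutional_Code &code)
    {
        itpp::ivec generators(2);
        generators(0) = 0133;   // octal, as given in the 802.11 standard
        generators(1) = 0171;
        code.set_generator_polynomials(generators, 7);   // constraint length K = 7

        code.set_puncture_matrix(itpp::bmat("1 1;1 1")); // rate 1/2: nothing punctured
    }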

I've used std::chrono (<chrono>) to measure the elapsed time. You can see how I did it on the following page (https://gist.github.com/gsongsong/7c4081f44e88a7f4407a#file-ofdm_decode_mac-cc-L252-L257).
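
In essence, the measurement is just this (a condensed sketch of what the lines in the gist do; decode_tail stands in for whatever IT++ decode call the block actually makes):

    // Timing the IT++ Viterbi decoder with std::chrono::steady_clock
    // (a monotonic clock, so it is safe for interval measurements).
    #include <chrono>
    #include <iostream>
    #include <itpp/itcomm.h>

    void decode_and_time(itpp::Punctured_Convolutional_Code &code,
                         const itpp::vec &rx)   // soft values, here +1/-1
    {
        itpp::bvec decoded;

        const auto start = std::chrono::steady_clock::now();
        code.decode_tail(rx, decoded);   // Viterbi decoding of the terminated trellis
        const auto stop = std::chrono::steady_clock::now();

        const auto us = std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
        std::cout << "decoding " << rx.size() << " samples took " << us << " us" << std::endl;
    }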

I've measured the time with a loopback flow graph (w/o USRP; examples/wifi_loopback.grc).

The result is that it takes from 5,000 to 30,000 us, i.e. 5 to 30 ms, to decode a signal with a length of 9,000 samples (each sample is either 1 or -1).

* Test environment: Ubuntu 14.04 on VMware, 2 CPUs and 4 GB RAM allocated
* Host environment: Windows 7 with an i7-3770 @ 3.7 GHz

Since I am not familiar with error-correcting codes, I have no idea whether that amount of time is reasonable. But I think that one of the most efficient decoding algorithms is Viterbi, and that IT++ must be using it.

From that I can deduce that convolutional decoding takes quite a long time even though the algorithm (Viterbi) is very efficient. Is this a natural limitation of software decoding and SDR?
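
To put rough numbers on it (assuming each ±1 sample corresponds to one coded bit): 9,000 samples in 5 to 30 ms is about 0.3 to 1.8 M coded bits per second, i.e. roughly 0.15 to 0.9 Mb/s of decoded data at rate 1/2, whereas even the lowest 802.11a/g rate (6 Mb/s) produces 12 M coded bits per second that would have to be decoded in real time.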

Another question: commercial off-the-shelf (COTS) Wi-Fi devices achieve really high throughput, which must rely on much faster convolutional decoding. Is that because COTS devices use heavily optimized FPGAs and dedicated decoding chips?

Regards,
Jeon.

