discuss-gnuradio

[Discuss-gnuradio] OMP Data. GNURadio overhead?


From: Dennis Glatting
Subject: [Discuss-gnuradio] OMP Data. GNURadio overhead?
Date: Fri, 14 Aug 2015 11:03:24 -0700

Sorry for the HTML...

I have done some work applying OpenMP to GNURadio and collected some data. This data was collected WITHOUT GNURadio overhead. Specifically, I interfaced directly with my detector, passing it 30 seconds of data (30 seconds * 10 Msps = 300 million samples) in a buffer (i.e., I allocated and filled gr_vector_const_void_star, etc.), and measured the throughput at different sensitivities. Herein lies one problem.

When applying OpenMP to a buffer, there has to be enough data to make it worthwhile, but GNURadio buffers are fairly small. I don't see a reasonable way to increase the buffer size for a single source->block connection without modifying the constant in flat_flowgraph.cc, which has the side effect of changing the default size for all buffers. Yes?

I am looking for a way to measure GNURadio overhead. There is a certain amount of overhead that depends on the number of blocks, set() functions, GUI sinks, etc., and I'd like to know what that overhead is. Ideas?

One thought is to set a hardware pin low in the source block and set it high in the detector block, then measure the interval with a scope. The problem is that these pins often incur kernel overhead: opening something in /dev, writing a string, closing the device, and then waiting for the kernel to get around to actually toggling the pin. Measurements showed this is wildly unpredictable. Another option is to toggle a pin on an SDR, but the same problem exists, with additional USB transaction delays.

Anyway, in the data below a "signal" buffer is defined as ~1200 samples (i.e., the MAXimum message size), and the work is split into "chunks" of 2xMAX = 2x1200 = 2400 complex samples. I found 2xMAX a reasonable value because it is within the buffer size GNURadio can deliver with my alteration to flat_flowgraph.cc. The OpenMP code really looks like this:

#pragma omp parallel for num_threads(ncpu) schedule(dynamic, 1) \
    if (work_list.size() > 1)
for (size_t i = 0; i < work_list.size(); ++i) {
    // do work on work_list[i]...
}

Pretty simple.

That said, unless GNURadio can deliver a reasonably large number of samples per call, selectively to the blocks that need it, the value of applying OpenMP is probably moot.


Below, the term "sensitivity" is a bit of a misnomer because signal rejection increases as sensitivity increases, but the label is what it is. More specifically, there are roughly twelve sets of criteria that must be met before signal presence is declared. Some of those criteria involve std::log10() and std::pow(10.0,x) operations, but interestingly those math operations account for only a very small share of the detection effort (1.02% worst case).

The numbers in the first block below are the rates, in samples/second, at which I can process samples. For example, "Baseline" at sensitivity "45.0" is 1,662,769.54 samples per second.

From an OpenMP perspective, I have eight cores but limited the effort to five, so that GNURadio overhead and other blocks have three cores to do their thing, worst case. OpenMP gave me a >300% performance gain in "OMP 5core, 2xMAX", but the theoretical gain is 400%. Not perfect, but I'll take it. What these numbers tell me is that OpenMP can have significant value in the context of GNURadio.
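Expressed as speedup S and parallel efficiency E (plain arithmetic on the 45.0 column of the table, taking five cores as the ideal speedup):

```latex
S = \frac{R_{\text{OMP 5core, 2xMAX}}}{R_{\text{Baseline}}}
  = \frac{6{,}966{,}915.10}{1{,}662{,}769.54} \approx 4.19,
\qquad \text{gain} = (S - 1) \times 100\% \approx 319\%

S_{\text{ideal}} = 5 \;\Rightarrow\; \text{gain}_{\text{ideal}} = 400\%,
\qquad E = \frac{S}{5} \approx 0.84
```

At sensitivity 55.0 the same calculation gives S of about 4.48 and E of about 0.90.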

My code was run on an AMD 9590 at 4.7GHz (5GHz boost), my development platform. For various reasons, I also ran it on a CubieBoard4 (ARM architecture). I should also mention that I have seen NO side effects of running OpenMP within GNURadio other than considerable amusement.



Processing rate (samples/second)

Sensitivity                        45.0           50.0           55.0           60.0           65.0
Baseline (no OMP)          1,662,769.54   3,478,927.50  12,272,150.99  19,503,025.53  20,680,179.97
Load divided by 5cores     6,689,934.90  13,861,829.90  52,729,762.32  84,351,801.42  89,637,372.72
OMP 1core                  1,445,098.78   2,146,371.22  12,476,405.09  19,413,352.39  20,517,069.07
OMP 5core, 2xMAX           6,966,915.10  14,678,631.59  55,022,397.02  87,443,925.07  92,651,283.37
OMP 5core, 4xMAX           6,992,352.24  14,907,933.76  55,214,642.02  87,609,463.65  92,778,048.16
OMP 5core, 8xMAX           6,956,344.76  14,660,481.24  55,202,186.93  87,681,575.73  92,898,798.06
CubieBoard 5core, 2xMAX    1,805,831.74   3,794,176.16  14,182,461.42  24,444,325.85  26,173,792.25

Performance difference from Baseline

Sensitivity                        45.0           50.0           55.0           60.0           65.0
Baseline (no OMP)                 0.00%          0.00%          0.00%          0.00%          0.00%
Load divided by 5cores          302.34%        298.45%        329.67%        332.51%        333.45%
OMP 1core                       -13.09%        -38.30%          1.66%         -0.46%         -0.79%
OMP 5core, 2xMAX                318.99%        321.93%        348.35%        348.36%        348.02%
OMP 5core, 4xMAX                320.52%        328.52%        349.92%        349.21%        348.63%
OMP 5core, 8xMAX                318.36%        321.41%        349.82%        349.58%        349.22%
CubieBoard 5core, 2xMAX           8.60%          9.06%         15.57%         25.34%         26.56%


