Re: [Discuss-gnuradio] questions about the GNURadio Scheduler

From: Marcus Müller
Subject: Re: [Discuss-gnuradio] questions about the GNURadio Scheduler
Date: Sun, 7 Feb 2016 22:35:31 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.0

Hello Gonzalo,

On 07.02.2016 20:58, Gonzalo Arcos wrote:

Hello,

I am trying to optimize the throughput of a flowgraph that was given to me, already designed and working. I have profiled every block, and improved on the performance of some blocks, which resulted in a better performance of the flowgraph as a whole.

However, at the moment, I am looking at how the graph is executed by the GNU Radio scheduler, to see whether I can parallelize (i.e. pipeline) anything that is currently being executed sequentially for no good reason.

To do this, I am trying to understand how the GNU Radio scheduler works, how blocks are executed, etc. I have not found as much information as I would like, which leaves me with lots of questions. So from this point on I will state some of the information I gathered, and ask some questions. If anything I say is incorrect, please tell me.

GNURadio defines one thread per block. This means that GNURadio automatically takes full advantage of multi core processors, without the programmer of the blocks having to do anything, given a high number of blocks.

- However, how does the GNU Radio scheduler decide which block to execute, given a set of "ready to execute" blocks without dependencies between them?

Basically, each block's thread runs its own block_executor loop; roughly, it goes like this:

* Wait for additional input to come in, for output buffer space to be consumed downstream (and hence be ready for overwriting), or for new messages to come in.
* If messages came in, handle them.
* Ask the block (via forecast()) whether it has enough input to run (general_)work().
* If so, run it.
* Notify upstream blocks of how many items you've consumed, freeing space in their output buffers.
* Notify downstream blocks of how many items you've produced, so they can start working on them.
* Start again from the top.
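
The loop above can be sketched as a toy model in plain Python. This is not GNU Radio code: the `Buffer` class, `source_loop`, `sink_loop` and `run_flowgraph` are hypothetical names made up for illustration, message handling and forecast() are left out, and a single condition variable stands in for the upstream/downstream notifications.

```python
import threading
from collections import deque


class Buffer:
    """Bounded item buffer between one upstream and one downstream block (toy model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()
        self.done = False                   # set once the upstream block has finished
        self.cond = threading.Condition()   # carries both "space freed" and "items ready"


def source_loop(out_buf, n_items):
    """Block A: produces the integers 0 .. n_items-1."""
    produced = 0
    while produced < n_items:
        with out_buf.cond:
            # Wait until downstream has consumed enough to free output space.
            out_buf.cond.wait_for(lambda: len(out_buf.items) < out_buf.capacity)
            space = out_buf.capacity - len(out_buf.items)
            n = min(space, n_items - produced)
            # Stand-in for work(): write n items into the output buffer.
            out_buf.items.extend(range(produced, produced + n))
            produced += n
            # Notify downstream that new items are available.
            out_buf.cond.notify_all()
    with out_buf.cond:
        out_buf.done = True
        out_buf.cond.notify_all()


def sink_loop(in_buf, results):
    """Block B: doubles every item it receives."""
    while True:
        with in_buf.cond:
            # Wait for new input, or for upstream to finish.
            in_buf.cond.wait_for(lambda: in_buf.items or in_buf.done)
            if not in_buf.items:
                return
            chunk = list(in_buf.items)
            in_buf.items.clear()
            # Notify upstream that buffer space has been freed.
            in_buf.cond.notify_all()
        # Stand-in for work(): runs outside the lock, so A can refill meanwhile.
        results.extend(2 * x for x in chunk)


def run_flowgraph(n_items=1000, capacity=64):
    """Run A -> B with one thread per block and return B's output."""
    buf = Buffer(capacity)
    results = []
    threads = [
        threading.Thread(target=source_loop, args=(buf, n_items), name="block_A"),
        threading.Thread(target=sink_loop, args=(buf, results), name="block_B"),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling `run_flowgraph(100, 8)` pushes 100 items through an 8-item buffer; neither thread ever polls, each one only wakes up when the other has produced or consumed something.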

- If I have a flowgraph A -> B on a dual core processor, A and B being blocks, will A and B execute concurrently after the first iteration of A?

Yes.

By this I mean: on the first iteration of A, B has no data to work on, so one core will be idle. After that, however, both cores should be working, since B can process data sent by A, and A can process new data independently of what B is doing.

Exactly! Indeed, GNU Radio asks A to produce up to [size of A's output buffer in items]/2, so that as soon as it's finished, B can start working, but A can go back to work right away, maximizing parallelism.

If A is faster than B at processing data, does A's data get queued in a buffer and then sent to B? Does B only trigger when the data requirements to perform a work() call (i.e. input items) are met?

If A is faster than B, A's input buffer will be empty most of the time, while A's output buffer (which is B's input buffer) will be full; as long as A has no space to write items to, it won't get asked to work().
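
To make the half-buffer rule and the resulting backpressure concrete, here is a small deterministic toy model in Python. It is only a sketch under simplifying assumptions: the function `simulate` and its parameters are hypothetical, time is chopped into steps in which A writes first and B reads second, and the real scheduler's thread timing is ignored.

```python
def simulate(buffer_size, a_rate, b_rate, steps):
    """Step a toy A -> B pipeline.

    a_rate / b_rate: maximum items A can produce / B can consume per step.
    Returns (fills, produced): the buffer fill level right after A's write
    in each step, and the total number of items A produced.
    """
    fill = 0
    fills = []
    produced = 0
    for _ in range(steps):
        space = buffer_size - fill
        # A is asked for at most half the buffer, and never more than the free space.
        n_out = min(a_rate, buffer_size // 2, space)
        fill += n_out
        produced += n_out
        fills.append(fill)
        # B consumes what is available, up to its own rate, freeing space for A.
        n_in = min(b_rate, fill)
        fill -= n_in
    return fills, produced
```

With `simulate(8, 4, 1, 10)`, the fill level climbs to the full 8 items within three steps and stays there; from then on A can only write as many items per step as B has just freed, so A's effective rate drops to B's rate.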

- Is there any way to see which gnuradio thread is executing in each core of the cpu, and which block corresponds to that thread? (This would be REALLY USEFUL for debugging purposes)

Yes! Current versions of GNU Radio (I think since 3.7.2 or so) have proper thread names, so running `htop` or a similar Unix program will show which thread is running, consuming CPU etc.
For more in-depth analysis, I'd recommend having a look at `perf record` / `perf report` [1]. Even more advanced are GNU Radio's built-in performance counters and performance monitor: activate them as explained in [2], then add a "performance monitor" to your GRC flowgraph if you use GRC, or run `gr-perf-monitorx`.

- The GNU Radio wiki explains how to set thread affinity and priority. However, it is not clear what they are useful for. Thread priority is pretty straightforward, so the only concept I don't fully get is block thread affinity. In which scenario could it be useful to require that a thread run on a specific core?

The point is that you might, for example, be using a block that uses a certain hardware accelerator, which is "close" to one CPU, but not to another. For most PC-style workstations, this won't happen, and it's best to let GNU Radio and your OS figure out on their own which CPU to schedule threads on.

I've personally yet to discover a case where this is useful.
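
For illustration, this is what pinning a thread to one core looks like at the OS level. The sketch below is plain Python using `os.sched_setaffinity`, which is only available on Linux and some other Unix systems; it does not go through GNU Radio's own per-block affinity setting, and `pin_to_core` is a hypothetical helper name.

```python
import os


def pin_to_core(core):
    """Pin the calling thread to a single CPU core.

    Returns (previous, current) affinity sets, or None on platforms
    without os.sched_setaffinity (an assumption this sketch has to handle).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    previous = os.sched_getaffinity(0)
    # Fall back to an allowed core if the requested one is not available to us.
    target = core if core in previous else min(previous)
    os.sched_setaffinity(0, {target})
    return previous, os.sched_getaffinity(0)
```

The returned `previous` set can be passed back to `os.sched_setaffinity(0, previous)` to undo the pinning. Unless something like an accelerator's locality forces the choice, leaving the affinity alone is usually the better default, as said above.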

Is there any point in trying to optimize a block's performance by using OpenMP or pthreads in cases of embarrassingly parallel operations? Or is it totally useless, since all the cores are already at full load because there is one thread per block? I know that I could also try to use the GPU to speed up this kind of operation, but my first attempt was processor threads.

Sure!
Often, especially on HyperThreading machines, it makes a lot of sense to make a single operation finish as quickly as possible, because all the data it accesses then needs to be brought into the CPU caches only once.
For example, even in relatively complex flow graphs with lots of blocks where all CPU cores are kept busy all the time, FFTs that are run with multiple threads tend to increase overall performance.

This is basically another variation of the "old" truth that on modern hardware, it's typically better to process data uniformly in large chunks; that might increase latency, but typically, the added latency is made up for by higher system throughput.
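
As a rough sketch of what "multiple threads inside one block" means structurally, the following Python splits one large input buffer into a few big contiguous chunks and hands each to a worker thread. All names (`scale_chunk`, `parallel_scale`, `n_workers`) are hypothetical, and the per-item multiplication is just a placeholder for real signal processing. Note that in CPython the global interpreter lock prevents pure-Python workers like these from actually running in parallel; a real block would do this in C++ (e.g. with OpenMP or a multi-threaded FFT library), so take this as a picture of the chunking only.

```python
from concurrent.futures import ThreadPoolExecutor


def scale_chunk(chunk, gain):
    """Placeholder per-chunk work: multiply every sample by gain."""
    return [gain * x for x in chunk]


def parallel_scale(samples, gain, n_workers=4):
    """Split samples into at most n_workers large chunks and process them in a pool."""
    if not samples:
        return []
    chunk_len = -(-len(samples) // n_workers)  # ceiling division
    chunks = [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() yields results in input order, so the output stays in sample order.
        parts = pool.map(scale_chunk, chunks, [gain] * len(chunks))
        out = []
        for part in parts:
            out.extend(part)
    return out
```

Using a few large chunks rather than many tiny ones keeps the per-chunk overhead small and lets each worker stream through contiguous memory, which is the same throughput-over-latency trade-off described above.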

Best regards,
Marcus

Thanks in advance for your answers.

Kind Regards,

Gonzalo Arcos

[1] https://lists.gnu.org/archive/html/discuss-gnuradio/2015-05/msg00320.html
[2] https://gnuradio.org/redmine/projects/gnuradio/wiki/PerformanceCounters
