
Re: [Discuss-gnuradio] CUDA-Enabled GNURadio gr_benchmark10 possible improvements


From: Yu-Hua Yang
Subject: Re: [Discuss-gnuradio] CUDA-Enabled GNURadio gr_benchmark10 possible improvements
Date: Tue, 7 Jul 2009 02:57:16 -0400

Any help?

2009/7/2 Yu-Hua Yang <address@hidden>


---------- Forwarded message ----------
From: Yu-Hua Yang <address@hidden>
Date: 2009/7/2
Subject: Re: [Discuss-gnuradio] CUDA-Enabled GNURadio gr_benchmark10 possible improvements
To: Martin DvH <address@hidden>
Cc: discuss-gnuradio <address@hidden>


Thanks, Martin, for your generous effort in helping me.

It appears only once, so I think I am in the clear.

I decided to abandon and comment out all the cuda.multiply_const_ff calls and concentrate on cuda.fir_filter_fff, as suggested. Here are my questions and concerns:

1. I increased output_multiple by setting "options.output_multiple = xxx", but this has no effect on the computation time of either the CUDA or the CPU version. Did I do something wrong?
2. I increased the taps with "taps = range(1,256)" and also increased the number of fir_filter blocks in the code, and with that I am now able to get CUDA to be faster than the CPU alone (a small timing sketch of this setup follows after this list). However, if I use something like "taps = range(1,512)", the CUDA part becomes extremely slow (~20 seconds) while the CPU is still fine (~2 sec). Why? This may be related to what you were saying about the maximum number of taps, but why is the CPU still able to keep up?
3. I had to increase the number of fir_filter blocks to 14 before CUDA started to out-perform the CPU. Experimentally that is fine, and I achieved my objective, but how is this "increased computation" justified in a normal GNURadio application? When would a normal GNURadio flow graph require a chain of 14 fir_filters? I guess this goes beyond just "benchmarking" and asks where else I can take advantage of CUDA's computing power in GNURadio in a "normal" operation.
4. Looking at cuda_fir_fff_7_kernel, which I believe is the core of cuda_fir_filter, it seems you are using shared memory, right? I just want to make sure we are not using global or local memory, which would disastrously slow down the CUDA computations.
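For questions 2 and 3, here is a rough timing sketch of the kind of setup I mean: a chain of FIR filter blocks between a null source and a null sink, timed around tb.run(). It assumes the GNU Radio 3.2-era Python API (gr.top_block, gr.null_source, gr.head, gr.fir_filter_fff). The CUDA lines are commented out because cuda.fir_filter_fff only exists in the CUDA-enabled branch, and its constructor signature is assumed here rather than taken from gr_benchmark10.

    import time
    from gnuradio import gr

    def time_fir_chain(make_fir, nblocks, taps, nsamples):
        # Build src -> head -> fir_1 -> ... -> fir_N -> sink and time tb.run()
        tb   = gr.top_block()
        src  = gr.null_source(gr.sizeof_float)
        head = gr.head(gr.sizeof_float, nsamples)
        firs = [make_fir(1, taps) for i in range(nblocks)]
        sink = gr.null_sink(gr.sizeof_float)
        tb.connect(src, head, firs[0])
        for a, b in zip(firs, firs[1:]):
            tb.connect(a, b)
        tb.connect(firs[-1], sink)
        t0 = time.time()
        tb.run()
        return time.time() - t0

    if __name__ == '__main__':
        taps = [float(t) for t in range(1, 256)]   # 255 taps, as in question 2
        print "CPU :", time_fir_chain(gr.fir_filter_fff, 14, taps, 10000000)
        # CUDA-enabled branch only; import name and constructor are assumed:
        # from gnuradio import cuda
        # print "CUDA:", time_fir_chain(cuda.fir_filter_fff, 14, taps, 10000000)

Sweeping nblocks and the length of taps corresponds to the experiments described in questions 2 and 3.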



