Forgot to include the link to my benchmarking tool:
https://github.com/marcusmueller/table_vs_volk
Had to look intensely for your mail:
Trek, please don't "hijack" other threads by replying to them with a
completely unrelated topic. If starting a new topic, simply send an
email to the mailing list, without using the "reply" functionality,
or else, most people won't even see it, because it's buried in a
discussion thread irrelevant to them.
Best regards,
Marcus
On 07.04.2016 11:40, Marcus Müller wrote:
Hi Trek,
as Martin noted, yes, if you search the GNU Radio source tree for
that file name, you'll find it. And also, yes, GNU Radio is Free
Software, and one of the main credos of that is that you should be
able to use everything from it for your own purposes (as long as
you adhere to the freeness that the part you're using demands; for
GNU Radio, that's GPLv3). However, to be honest, a linear
approximation-based 8kB sine table might or might not be the right
tool for your problem – usually, one would just think about what
one needs and generate the sine table oneself, matching exactly
the requirements at hand.
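To make "generate the sine table oneself" concrete, here is a minimal sketch of a linearly interpolated table. It is not the GNU Radio file under discussion; the table size (1024 entries), the 32-bit phase format and the names table_sin and max_table_error are assumptions picked purely for illustration. It is written in Python to show the structure and the accuracy, not the speed.

```python
import math

# Hypothetical parameters, not taken from GNU Radio: 1024 entries per period,
# phase given as an unsigned 32-bit fraction of a full turn.
TABLE_BITS = 10
TABLE_SIZE = 1 << TABLE_BITS
PHASE_BITS = 32
FRAC_BITS = PHASE_BITS - TABLE_BITS

# One extra entry so interpolating from the last index needs no wrap-around.
SINE_TABLE = [math.sin(2.0 * math.pi * i / TABLE_SIZE)
              for i in range(TABLE_SIZE + 1)]


def table_sin(phase):
    """Approximate sin(2*pi*phase / 2**32) for an integer phase."""
    phase &= (1 << PHASE_BITS) - 1               # wrap to one period
    index = phase >> FRAC_BITS                   # upper bits select the entry
    frac = (phase & ((1 << FRAC_BITS) - 1)) / (1 << FRAC_BITS)
    lo = SINE_TABLE[index]
    hi = SINE_TABLE[index + 1]
    return lo + (hi - lo) * frac                 # linear interpolation


def max_table_error(samples=10000):
    """Largest absolute deviation from math.sin over one period."""
    worst = 0.0
    for k in range(samples):
        phase = (k * (1 << PHASE_BITS)) // samples
        exact = math.sin(2.0 * math.pi * phase / (1 << PHASE_BITS))
        worst = max(worst, abs(table_sin(phase) - exact))
    return worst
```

With these assumed parameters the interpolation error is bounded by roughly (2*pi/1024)^2 / 8, i.e. a few 1e-6; whether that is enough, and whether the table then fits your memory budget, is exactly the kind of requirement to settle before picking a table size.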
Us being DSP nerds, I guess some of us are curious: what is your
fixed point application? Are you planning to use this on some
microcontroller, or some programmable logic device, or do you need
a sin where you transform fixed point values (e.g. from an
ADC) to floating point values? What is the algorithm you're
building with that?
However, are you /sure/ a sine table is the optimum for your
specific problem?
I'm not an overly big fan of uniform sine tables (they make a lot
of sense on e.g. microcontrollers that don't have advanced math
functions, and if you don't need the accuracy), but if you look at
VOLK, you'll find things that are comparably fast, or in my case,
even faster. Here is the output of a benchmarking stub I've got
lying around (I didn't specify any compiler optimizations, i.e.
gcc will not optimize):
Doing 100000000 operations.
fixed point
0.781710s wall, 0.780000s user + 0.000000s system = 0.780000s CPU (99.8%)
standard libc float32 sin
2.700463s wall, 2.700000s user + 0.000000s system = 2.700000s CPU (100.0%)
VOLK float32 sin
Using Volk machine: avx2_64_mmx_orc
0.331708s wall, 0.330000s user + 0.000000s system = 0.330000s CPU (99.5%)
dummy memory bandwidth test: copy out- to input
0.404707s wall, 0.400000s user + 0.000000s system = 0.400000s CPU (98.8%)
dummy memory bandwidth test: copy in- to output
0.406990s wall, 0.410000s user + 0.000000s system = 0.410000s CPU (100.7%)
VOLK of course only makes sense if you can arrange your algorithms
so that you get a lot of sin input values contiguously in memory.
Four observations:
- This sine-table implementation is only about three times faster
than the standard libc sin, not even counting the fact that
you'd first have to come up with the proper input scaling.
Unless your program is really dominated by sin() performance,
this might not even be worth considering. A general hint: run
"perf record -a yourprogram", then "perf report", to find out
where your PC spent its time. At least, that's the picture
without compiler optimizations.
- The VOLK routine is twice as fast as the fixed point
implementation and, being a six-summand Taylor series
approximation, probably more accurate.
- Enabling compiler optimizations (CFLAGS=-Ofast make) will
probably double the speed of sin (my experience), and severely
cut the time that the fixed point implementation takes,
probably slightly below the time of VOLK (which will not
change measurably). That's because the compiler will inline
everything in the fixed point routine. Whether that slight
advantage then will be worth the accuracy loss is up to you.
- VOLK's sin is faster than a float-wise copy (here, without
compiler optimizations); what seems paradoxical shows that making
extensive use of memory alignment and SIMD brings you much
closer to the memory bandwidth barrier. Knowing my machine, I
now have a guess for the performance of the fixed point sin
table approach under heavy compiler optimization: it will take
around ¼ of the time one of the dummy copies takes; that's how
fast you get with 4-float32 SIMD here, assuming this is really
only bandwidth-limited. Trying this verifies my suspicion!
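To give a feel for the accuracy side of the second observation, here is a minimal scalar sketch of a six-summand Taylor approximation of sin. It is not VOLK's actual kernel: the folding of the argument into [-pi/2, pi/2] and the names taylor_sin and max_taylor_error are assumptions made for illustration, and the sketch says nothing about speed.

```python
import math


def taylor_sin(x, terms=6):
    """Sum the first `terms` summands of the sine Taylor series around 0."""
    # Assumed range reduction: fold x into [-pi/2, pi/2] using
    # sin(pi - x) == sin(x), so the truncated series stays accurate.
    x = math.remainder(x, 2.0 * math.pi)         # now within [-pi, pi]
    if x > math.pi / 2:
        x = math.pi - x
    elif x < -math.pi / 2:
        x = -math.pi - x

    term = x                                     # first summand: x
    total = x
    for n in range(1, terms):
        # Each summand is the previous one times -x^2 / ((2n)(2n+1)).
        term *= -x * x / ((2 * n) * (2 * n + 1))
        total += term
    return total


def max_taylor_error(samples=10000):
    """Largest absolute deviation from math.sin over [-pi, pi]."""
    worst = 0.0
    for k in range(samples + 1):
        x = -math.pi + 2.0 * math.pi * k / samples
        worst = max(worst, abs(taylor_sin(x) - math.sin(x)))
    return worst
```

Under that folding assumption the truncation error is bounded by the first omitted summand, (pi/2)^13 / 13!, which is below 1e-7. How this compares with a given table depends on the table's size and interpolation, so it is worth measuring both against your own accuracy requirement.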
As you can see, the question of which approach is fast really depends
on what your compiler does, what SIMD instructions you can make
use of (VOLK's sin only has optimizations for SSE4.1, I think) and
how your data lies in memory.
Best regards,
Marcus
On 07.04.2016 05:26, Trek Liu wrote:
What is the purpose of this file? There is zero
documentation in this file, is it ever being used?
I am looking for a sin/cos table for speed optimization, is
there one inside gnuradio?
Thanks.