[Discuss-gnuradio] Data (was: Re: Run graph/ scheduler overhead)
From: Dennis Glatting
Date: Thu, 23 Jul 2015 12:48:43 -0700
I copied out the dc_blocker_cc block from 3.7.8 and ran some performance
tests against it, which I've summarized in a table below.
I had to make some modifications to the original code:
* I removed the make wrapper.
* I tested against different containers.
* Different containers have different access/management methods,
which meant some changes to the code body (I tried to be consistent).
* On input I passed a std::vector to work() rather than complex*.
This changes the flavor of work(), but every variant gets the same
change, so the comparison should still be relative.
* I only used long_form and deleted the short_form code. I used
the key part of the original code.
The three containers are the original std::deque, plus std::queue and
std::list. The results are interesting. I probably should have looked at
other containers such as std::vector, but that might require recoding.
I also compiled with and without -std=c++11 because when I looked at the
container source I saw a bunch of #ifdefs for >= c++0x.
These are some of the problems with the original dc_blocker:
* Arguments passed by value rather than by reference.
* No inlines.
* Missing const qualifiers where they belong.
So in a second copy of dc_blocker I fixed those things. I found one case
(filter()) where it returns by value and I left that one alone.
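To make those three changes concrete, here is a minimal sketch of a moving averager written that way. It is not the actual "opt" code: the class name moving_averager_sketch and its accessors are made up for illustration, and only the arithmetic in filter() follows the filter() quoted further down in this message.

```cpp
#include <cassert>
#include <complex>
#include <deque>

typedef std::complex<float> CPLX;

// Hypothetical illustration of the three changes: arguments by const
// reference, member functions defined in the class body (implicitly
// inline), and const on everything that does not modify state.
class moving_averager_sketch {
public:
    explicit moving_averager_sketch(int length)
        : d_length(length), d_out(0), d_out_d1(0), d_out_d2(0)
    {
        assert(length > 0);
        // Same starting state as a delay line of length-1 zeros.
        d_delay_line.assign(length - 1, CPLX(0, 0));
    }

    // Read-only accessors are const and return without copying state.
    int length() const { return d_length; }
    const CPLX& delayed_sig() const { return d_out; }

    // Input by const reference; the result is still returned by value.
    CPLX filter(const CPLX& x)
    {
        d_out_d1 = d_out;
        d_delay_line.push_back(x);
        d_out = d_delay_line.front();
        d_delay_line.pop_front();

        const CPLX y = x - d_out_d1 + d_out_d2;
        d_out_d2 = y;
        return y / static_cast<float>(d_length);
    }

private:
    int d_length;
    CPLX d_out, d_out_d1, d_out_d2;
    std::deque<CPLX> d_delay_line;
};
```

For an 8-byte type like CPLX, by-reference passing may or may not matter once the call is inlined, so the effect of each change is worth measuring separately.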
The table below summarizes the results. "Old" means my reasonable(?)
facsimile of the original dc_blocker. "+c11" means I added -std=c++11 to
the compile line. "Opt" is my optimized copy of the code where I added
references, inlines, etc. "Special" is "opt" but with different compile
options. All of the output is included at the end of this message.
The numbers you'll see for old, old+c11, etc. are the amount of time, in
seconds, it took to process /one/ sample. In "old+deque" for example (the
first item), it took 701us to process a sample. One of the surprising
numbers is how slow std::list is: 1.9ms or more per sample in every
build, roughly 8x slower than the optimized deque. Also, when looking at
the assembly language for filter() (copy below) I see reallocs(). That's
not surprising and probably bad for performance. (BTW, "CPLX" is:
"typedef std::complex<float> CPLX;".)

  inline const CPLX
  moving_averager_c_list::filter( const CPLX& x ) {

    d_out_d1 = d_out;
    d_delay_line.push_back(x);
    d_out = d_delay_line.front();
    d_delay_line.pop_front();

    CPLX y = x - d_out_d1 + d_out_d2;
    d_out_d2 = y;

    return (y / (float)(d_length));
  }

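Since filter() only ever pushes one sample and pops one sample, the delay line never changes size, so one way to try std::vector is a fixed-size ring buffer. The following is a sketch of that idea, not something that was measured here; the class name moving_averager_ring and the d_head index are assumptions made for the example. It is meant to produce the same outputs as the filter() above without allocating after construction.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

typedef std::complex<float> CPLX;

// Hypothetical ring-buffer variant of the moving averager. The vector is
// sized once; d_head always points at the oldest stored sample.
class moving_averager_ring {
public:
    explicit moving_averager_ring(int length)
        : d_length(length), d_head(0), d_out(0), d_out_d1(0), d_out_d2(0)
    {
        assert(length > 0);
        d_ring.assign(length - 1, CPLX(0, 0));
    }

    CPLX filter(const CPLX& x)
    {
        d_out_d1 = d_out;
        if (d_ring.empty()) {
            d_out = x;                // length 1: nothing to delay
        } else {
            d_out = d_ring[d_head];   // oldest sample, like front()
            d_ring[d_head] = x;       // newest sample takes its slot
            d_head = (d_head + 1) % d_ring.size();
        }

        const CPLX y = x - d_out_d1 + d_out_d2;
        d_out_d2 = y;
        return y / static_cast<float>(d_length);
    }

private:
    int d_length;
    std::size_t d_head;
    CPLX d_out, d_out_d1, d_out_d2;
    std::vector<CPLX> d_ring;
};
```

Whether this beats std::deque in practice would need the same timing runs as the other containers.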
The "size" numbers in the table are the text segment size reported by
"size a.out". The "block size" is simply sizeof(d_delay_line), which
is really sizeof(std::deque<CPLX>) for the deque case, for example.
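The block-size figures can be reproduced with a few sizeof expressions, sketched below. The struct container_sizes and the function measure_container_sizes() are names invented for this example. One thing worth knowing when reading the numbers: std::queue is a container adaptor that wraps std::deque by default, which would explain why the deque and queue rows have the same size and nearly the same timings.

```cpp
#include <complex>
#include <cstddef>
#include <deque>
#include <list>
#include <queue>

typedef std::complex<float> CPLX;

// Sizes of the container objects themselves (bookkeeping only, not the
// heap storage they manage). The values depend on the standard library.
struct container_sizes {
    std::size_t deque_bytes;
    std::size_t queue_bytes;
    std::size_t list_bytes;
};

container_sizes measure_container_sizes()
{
    container_sizes s;
    s.deque_bytes = sizeof(std::deque<CPLX>);
    s.queue_bytes = sizeof(std::queue<CPLX>);  // wraps std::deque<CPLX> by default
    s.list_bytes  = sizeof(std::list<CPLX>);
    return s;
}
```

These are compile-time constants, so they say nothing about how much memory the elements use or how it is laid out.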
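One other note. I compiled "special" with -Ofast and it failed the content
integrity check (the "Data error i=0" line at the end of its output).
-Ofast turns on -ffast-math, which relaxes the IEEE floating-point rules,
so it's probably a bad option to use. :)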
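How the integrity check in the test code compares values is not shown here. If it compares floats exactly, relaxed floating-point math could trip it even when the results are close. As an illustration only, a tolerance-based comparison might look like the sketch below; first_mismatch() and the tolerance parameter are hypothetical and not part of the test code.

```cpp
#include <algorithm>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

typedef std::complex<float> CPLX;

// Returns the index of the first element where |a[i] - b[i]| exceeds tol,
// or -1 if every element is within tolerance. If the lengths differ, the
// first index past the shorter vector counts as a mismatch.
long first_mismatch(const std::vector<CPLX>& a,
                    const std::vector<CPLX>& b,
                    float tol)
{
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (std::abs(a[i] - b[i]) > tol)
            return static_cast<long>(i);
    }
    return (a.size() == b.size()) ? -1L : static_cast<long>(n);
}
```

A failure at i=0 with a sensible tolerance would point at something other than rounding differences.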
My os: Ubuntu 15.04.
My compiler: gcc version 4.9.2 (Ubuntu 4.9.2-10ubuntu13)
My system: AMD FX(tm)-9590 Eight-Core Processor @ 4.7GHz
I'm happy to send copies of the test code (two files) for review if
someone wants to put them on the web. The three main code blocks are
pretty simple:

  { dc_blocker_cc_deque dc( NUM_ELEM );

    std::cout << "deque:" << std::endl;
    t_start = gr::high_res_timer_now();
    for( int i = 0; i < NUM_LOOPS; ++i )
      for( int j = 0; j < NUM_COMPLEX; ++j )
        dc.work( data, dc_deque );
    timing( t_start, gr::high_res_timer_now(), NUM_LOOPS*NUM_COMPLEX );
  }

  #define NUM_LOOPS    5
  #define NUM_COMPLEX  10000
  #define NUM_ELEM     32

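For anyone who wants to repeat the measurement without the GNU Radio timer, the same harness can be sketched with std::chrono (which needs -std=c++11). This is a stand-in, not the timing() function from the test code: seconds_per_call() and ticks_to_seconds_per_call() are names made up here. The second helper shows the arithmetic behind the t/ea column: total ticks, divided by ticks per second, divided by the number of calls.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Mean wall-clock seconds per call of fn over `calls` invocations.
template <typename Fn>
double seconds_per_call(Fn fn, std::int64_t calls)
{
    assert(calls > 0);
    const std::chrono::steady_clock::time_point start =
        std::chrono::steady_clock::now();
    for (std::int64_t i = 0; i < calls; ++i)
        fn();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / static_cast<double>(calls);
}

// Converts a raw tick count into seconds per call, given the clock's
// ticks per second and the number of calls that were timed.
double ticks_to_seconds_per_call(std::int64_t total_ticks,
                                 std::int64_t ticks_per_second,
                                 std::int64_t calls)
{
    assert(ticks_per_second > 0 && calls > 0);
    const double seconds =
        static_cast<double>(total_ticks) / static_cast<double>(ticks_per_second);
    return seconds / static_cast<double>(calls);
}
```

With the figures from the first deque run in the output (total_t 35051914970, tps 1000000000, 5 * 10000 calls), ticks_to_seconds_per_call() gives about 0.000701038, which matches the t/ea printed there.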
Here's the summary table (times are seconds per sample, the t/ea value
in the output below; sizes are bytes):

             old          old+c11      opt          opt+c11      special
deque:       0.000701038  0.000705963  0.000235234  0.00023607   0.000234233
queue:       0.00069784   0.000705617  0.00023619   0.00023222   0.000237184
list:        0.00194583   0.00243208   0.00191296   0.00193926   0.00194809

text size:   26502        28902        21712        29574        23112
text orig:   33821        26502

block size:
deque:       80
queue:       80
list:        16

Original facsimile (not c++11):
address@hidden:~/dc_test$ c++ -O3 main.cc
address@hidden:~/dc_test$ size a.out
text data bss dec hex filename
28902 856 280 30038 7556 a.out
address@hidden:~/dc_test$ ./a.out
Building complex number data...
Done.
GNURadio hi-res clock tps: 1000000000
GNURadio sizeof(gr_complex): 8
GNURadio sizeof(CPLX): 8
dc_blocker_cc_deque: delay_line size=80
deque:
Done: total_t: 35051914970, sec_t: 35.0519, t/ea: 0.000701038
dc_blocker_cc_queue: delay_line size=80
queue:
Done: total_t: 34892023951, sec_t: 34.892, t/ea: 0.00069784
dc_blocker_cc_list: delay_line size=16
list:
Done: total_t: 97291349192, sec_t: 97.2913, t/ea: 0.00194583
Original facsimile (c++11):
address@hidden:~/dc_test$ c++ -O3 -std=c++11 main.cc
address@hidden:~/dc_test$ size a.out
text data bss dec hex filename
21712 848 280 22840 5938 a.out
address@hidden:~/dc_test$ ./a.out
Building complex number data...
Done.
GNURadio hi-res clock tps: 1000000000
GNURadio sizeof(gr_complex): 8
GNURadio sizeof(CPLX): 8
dc_blocker_cc_deque: delay_line size=80
deque:
Done: total_t: 35298153446, sec_t: 35.2982, t/ea: 0.000705963
dc_blocker_cc_queue: delay_line size=80
queue:
Done: total_t: 35280849767, sec_t: 35.2808, t/ea: 0.000705617
dc_blocker_cc_list: delay_line size=16
list:
Done: total_t: 121603777765, sec_t: 121.604, t/ea: 0.00243208
Optimized code (not c++11):
address@hidden:~/dc_test$ c++ -O3 -finline main_opt.cc
address@hidden:~/dc_test$ size a.out
text data bss dec hex filename
29574 856 280 30710 77f6 a.out
address@hidden:~/dc_test$ ./a.out
Building complex number data...
Done.
GNURadio hi-res clock tps: 1000000000
GNURadio sizeof(gr_complex): 8
GNURadio sizeof(CPLX): 8
dc_blocker_cc_deque: delay_line size=80
deque:
Done: total_t: 11761720007, sec_t: 11.7617, t/ea: 0.000235234
dc_blocker_cc_queue: delay_line size=80
queue:
Done: total_t: 11809516472, sec_t: 11.8095, t/ea: 0.00023619
dc_blocker_cc_list: delay_line size=16
list:
Done: total_t: 95647805916, sec_t: 95.6478, t/ea: 0.00191296
Optimized code (c++11):
address@hidden:~/dc_test$ c++ -O3 -finline -std=c++11 main_opt.cc
address@hidden:~/dc_test$ size a.out
text data bss dec hex filename
23080 848 280 24208 5e90 a.out
address@hidden:~/dc_test$ ./a.out
Building complex number data...
Done.
GNURadio hi-res clock tps: 1000000000
GNURadio sizeof(gr_complex): 8
GNURadio sizeof(CPLX): 8
dc_blocker_cc_deque: delay_line size=80
deque:
Done: total_t: 11803504003, sec_t: 11.8035, t/ea: 0.00023607
dc_blocker_cc_queue: delay_line size=80
queue:
Done: total_t: 11610977298, sec_t: 11.611, t/ea: 0.00023222
dc_blocker_cc_list: delay_line size=16
list:
Done: total_t: 96962902014, sec_t: 96.9629, t/ea: 0.00193926
special (opt+c++11):
address@hidden:~/dc_test$ c++ -Ofast -Wsign-compare -Wall
-Wno-uninitialized -fvisibility=hidden -finline -std=c++11 main_opt.cc
address@hidden:~/dc_test$ size a.out
text data bss dec hex filename
23112 856 280 24248 5eb8 a.out
address@hidden:~/dc_test$ ./a.out
Building complex number data...
Done.
GNURadio hi-res clock tps: 1000000000
GNURadio sizeof(gr_complex): 8
GNURadio sizeof(CPLX): 8
dc_blocker_cc_deque: delay_line size=80
deque:
Done: total_t: 11711630308, sec_t: 11.7116, t/ea: 0.000234233
dc_blocker_cc_queue: delay_line size=80
queue:
Done: total_t: 11859205796, sec_t: 11.8592, t/ea: 0.000237184
dc_blocker_cc_list: delay_line size=16
list:
Done: total_t: 97404287524, sec_t: 97.4043, t/ea: 0.00194809
Data error i=0