

Re: [Discuss-gnuradio] QT GUI time sink (float) unnecessary memmove()?


From: Andy Walls
Subject: Re: [Discuss-gnuradio] QT GUI time sink (float) unnecessary memmove()?
Date: Sat, 28 Mar 2015 20:32:26 -0400
User-agent: K-9 Mail for Android

When testing, I used 5 float streams running at over 150 Msps each, with 15 microsecond bursts of 50 MHz about 10 microseconds apart. I used enough x points to see two bursts on the GUI. Normal trigger. (Free or auto trigger might be too taxing.)

-Regards
Andy

On March 28, 2015 8:06:08 PM EDT, Tom Rondeau <address@hidden> wrote:
On Sat, Mar 28, 2015 at 12:50 PM, Andy Walls <address@hidden> wrote:
On Sat, 2015-03-28 at 14:45 -0400, Andy Walls wrote:
> Hi Tom:
>
>
> On Sat, 2015-03-28 at 11:12 -0700, Tom Rondeau wrote:
> > On Sat, Mar 28, 2015 at 11:00 AM, Andy Walls
> > <address@hidden> wrote:
>
> >         Can this memmove() be safely skipped
> >
> >         https://github.com/gnuradio/gnuradio/blob/master/gr-qtgui/lib/time_sink_f_impl.cc#L627
> [snip]
> >         The volk_32f_convert_64f_u_avx() call is unavoidable as Qwt
> >         wants
> >         doubles for plotting and not floats. But it might also be able
> >         to be
> >         deferred to the very end when the decision to plot is known
> >         for sure.
> >         (But that's more surgery than I care to take on at the
> >         moment.)
>
> >  But thinking about the volk convert function, that's both copying the
> > data from the input buffer into the internal buffer as well as
> > performing the conversion. We can't just hold data in the input since
> > we don't want to back up the data until we're ready to plot both with
> > timing and with a full enough buffer -- it's just sampling a section
> > at a time and drops everything in between.
>
> Right.
>
> >  That part could be converted into a memcpy instead of the volk
> > convert. Then, when we're ready to plot, we call the volk convert that
> > also does the move from d_start to 0, so it combines those two
> > elements.
>
> Yeah, that's the surgery part. :)  It would require adding a new set of
> buffers to hold floats, and then converting them once a determination
> to plot was made.
>
> This also affects the memmove() of the tail for the trigger delay.  It
> would operate on the new set of float buffers (vs the buffers holding
> doubles).
>
> > Thoughts on those proposals?

Your proposal for implementing memcpy() and deferring volk_*() to do the
conversion and "memmove" in one step is great!  :)

I just implemented it, and the time_sink_f thread has gone from 41.5%
CPU down to 29.1% CPU in my tests. :)  memcpy() now dominates the
thread, but that's to be expected.
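
For readers following along, here is a minimal sketch of the idea, not the actual patch. It assumes one float staging buffer per input stream: the hot path only does a memcpy(), and the float-to-double conversion runs once, at plot time, writing to index 0 so it also does the job of the old memmove(). The class name, its members, and the plain loop are all hypothetical. In GNU Radio the loop would presumably be a call to VOLK's float-to-double convert kernel instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the per-stream buffers of a time sink.
// Not the GNU Radio implementation; names are illustrative only.
class deferred_plot_buffer
{
public:
    explicit deferred_plot_buffer(std::size_t capacity)
        : d_raw(capacity, 0.0f), d_plot(capacity, 0.0), d_fill(0)
    {
    }

    // Hot path, run for every chunk of input: copy floats as-is,
    // with no conversion. Returns the number of samples stored.
    std::size_t append(const float* in, std::size_t n)
    {
        const std::size_t room = d_raw.size() - d_fill;
        const std::size_t take = (n < room) ? n : room;
        std::memcpy(d_raw.data() + d_fill, in, take * sizeof(float));
        d_fill += take;
        return take;
    }

    // Cold path, run only once the decision to plot has been made.
    // Converts samples [start, fill) to double and writes them starting
    // at index 0 of the plot buffer, so the shift to the front and the
    // conversion happen in a single pass. Returns the converted count.
    std::size_t convert_for_plot(std::size_t start)
    {
        assert(start <= d_fill);
        const std::size_t n = d_fill - start;
        // Assumption: a VOLK float-to-double kernel would replace this loop.
        for (std::size_t i = 0; i < n; ++i) {
            d_plot[i] = static_cast<double>(d_raw[start + i]);
        }
        d_fill = 0; // start collecting the next frame
        return n;
    }

    const std::vector<double>& plot_data() const { return d_plot; }
    std::size_t filled() const { return d_fill; }

private:
    std::vector<float> d_raw;   // staging buffer filled by memcpy()
    std::vector<double> d_plot; // doubles handed to the plotting widget
    std::size_t d_fill;         // number of valid floats in d_raw
};
```

The conversion cost is then paid only for frames that are actually drawn, which is consistent with the convert kernel nearly disappearing from the second profile below.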



With my initial hack:

> CPU: Intel Sandy Bridge microarchitecture, speed 3.5e+06 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
> samples  %        image name               symbol name
> 78158    39.0737  libvolk.so.0.0.0         volk_32f_convert_64f_u_avx
> 22777    11.3870  no-vmlinux               /no-vmlinux
> 13972     6.9851  libgnuradio-qtgui-3.7.7git.so.0.0.0 gr::qtgui::time_sink_f_impl::_test_trigger_slope(float const*) const
> 7781      3.8900  libgnuradio-qtgui-3.7.7git.so.0.0.0 gr::qtgui::time_sink_f_impl::_test_trigger_norm(int, std::vector<void const*, std::allocator<void const*> >)
> 7236      3.6175  libpthread-2.18.so       pthread_mutex_lock
> 6163      3.0811  libgnuradio-runtime-3.7.7git.so.0.0.0 boost::detail::sp_counted_base::release()
> 5942      2.9706  libpthread-2.18.so       pthread_mutex_unlock
> 4947      2.4732  libgnuradio-runtime-3.7.7git.so.0.0.0 gr::block_executor::run_one_iteration()
> 3826      1.9127  libgnuradio-runtime-3.7.7git.so.0.0.0 gr::block_detail::input(unsigned int)
> 3555      1.7773  libstdc++.so.6.0.19      /usr/lib64/libstdc++.so.6.0.19
> 3206      1.6028  libc-2.18.so             __memmove_ssse3_back
> [...]

With my implementation of your suggestion:

CPU: Intel Sandy Bridge microarchitecture, speed 3.5e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 90000
samples  %        image name               symbol name
27595    35.6051  libc-2.18.so             __memcpy_sse2_unaligned
12225    15.7736  no-vmlinux               /no-vmlinux
4051      5.2269  libpthread-2.18.so       pthread_mutex_lock
3739      4.8243  libgnuradio-runtime-3.7.7git.so.0.0.0 boost::detail::sp_counted_base::release()
3362      4.3379  libpthread-2.18.so       pthread_mutex_unlock
2876      3.7108  libgnuradio-runtime-3.7.7git.so.0.0.0 gr::block_executor::run_one_iteration()
2364      3.0502  libgnuradio-runtime-3.7.7git.so.0.0.0 gr::block_detail::input(unsigned int)
2091      2.6980  libstdc++.so.6.0.19      /usr/lib64/libstdc++.so.6.0.19
1388      1.7909  libgnuradio-runtime-3.7.7git.so.0.0.0 gr::tpb_detail::notify_upstream(gr::block_detail*)
1138      1.4683  libc-2.18.so             __memmove_ssse3_back
[...]
2         0.0026  libvolk.so.0.0.0         __volk_32f_convert_64f_d
[...]
1         0.0013  libvolk.so.0.0.0         volk_32f_convert_64f_a_avx


Regards,
Andy


Andy, 

Excellent!

I've got a few other minor patches for some things; I'll put this in there too and test on my end as well.

Tom
 

