discuss-gnuradio
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Discuss-gnuradio] segmentation fault in qa_constellation_receiver_t


From: Tom Rondeau
Subject: Re: [Discuss-gnuradio] segmentation fault in qa_constellation_receiver_test
Date: Sun, 23 Feb 2014 14:00:47 -0500

On Fri, Feb 21, 2014 at 10:12 PM, Kelly Boswell <address@hidden> wrote:
> I'm encountering the same problem on maint.  And I did remember to rebuild.
> I removed the build directory, recreated it, and started over with cmake
> just to be sure.  It's the same stack trace.

Yeah, false alarm on volk_malloc. Turns out my issue was coming from
qtgui::freq_sink_c because we've introduced a new AVX proto-kernel and
that block was using the aligned kernel always (not the dispatcher).
There's one input that is a std::vector<float> which only guarantees
alignment to 16-bytes, so non-AVX calls were safe. I'm going to push a
patch for this soon.

So this will not address the problem you're having, which I cannot
reproduce (but hopefully Nathan can figure out since he can reproduce
the issue).

Tom


> On Fri, Feb 21, 2014 at 7:54 PM, West, Nathan <address@hidden>
> wrote:
>>
>> If you just want to get back to a system that passes QA you should
>> just be able to build off of maint.
>>
>> On Fri, Feb 21, 2014 at 6:05 PM, Kelly Boswell <address@hidden>
>> wrote:
>> > I removed the implementation of volk_malloc that uses posix_menacing by
>> > commenting everything from the #if to #else and the final #endif but the
>> > segmentation fault remains. I noticed it's being called in a few other
>> > files
>> > as well. Do I need to remove those, too? Thanks in advance.
>> >
>> > On Feb 21, 2014 10:21 AM, "Kelly Boswell" <address@hidden> wrote:
>> >>
>> >> Thank you, Tom. I'll try that after I'm off of work tonight. And thank
>> >> you
>> >> for the great ideas, Nathan.
>> >>
>> >> On Fri, Feb 21, 2014 at 2:39 AM, West, Nathan
>> >> <address@hidden> wrote:
>> >> > On Thu, Feb 20, 2014 at 11:25 PM, Kelly Boswell <address@hidden>
>> >> > wrote:
>> >> >> After the make test failed for this module, I decided to poke around
>> >> >> to
>> >> >> see
>> >> >> if there is an easy fix. I made a script that simply executes the
>> >> >> test
>> >> >> over
>> >> >> and over until it seg faults and exits after the core file is
>> >> >> created.
>> >> >>
>> >> >> address@hidden:~/src/gnuradio/build/gr-digital/python/digital$
>> >> >> ./runtests.sh
>> >> >> Using Volk machine: avx_64_mmx
>> >> >> Segmentation fault (core dumped)
>> >> >>
>> >> >> address@hidden:~/src/gnuradio/build/gr-digital/python/digital$ gdb
>> >> >> /usr/bin/python2.7 core
>> >> >> (gdb) bt
>> >> >> (gdb) bt
>> >> >> #0  0x00007fe8f627fb17 in volk_32fc_32f_dot_prod_32fc_a_avx ()
>> >> >>    from /home/kelly/src/gnuradio/build/volk/lib/libvolk.so.0.0.0
>> >> >> #1  0x00007fe8f52dd25f in
>> >> >> gr::filter::kernel::fir_filter_ccf::filter(std::complex<float>
>> >> >> const*)
>> >> >> ()
>> >> >>    from
>> >> >>
>> >> >>
>> >> >> /home/kelly/src/gnuradio/build/gr-filter/lib/libgnuradio-filter-3.8git.so.0.0.0
>> >> >> #2  0x00007fe8f143c45b in
>> >> >> gr::digital::pfb_clock_sync_ccf_impl::general_work(int,
>> >> >> std::vector<int,
>> >> >> std::allocator<int> >&, std::vector<void const*, std::allocator<void
>> >> >> const*>
>> >> >>>&, std::vector<void*, std::allocator<void*> >&) ()
>> >> >>    from
>> >> >>
>> >> >>
>> >> >> /home/kelly/src/gnuradio/build/gr-digital/lib/libgnuradio-digital-3.8git.so.0.0.0
>> >> >> #3  0x00007fe8f653809e in gr::block_executor::run_one_iteration() ()
>> >> >>    from
>> >> >>
>> >> >>
>> >> >> /home/kelly/src/gnuradio/build/gnuradio-runtime/lib/libgnuradio-runtime-3.8git.so.0.0.0
>> >> >> #4  0x00007fe8f6573622 in
>> >> >> gr::tpb_thread_body::tpb_thread_body(boost::shared_ptr<gr::block>,
>> >> >> int)
>> >> >> ()
>> >> >>    from
>> >> >>
>> >> >>
>> >> >> /home/kelly/src/gnuradio/build/gnuradio-runtime/lib/libgnuradio-runtime-3.8git.so.0.0.0
>> >> >> #5  0x00007fe8f6565ea1 in
>> >> >>
>> >> >>
>> >> >> boost::detail::function::void_function_obj_invoker0<gr::thread::thread_body_wrapper<gr::tpb_container>,
>> >> >> void>::invoke(boost::detail::function::function_buffer&) ()
>> >> >>    from
>> >> >>
>> >> >>
>> >> >> /home/kelly/src/gnuradio/build/gnuradio-runtime/lib/libgnuradio-runtime-3.8git.so.0.0.0
>> >> >> ---Type <return> to continue, or q <return> to quit---
>> >> >> #6  0x00007fe8f6526610 in
>> >> >> boost::detail::thread_data<boost::function0<void>
>> >> >>>::run() ()
>> >> >>    from
>> >> >>
>> >> >>
>> >> >> /home/kelly/src/gnuradio/build/gnuradio-runtime/lib/libgnuradio-runtime-3.8git.so.0.0.0
>> >> >> #7  0x00007fe8f9adc94a in ?? ()
>> >> >>    from /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.53.0
>> >> >> #8  0x00007fe8fc8a3f6e in start_thread (arg=0x7fe8e2ffd700)
>> >> >>     at pthread_create.c:311
>> >> >> #9  0x00007fe8fc5ce9cd in clone ()
>> >> >>     at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
>> >> >>
>> >> >> Of course, I had to recompile it with debugging info to glean
>> >> >> anything
>> >> >> useful from the stack trace.  So, I did that and I traced the bug to
>> >> >> this
>> >> >> line:
>> >> >>
>> >> >> c0Val = _mm256_mul_ps(a0Val, b0Val);
>> >> >>
>> >> >> I can't dump the values in a0Val or b0Val, though, because they're
>> >> >> intermediate values that are optimized away by the optimized kernel
>> >> >> code.  I
>> >> >> tried stepping through the assembler instructions but I'm not
>> >> >> familiar
>> >> >> with
>> >> >> the various sse and avx extensions. Heck, I'm not even familiar with
>> >> >> the
>> >> >> x86_64 instruction set.  So I have a huge learning curve ahead of
>> >> >> me,
>> >> >> there.
>> >> >> Is it possible to just dump the values in these __m256 data types to
>> >> >> a
>> >> >> file
>> >> >> so I can debug it that way?  If that's not easy to do, then I'm
>> >> >> willing
>> >> >> to
>> >> >> learn what I have to about the instruction set so I can debug this
>> >> >> thing.
>> >> >> But I would sure appreciate some help if anyone has some advice to
>> >> >> offer.
>> >> >>
>> >> >> Software version:
>> >> >> I rebased to the latest version of the next branch last night before
>> >> >> I
>> >> >> went
>> >> >> to bed at around 1:30 am CDT.
>> >> >>
>> >> >> Operating System:
>> >> >> address@hidden:~/src/gnuradio/volk/kernels/volk$ uname -a
>> >> >> Linux octs2 3.11.0-17-generic #31-Ubuntu SMP Mon Feb 3 21:52:43 UTC
>> >> >> 2014
>> >> >> x86_64 x86_64 x86_64 GNU/Linux
>> >> >> It's Ubuntu 13.10
>> >> >>
>> >> >> Hardware: ASUS X750J
>> >> >> Intel Quad Core i7 4700HQ 2.4GHz
>> >> >>
>> >> >> cpuinfo:
>> >> >> processor    : 7
>> >> >> vendor_id    : GenuineIntel
>> >> >> cpu family    : 6
>> >> >> model        : 60
>> >> >> model name    : Intel(R) Core(TM) i7-4700HQ CPU @ 2.40GHz
>> >> >> stepping    : 3
>> >> >> microcode    : 0x8
>> >> >> cpu MHz        : 2401.000
>> >> >> cache size    : 6144 KB
>> >> >> physical id    : 0
>> >> >> siblings    : 8
>> >> >> core id        : 3
>> >> >> cpu cores    : 4
>> >> >> apicid        : 7
>> >> >> initial apicid    : 7
>> >> >> fpu        : yes
>> >> >> fpu_exception    : yes
>> >> >> cpuid level    : 13
>> >> >> wp        : yes
>> >> >> flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
>> >> >> mca
>> >> >> cmov
>> >> >> pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
>> >> >> pdpe1gb
>> >> >> rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
>> >> >> nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl
>> >> >> vmx
>> >> >> est
>> >> >> tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt
>> >> >> tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat
>> >> >> epb
>> >> >> xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid
>> >> >> fsgsbase
>> >> >> tsc_adjust bmi1 avx2 smep bmi2 erms invpcid
>> >> >> bogomips    : 4789.27
>> >> >> clflush size    : 64
>> >> >> cache_alignment    : 64
>> >> >> address sizes    : 39 bits physical, 48 bits virtual
>> >> >> power management:
>> >> >>
>> >> >
>> >> >
>> >> > Hi Kelly,
>> >> >
>> >> > First, this is great debugging, thanks for getting so much info and
>> >> > trying to go for a fix on your own.
>> >> >
>> >> > On to the good stuff. I was able to reproduce this on my i7-4700MQ.
>> >> > Here's some additional info for the logs:
>> >> >
>> >> > * constellation_receiver is a hier block with a fir_filter_ccf inside
>> >> > that is calling the volk avx dot product.
>> >> > * The avx dot product proto-kernel passes VOLK QA
>> >> > * The qa_fir_filter.py is testing a fir_filter_ccf that passes its
>> >> > QA.
>> >> > * Just for kicks, I forced VOLK to use the generic kernel and I still
>> >> > see the segfault.
>> >> >
>> >> > A couple of things I'd like to try (and please feel free to give
>> >> > these a
>> >> > try):
>> >> > * Go back to a commit just before fir_filter.cc started using
>> >> > volk_malloc and volk_free.  (or for bonus points go back to some
>> >> > point
>> >> > in time when this test always passes and do a git bisect)
>> >> > * fiddle with parameters of the test, data length, number of taps in
>> >> > filter, etc.
>> >> > * Doubtful this would change, but test on different processors. It
>> >> > would be pretty wild if there was something off in the 4700 line, but
>> >> > the fact that the generic proto-kernel had the same result and nobody
>> >> > else has reported this yet is suspicious. My guess is GCC is actually
>> >> > emitting *very* similar code for the generic and avx dot product
>> >> > proto-kernels.
>> >> >
>> >> > Nathan
>> >>
>> >>
>> >> I was having similar issues this week with some AVX boxes. It looks
>> >> like it's a problem using posix_memalign (which is called by
>> >> volk_malloc if posix_memalign is available). Removing the use of
>> >> posix_memalign solves my problem. I'll work with Nathan off-list to
>> >> see about fixing this, possibly by removing the use of that version of
>> >> malloc.
>> >>
>> >> Tom
>> >
>> >
>> > _______________________________________________
>> > Discuss-gnuradio mailing list
>> > address@hidden
>> > https://lists.gnu.org/mailman/listinfo/discuss-gnuradio
>> >
>
>
>
> _______________________________________________
> Discuss-gnuradio mailing list
> address@hidden
> https://lists.gnu.org/mailman/listinfo/discuss-gnuradio
>



reply via email to

[Prev in Thread] Current Thread [Next in Thread]