Re: [Discuss-gnuradio] Segfault with volk on 32 bit AMD
From: Frederick Stevens
Subject: Re: [Discuss-gnuradio] Segfault with volk on 32 bit AMD
Date: Tue, 20 Mar 2012 22:16:58 -0500
User-agent: Mozilla/5.0 (X11; Linux i686; rv:11.0) Gecko/20120314 Thunderbird/11.0
Tom,
See dmesg text below:
[53882.751765] volk_profile[30672]: segfault at b7191000 ip b76acb74 sp bfddc894 error 4 in libvolk.so.0.0.0[b768b000+ec000]
I'm thinking that this is a Linux kernel/hardware issue. I am going
to do some more testing on different hardware but with the same
processor, Slackware version, etc.
Cheers,
Fred
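The dmesg line can be decoded by hand. The sketch below assumes the standard x86 page-fault error-code bits (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode). It subtracts the load base printed in brackets from the instruction pointer to get an offset inside libvolk. The helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* x86 page-fault error-code bits, as printed in the kernel's
   "segfault at ADDR ip IP sp SP error N" line. */
#define PF_PROT  0x1u /* 0 = page not present, 1 = protection violation */
#define PF_WRITE 0x2u /* 0 = read access,      1 = write access */
#define PF_USER  0x4u /* 0 = kernel mode,      1 = user mode */

/* Offset of the faulting instruction inside the library mapping
   printed as "libvolk.so.0.0.0[base+size]". Returns -1 if ip is
   outside that mapping. (Hypothetical helper.) */
long fault_offset(uint32_t ip, uint32_t base, uint32_t size)
{
    if (ip < base || ip - base >= size)
        return -1;
    return (long)(ip - base);
}

/* Print a human-readable decoding of the "error N" field. */
void describe_error(unsigned err)
{
    printf("%s-mode %s, %s\n",
           (err & PF_USER) ? "user" : "kernel",
           (err & PF_WRITE) ? "write" : "read",
           (err & PF_PROT) ? "protection violation" : "page not present");
}
```

For this log, fault_offset(0xb76acb74, 0xb768b000, 0xec000) gives 0x21b74, and describe_error(4) prints "user-mode read, page not present". If the library has symbols, something like `addr2line -f -e libvolk.so.0.0.0 0x21b74` will typically map that offset back to a function. The low 12 bits (0xb74) also match the 0xb7edbb74 address in the gdb trace from the earlier run. The fault address b7191000 is page-aligned, which is what a sequential read running off the end of a mapping would produce. That last point is an inference; the log alone does not prove it.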
On 03/20/2012 06:55 PM, Tom Rondeau wrote:
On Tue, Mar 20, 2012 at 2:24 PM, Nick Foster <address@hidden> wrote:
On Tue, Mar 20, 2012 at 10:59 AM, Tom Rondeau <address@hidden> wrote:
On Tue, Mar 20, 2012 at 11:03 AM, Frederick Stevens <address@hidden> wrote:
Tom, et al.,
Here is the output from the volk_profile run (see attached).
Cheers,
Fred
Well, Fred, that output looks good. Everything's showing up as it
should. Interesting that it passed this time, but I half expected it
to. It seems like there's a memory allocation problem going on: when
it crashed, it did so just a bit after getting halfway through, so it
must have been when it hit something else that was allocated. It's
very odd behavior that I've seen on occasion, but given the way the
memory is allocated in the volk_qa_aligned_mem_pool, I wouldn't
expect there to be a problem.
Right now, I'm at a loss on how to proceed.
I can't think of any more really useful tests.
Were I able to reproduce this error on one of my
machines, I'd just have to start tinkering around
and getting more output data.
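For readers unfamiliar with the idea, here is a sketch of what an aligned buffer pool for a QA harness generally looks like. It is not VOLK's actual volk_qa_aligned_mem_pool. The names mem_pool, pool_alloc and pool_free_all, the 16-byte alignment and the fixed capacity are all assumptions. What matters for this thread is that the byte count is item count times item size. If either factor is wrong, a kernel will read past the end of a perfectly aligned block.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define POOL_MAX   16
#define POOL_ALIGN 16 /* 16-byte alignment, as required by aligned SSE loads */

/* Hypothetical pool: remembers every block so the test harness can free
   them all at once when a kernel test finishes. */
typedef struct {
    void  *blocks[POOL_MAX];
    size_t count;
} mem_pool;

/* Allocate num_items * item_size zeroed bytes, POOL_ALIGN-aligned.
   Returns NULL on overflow, zero size, a full pool, or allocation failure. */
void *pool_alloc(mem_pool *pool, size_t num_items, size_t item_size)
{
    void *p = NULL;
    if (pool->count == POOL_MAX || item_size == 0 || num_items == 0)
        return NULL;
    if (num_items > SIZE_MAX / item_size)
        return NULL;
    size_t bytes = num_items * item_size;
    if (posix_memalign(&p, POOL_ALIGN, bytes) != 0)
        return NULL;
    memset(p, 0, bytes);
    pool->blocks[pool->count++] = p;
    return p;
}

/* Release every block handed out by pool_alloc. */
void pool_free_all(mem_pool *pool)
{
    for (size_t i = 0; i < pool->count; i++)
        free(pool->blocks[i]);
    pool->count = 0;
}
```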
If volk_qa_aligned_mem_pool were passed the wrong
"type" argument due to a parser error, it could allocate
half the required memory. But I'd expect to see
significantly more catastrophic failures across many more
machines & tests if this were the case.
--n
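To make Nick's point concrete: each test buffer's size depends on reading a type token such as "32f" or "32fc" from the kernel name. The helper below is hypothetical and is not VOLK's parser. It assumes tokens of the form <bits><f|i|u>, optionally followed by "c" for a complex type with two components per item.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Bytes per item for a VOLK-style type token such as "32f", "32fc" or
   "16ic". A trailing 'c' marks a complex type (two components per item).
   Returns 0 for anything it does not recognize. Hypothetical helper. */
size_t item_size_from_sig(const char *sig)
{
    unsigned bits = 0;
    char kind[3] = {0}; /* "f", "i" or "u", optionally followed by "c" */
    if (sscanf(sig, "%u%2s", &bits, kind) != 2 || bits == 0 || bits % 8 != 0)
        return 0;
    size_t scalar = bits / 8;
    if (strcmp(kind, "f") == 0 || strcmp(kind, "i") == 0 || strcmp(kind, "u") == 0)
        return scalar;
    if (strcmp(kind, "fc") == 0 || strcmp(kind, "ic") == 0)
        return 2 * scalar;
    return 0;
}
```

With num_points = 204600, "32fc" gives 1,636,800 bytes, while a misread "32f" gives 818,400. That is exactly half, which is the failure mode Nick describes.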
Yes, the output info I asked Fred to produce was meant to test for
exactly that problem, but it's setting the correct type signature for
all inputs and outputs, so that's not it. It _looks_ like it's
allocating the right amount of memory, and nothing suggests that it's
not (except for the segfault, that is).
Tom
On 03/20/2012 09:47 AM, Marcus D. Leech wrote:
On 03/20/2012 10:42 AM, Tom Rondeau wrote:
Fred,
Thanks. Can you get the entire output (in a text file)? There's some
information that's printed at the top that's important. Just run it
from the command-line and pipe (>) the output into a file.
<pedantic>
Just because I'm a grumpy old Unix guy from waaaaay back, I'll point
out that the term "pipe" is very frequently misused to mean
"redirect", when in fact the pipe symbol in the Unix shell is "|" and
is a mechanism for attaching the standard output of one program to
the standard input of another. The ">" symbol means "redirect the
standard output to a file", which is similar to, but not the same as,
the use of a "pipe", which is an IPC mechanism.
</pedantic>
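Marcus's distinction can be shown with the POSIX calls a shell uses. This is a simplified illustration, not a shell implementation, and error handling is minimal. For "> file", the producer's file descriptor 1 becomes an open file. For "|", file descriptor 1 becomes the write end of a kernel pipe whose read end another process consumes. The function names are made up for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Roughly "producer > path": in the child, open the file and dup2() it
   onto fd 1, so the producer's standard output *is* the file. No second
   process reads the data. Returns 0 on success. */
int run_redirected(const char *path, const char *msg)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0 || dup2(fd, STDOUT_FILENO) < 0)
            _exit(1);
        if (fd != STDOUT_FILENO)
            close(fd);
        ssize_t n = write(STDOUT_FILENO, msg, strlen(msg));
        _exit(n == (ssize_t)strlen(msg) ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}

/* Roughly "producer | consumer": pipe() creates a kernel buffer with a
   read end and a write end. The producer's fd 1 is the write end; the
   consumer (this process) reads the other end. Returns bytes read. */
ssize_t run_piped(const char *msg, char *buf, size_t buflen)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        if (dup2(fds[1], STDOUT_FILENO) < 0)
            _exit(1);
        if (fds[1] != STDOUT_FILENO)
            close(fds[1]);
        ssize_t n = write(STDOUT_FILENO, msg, strlen(msg));
        _exit(n < 0 ? 1 : 0);
    }
    close(fds[1]); /* otherwise read() never sees end-of-file */
    size_t total = 0;
    ssize_t n;
    while (total < buflen && (n = read(fds[0], buf + total, buflen - total)) > 0)
        total += (size_t)n;
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return (ssize_t)total;
}
```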
Oh, and that trailing whitespace warning shouldn't be a problem. The
patch should still have been applied.
Thanks,
Tom
On 03/20/2012 08:49 AM, Tom Rondeau wrote:
On Mon, Mar 19, 2012 at 4:49 PM, Frederick Stevens <address@hidden> wrote:
Tom,
New run using my simple "trace". See attached files.
Cheers,
Fred
Fred,
A good start. It's only going through half of the data it's supposed
to before segfaulting, so it looks like one of the buffers (probably
the bPtr buffer for the 32f input) isn't getting allocated properly.
I've attached a patch that only tests this kernel, so no other
outputs will confuse things, and I've shortened the run length
(single iteration, fewer samples). This now spits out the data used
to generate the input and output buffers. It also outputs the size of
the data types in the test instead of the pointer size.
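Printing data-type sizes instead of the pointer size matters on this platform. On a 32-bit x86 build, sizeof(void *) and sizeof(float) are both 4, so a report that prints the pointer size looks right for 32f data by coincidence, while a 32fc item is 8 bytes. This standalone check assumes the complex element is C99 float complex, which the C standard lays out as two floats. VOLK's own lv_32fc_t typedef is not used here.

```c
#include <assert.h>
#include <complex.h>
#include <stdio.h>

/* Print the sizes that matter when sizing test buffers. */
void print_type_sizes(void)
{
    printf("sizeof(void *)        = %zu\n", sizeof(void *));
    printf("sizeof(float)         = %zu\n", sizeof(float));
    printf("sizeof(float complex) = %zu\n", sizeof(float complex));
}
```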
If you're unfamiliar with working with patches, just reset your git
tree (git reset --hard, unless you have some changes you need or want
to keep) and apply this (git apply location/volk_slackware32.diff). I
suggest the reset so there aren't any conflicts or problems when
applying.
Thanks,
Tom
On 03/19/2012 11:26 AM, Tom Rondeau wrote:
On Mon, Mar 19, 2012 at 12:04 PM, Frederick Stevens <address@hidden> wrote:
Tom,
See the attached file. I am running volk_profile now. If this is what
you need, then that is great; otherwise I will keep working on this
with whatever suggestions you have.
Cheers,
Fred
That'll be a good start. We'll see if that tells us anything.
Thanks,
Tom
On 03/19/2012 08:10 AM, Tom Rondeau wrote:
On Sun, Mar 18, 2012 at 8:00 PM, Frederick Stevens <address@hidden> wrote:
volk_profile ran to completion. I am using the git source tree,
updated just before I did the run. I commented out line 38 of
volk_profile.cc as you suggested and ran volk_profile under gdb. The
output is in the attached text file. I have also attached the
generated volk_config from ~/.volk/volk_config.
Thanks. Strange that it's just that kernel, then. Can you put in some
debug lines that will print out the size of the buffers being used
and the 'number' variable in volk_32fc_x2_multiply_32fc_a when the
crash occurs? I just want to see if the loop is trying to go beyond
the bounds of the arrays.
I noted from running gnuradio-companion version 3.5.1 (which works)
that when I use a multiply block, this message from Python is
generated:
./top_block.py
>>> gr_fir_fff: using 3DNow!
But volk_profile does not seem to recognize the 3DNow! processor
extensions (it produces sse2 and sse3 messages on the Intel Atom
32-bit machine).
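You can check whether the processor itself advertises 3DNow!, SSE2 or SSE3 without going through either library, by querying CPUID directly. This sketch assumes GCC or Clang (for <cpuid.h> and __get_cpuid) and only issues the query on x86 builds. The bit positions are the documented CPUID ones. This is not how GNU Radio or Volk perform their own detection.

```c
#include <assert.h>
#include <stdio.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h> /* GCC/Clang wrapper around the CPUID instruction */
#endif

/* Feature bits as documented for the x86 CPUID instruction. */
#define LEAF1_EDX_SSE2   (1u << 26)
#define LEAF1_ECX_SSE3   (1u << 0)
#define LEAF81_EDX_3DNOW (1u << 31) /* AMD extended leaf 0x80000001 */

struct cpu_features {
    int sse2, sse3, amd3dnow;
};

/* Ask the CPU directly. All flags stay 0 on non-x86 builds or when a
   leaf is not supported. */
void detect_features(struct cpu_features *f)
{
    f->sse2 = f->sse3 = f->amd3dnow = 0;
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        f->sse2 = (edx & LEAF1_EDX_SSE2) != 0;
        f->sse3 = (ecx & LEAF1_ECX_SSE3) != 0;
    }
    if (__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        f->amd3dnow = (edx & LEAF81_EDX_3DNOW) != 0;
#endif
}

void print_features(void)
{
    struct cpu_features f;
    detect_features(&f);
    printf("sse2=%d sse3=%d 3dnow=%d\n", f.sse2, f.sse3, f.amd3dnow);
}
```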
Yeah, that's fine. Without a 3DNow! kernel, Volk will just fall back
on the generic implementation, the thought being that the generic
version will work for everyone. So we need to figure out why that's
not true for your...
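The fallback Tom describes can be sketched as a table lookup: take the first implementation whose required CPU features are all present, and end every table with a generic entry that requires nothing. The names, feature bits and first-match policy here are assumptions for illustration. As far as I know, Volk's real dispatcher can also use the per-kernel choices volk_profile records in ~/.volk/volk_config. The SSE entries in the usage below are placeholders that simply reuse the generic function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature bits for this sketch. */
enum { FEAT_SSE = 1u << 0, FEAT_SSE3 = 1u << 1, FEAT_3DNOW = 1u << 2 };

typedef void (*mul_fn)(float *c, const float *a, const float *b, unsigned int n);

/* Portable reference version: plain C, no SIMD, runs on any CPU. */
void mul_generic(float *c, const float *a, const float *b, unsigned int n)
{
    for (unsigned int i = 0; i < n; i++)
        c[i] = a[i] * b[i];
}

struct impl {
    const char *name;
    unsigned int required; /* feature bits this version needs */
    mul_fn fn;
};

/* Return the first entry whose required features are all available.
   Tables are ordered fastest-first and end with a "generic" entry that
   requires nothing, so every CPU gets some implementation. */
const struct impl *select_impl(const struct impl *table, size_t count,
                               unsigned int have)
{
    for (size_t i = 0; i < count; i++)
        if ((table[i].required & have) == table[i].required)
            return &table[i];
    return NULL;
}
```

A CPU that only advertises 3DNow! still lands on the generic entry. That suggests a crash there points at the inputs or the test harness rather than at SIMD code.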
Hope this helps! Let me know if you want me to try anything else.
I'll let you know how things turn out on the other machine as well.
Cheers,
Fred
Thanks.
Tom
On 03/18/2012 04:31 PM, Tom Rondeau wrote:
On Fri, Mar 16, 2012 at 6:11 PM, Frederick Stevens <address@hidden> wrote:
Well, after a few restarts, here is my output. I did a fresh pull from
git because I was getting some errors with missing *.h files in
gruel/src/swig or something like that. Hope this helps!
RUN_VOLK_TESTS: volk_32fc_32f_multiply_32fc_a

Program received signal SIGSEGV, Segmentation fault.
0xb7edbb74 in volk_32fc_32f_multiply_32fc_a_generic (cVector=0xb7448008, aVector=0xb7768008, bVector=0xb78f8008, num_points=204600)
    at /home/fred/extras/gnuradio/gnuradio/volk/include/volk/volk_32fc_32f_multiply_32fc_a.h:74
74          *cPtr++ = (*aPtr++) * (*bPtr++);
(gdb) bt
#0  0xb7edbb74 in volk_32fc_32f_multiply_32fc_a_generic (cVector=0xb7448008, aVector=0xb7768008, bVector=0xb78f8008, num_points=204600)
    at /home/fred/extras/gnuradio/gnuradio/volk/include/volk/volk_32fc_32f_multiply_32fc_a.h:74
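The addresses in this trace can be checked against the expected buffer sizes. Assume 4 KiB pages, and assume the large test buffers come from page-granular (mmap-backed) allocations placed next to each other; the thread establishes neither. Under those assumptions, the gap from aVector (0xb7768008) to bVector (0xb78f8008) is 0x190000 = 1,638,400 bytes. That is exactly 204,600 complex floats (8 bytes each, 1,636,800 bytes) rounded up to 400 pages. This is consistent with aVector being sized correctly for 32fc data, but on its own it says nothing about bVector.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BYTES 4096u /* assumed 4 KiB pages, typical for i686 Linux */

/* Round a byte count up to a whole number of pages, which is how
   page-granular (e.g. mmap-backed) allocations end up sized. */
uint32_t round_to_pages(uint32_t bytes)
{
    return (bytes + PAGE_BYTES - 1) / PAGE_BYTES * PAGE_BYTES;
}

/* Distance in bytes between two buffer start addresses from the trace. */
uint32_t addr_gap(uint32_t lo, uint32_t hi)
{
    return hi - lo;
}
```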
Alright, Fred, there's definitely something strange going on here. My
only guess is that for some reason on your architecture/OS/whatever,
something is being handled incorrectly and the buffers a, b, and c
are not getting generated correctly. Maybe it's not doubling the
number of items for the complex data type (before this function test,
there are 16ic, or complex shorts, being tested, but this is the
first complex float test).
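"Doubling the number of items" refers to the usual layout of 32fc data: interleaved (real, imaginary) float pairs, which is also how C99 float complex is laid out. A buffer of num_points complex items therefore holds 2 * num_points floats, while the 32f input holds num_points floats. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Number of floats needed to hold num_points items. A 32fc item is an
   interleaved (real, imaginary) pair, so it needs two floats. */
size_t floats_needed(size_t num_points, int is_complex)
{
    return is_complex ? 2 * num_points : num_points;
}

/* Complex-times-real multiply written against the interleaved float view:
   item k of a and c lives at floats 2k (real) and 2k+1 (imaginary),
   while b holds one float per item. */
void multiply_32fc_32f_interleaved(float *c, const float *a, const float *b,
                                   size_t num_points)
{
    for (size_t k = 0; k < num_points; k++) {
        c[2 * k]     = a[2 * k]     * b[k]; /* real part */
        c[2 * k + 1] = a[2 * k + 1] * b[k]; /* imaginary part */
    }
}
```

If a complex buffer were allocated with floats_needed(n, 0), accesses to it would leave the allocation once k reaches n/2. That is the "about half of the data" symptom described elsewhere in this thread.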
It's hard to tell if it's something about it being an AMD chip,
32-bit, Slackware version, gcc version, etc. And I don't have an AMD
chip to test on, but I could load up a 32-bit Slackware VM at least.
How much work are you willing to put into this to help us nail this
down?
If you can follow through the volk_profile test code, we can start
outputting more debug info. To start with, I'd suggest going into
volk/apps/volk_profile.cc, commenting out line 38, rebuilding the
application, and running this new volk_profile to see if it fails on
any other kernels.