Re: [Discuss-gnuradio] benchmark_* not working correctly


From: Tim Meehan
Subject: Re: [Discuss-gnuradio] benchmark_* not working correctly
Date: Tue, 2 Oct 2007 04:10:07 -0700

Eric,

See reply embedded

On 10/1/07, Eric Blossom <address@hidden> wrote:
On Mon, Oct 01, 2007 at 06:07:51PM -0700, Tim Meehan wrote:
> Eric,
>
>
> The QA code (qa_gr_fir_ccf.cc) forces a 16 byte alignment.  When the
> malloc16Align is replaced with a regular malloc in the QA code, make check
> fails.
>
> I believe that there is an additional requirement that the data passed to
> the low-level SSE code have the real sample start on the 0th or 2nd 4 byte
> float.  For example the R / C represents 4 byte floats (Real, Complex) , 0
> represents "forced alignment" from gr_fir_ccf_simd.cc
> RCRC...  OK
> 00RC...  OK
> 0RCR...  Not OK

Hmmm.  Does it ever use the 0RCR case?  I would expect only the first
two.  It may be reusing the fff simd code which generates all 4
alignments for the taps, but I wouldn't expect to see the 0RCR or 000R
input cases.

Yes, I do see the 0RCR or 000R case. For example, when I change the QA code to use stack allocation for the input (uncommenting a piece of code that was originally there, lines 110 and 111 in the QA code from trunk), the check fails.
The input is at address 0xbfcd87d4; this gets 16-byte aligned to address
0xbfcd87d0.
This illustrates the 0RCR case.
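
As a rough illustration of the arithmetic, the sketch below rounds an address down to a 16-byte boundary and counts how many 4-byte float slots sit in front of the first real sample. It assumes the input address is 0xbfcd87d4 (the value consistent with the aligned address 0xbfcd87d0 quoted above); the helper names are made up for this example and are not part of GNU Radio.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round an address down to the previous 16-byte boundary.
inline std::uintptr_t align_down_16(std::uintptr_t addr) {
    return addr & ~static_cast<std::uintptr_t>(15);
}

// Number of 4-byte float slots between the aligned base and the first
// real sample: 0 = RCRC, 1 = 0RCR, 2 = 00RC, 3 = 000R.
inline std::size_t float_offset(std::uintptr_t addr) {
    assert(addr % sizeof(float) == 0);  // floats are at least 4-byte aligned
    return (addr - align_down_16(addr)) / sizeof(float);
}

// True when the real part lands on float slot 0 or 2 (the "OK" cases).
inline bool real_on_even_slot(std::uintptr_t addr) {
    return float_offset(addr) % 2 == 0;
}
```

With the address above, float_offset returns 1, so the real sample sits on slot 1, which is the 0RCR layout.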





> Q: Is my assumption of the additional requirement correct?
>
> Q: I don't think it will be easy to force the additional requirement with
> the same trick used in gr_fir_ccf_simd.cc; do you agree?

I don't see this as an additional constraint.
gr_complex == std::complex<float> is always laid out (<real>,<imag>).
sizeof(gr_complex) == 8, so with 16-byte alignment, we still always
have good alignment.  Are you seeing a case where the input has the
real on a mod 8 == 4 boundary instead of a mod 8 == 0 boundary?


yes, see example below.

If so, (1) where's the input data coming from, (2) what version of the
compiler are you using?


1)
In the example above the data was allocated on the stack from the qa code
with
i_type      input[INPUT_LEN];    //(i_type is gr_complex)

which will cause the QA code to fail
instead of

i_type       *input = (i_type *)malloc16Align(INPUT_LEN * sizeof(i_type));

which is in the QA code, and will make it pass.
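
For comparison, here is a minimal sketch of a stack buffer that is forced onto a 16-byte boundary. It assumes a C++11 compiler (alignas postdates the compilers listed below), uses std::complex<float> as a stand-in for gr_complex, and picks an arbitrary INPUT_LEN; none of it is taken from the QA code itself. A plain array only has the alignment of its element type, which is typically 4 bytes for std::complex<float>, so it may start on a mod 8 == 4 address.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>

typedef std::complex<float> i_type;   // stand-in for gr_complex
const std::size_t INPUT_LEN = 64;     // hypothetical length for this sketch

static_assert(sizeof(i_type) == 2 * sizeof(float),
              "complex<float> is laid out as two floats: (real, imag)");

inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Plain member array: alignment is only alignof(i_type).
struct PlainInput { i_type data[INPUT_LEN]; };

// Same array, but the compiler must place it on a 16-byte boundary.
struct AlignedInput { alignas(16) i_type data[INPUT_LEN]; };

// Small self-check: an AlignedInput on the stack is always 16-byte aligned.
inline void demo_stack_alignment() {
    AlignedInput in;
    assert(is_aligned16(in.data));
}
```

Whether a PlainInput happens to be 16-byte aligned depends on the compiler and the surrounding stack layout, which would match the QA test passing or failing depending on how the input is allocated.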

2)
I am using three different compilers / versions of gcc on two different machines, and I get the same results with all of them:

gcc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)
gcc (GCC) 4.2.1 (Debian 4.2.1-3)
gcc (GCC) 4.1.2 20070502 (Red Hat 4.1.2-12)

However, back to your first point, if we are using the 0RCR case, then
the code is completely wrong, and I don't see how it could ever pass
the QA tests (which it seems to).  On the other hand, there could be
some problem with how the float taps are mapped across the complex
input.  (It's been a long time since I looked at the code...)

The QA tests are passing because they force the 16-byte alignment.
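
One common way to force such alignment is to over-allocate and round the pointer up to the next 16-byte boundary. The sketch below shows that idea only; it is a hypothetical helper and is not claimed to be how malloc16Align is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper for illustration; not GNU Radio's malloc16Align.
struct Aligned16Block {
    void* raw;      // pointer returned by malloc; this is what must be freed
    void* aligned;  // first 16-byte boundary at or after raw
};

inline Aligned16Block alloc_aligned16(std::size_t nbytes) {
    Aligned16Block b;
    b.raw = std::malloc(nbytes + 15);  // 15 spare bytes leave room to round up
    std::uintptr_t a = reinterpret_cast<std::uintptr_t>(b.raw);
    b.aligned = b.raw
        ? reinterpret_cast<void*>((a + 15) & ~static_cast<std::uintptr_t>(15))
        : nullptr;
    return b;
}

inline void free_aligned16(Aligned16Block& b) {
    std::free(b.raw);
    b.raw = nullptr;
    b.aligned = nullptr;
}
```

Keeping the raw pointer alongside the aligned one avoids passing an adjusted pointer to free.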


Thanks for looking at this!

Eric

> Tim
>
> >
> >
> > Yes, it does get called at "make check" time.
> >
> > FWIW, it's run by way of gnuradio-core/src/tests/test_all
> >
> > It's possible that there's an alignment requirement that's not being
> > honored at runtime.  The low-level SSE code (fcomplex_dotprod_sse64.S)
> > requires that its input and taps be 16-byte aligned.  gr_fir_ccf_simd
> > allocates 16-byte aligned buffers for the relevant buffers, so it
> > should be working OK.   Perhaps one of you seeing the problem could
> > add an assert or two to confirm that the alignment is correct.
> >
> > Eric
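
The assert suggested in the quoted message above could look roughly like the sketch below. The wrapper name and its signature are assumptions made for this example (the real call into fcomplex_dotprod_sse64.S is not shown in this thread), and std::complex<float> stands in for gr_complex.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>

typedef std::complex<float> complex_sample;  // stand-in for gr_complex

// Distance in bytes from the previous multiple of `alignment`.
inline std::size_t misalignment(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment;
}

// Hypothetical check to run just before handing pointers to the SIMD kernel.
inline void check_simd_args(const complex_sample* input, const float* taps) {
    assert(misalignment(input, 16) == 0 && "input must be 16-byte aligned");
    assert(misalignment(taps, 16) == 0 && "taps must be 16-byte aligned");
}
```

Note that assert is compiled out when NDEBUG is defined, so a check like this only fires in builds that keep assertions enabled.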

