
Re: [Discuss-gnuradio] benchmark_* not working correctly


From: Eric Blossom
Subject: Re: [Discuss-gnuradio] benchmark_* not working correctly
Date: Tue, 2 Oct 2007 08:28:30 -0700
User-agent: Mutt/1.5.9i

On Tue, Oct 02, 2007 at 04:10:07AM -0700, Tim Meehan wrote:
> Eric,
> 
> See reply embedded

Thanks.

> On 10/1/07, Eric Blossom <address@hidden> wrote:
> >
> > On Mon, Oct 01, 2007 at 06:07:51PM -0700, Tim Meehan wrote:
> > > Eric,
> > >
> > >
> > > The QA code (qa_gr_fir_ccf.cc) forces a 16-byte alignment.  When
> > > malloc16Align is replaced with a regular malloc in the QA code, make
> > > check fails.
> > >
> > > I believe that there is an additional requirement that the data passed
> > > to the low-level SSE code have the real sample start on the 0th or 2nd
> > > 4-byte float.  In the examples, R / C represent 4-byte floats (the real
> > > and complex, i.e. imaginary, parts) and 0 represents the "forced
> > > alignment" from gr_fir_ccf_simd.cc:
> > > RCRC...  OK
> > > 00RC...  OK
> > > 0RCR...  Not OK
> >
> > Hmmm.  Does it ever use the 0RCR case?  I would expect only the first
> > two.  It may be reusing the fff simd code which generates all 4
> > alignments for the taps, but I wouldn't expect to see the 0RCR or 000R
> > input cases.
> 
> 
> Yes, I do see the 0RCR or 000R case.  For example, when I change the QA
> code to use stack allocation for the input (uncommenting a piece of code
> that was originally there, lines 110 and 111 in the QA code from trunk),
> the check fails.

OK, I'm not surprised by that.  I wouldn't consider that a problem,
unless we get in the habit of calling this with stack allocated
input.  It could of course be worked around with an attribute((...))
on the array definition.
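The exact attribute is elided above.  As an assumption, what is wanted is a 16-byte alignment request on the array declaration, so that a stack-allocated input always starts in the RCRC case.  The minimal sketch below uses alignas(16), which is standard C++11 and postdates this thread; GCC also accepts __attribute__((aligned(16))) on the declaration.  INPUT_LEN and the helper name are hypothetical, for illustration only.

```cpp
#include <cassert>
#include <complex>
#include <cstdint>

typedef std::complex<float> gr_complex;   // 8 bytes, laid out (real, imag)

const int INPUT_LEN = 64;                 // hypothetical length for illustration

// Hypothetical check: declare the stack input with an explicit 16-byte
// alignment request and report whether the array really starts on a
// 16-byte boundary (i.e. the RCRC case).
bool stack_input_is_aligned()
{
  alignas(16) gr_complex input[INPUT_LEN];
  return reinterpret_cast<std::uintptr_t>(input) % 16 == 0;
}
```

Without the alignment request, the array is only guaranteed the alignment of float (4 bytes), which is how the 0RCR and 000R inputs arise.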

> Input is at address 0xbfcd87d4; this gets 16-byte aligned to address
> 0xbfcd87d0, which illustrates the 0RCR case.


> > > Q: Is my assumption of the additional requirement correct?
> > >
> > > Q: I don't think it will be easy to force the additional requirement
> > > with the same trick used in gr_fir_ccf_simd.cc; do you agree?
> >
> > I don't see this as an additional constraint.
> > gr_complex == std::complex<float> is always laid out (<real>,<imag>).
> > sizeof(gr_complex) == 8, so with 16-byte alignment, we still always
> > have good alignment.  Are you seeing a case where the input has the
> > real on a mod 8 == 4 boundary instead of a mod 8 == 0 boundary?
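To make the four cases concrete: since a gr_complex is 8 bytes laid out (real, imag), the byte offset of the input pointer within its 16-byte block determines which layout the SSE code sees after rounding down.  The minimal sketch below uses hypothetical helper names (not GNU Radio API) and shows that the good cases are exactly those with the real part on a mod 8 == 0 boundary.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: name the layout of the first 16-byte block in the
// thread's notation (each letter is one 4-byte float; 0 is a float slot
// before the first real sample after rounding down to 16 bytes).
std::string alignment_case(std::uintptr_t addr)
{
  switch (addr & 15) {          // byte offset of addr within its 16-byte block
  case 0:  return "RCRC";
  case 4:  return "0RCR";
  case 8:  return "00RC";
  case 12: return "000R";
  default: return "unaligned";  // not even on a 4-byte float boundary
  }
}

// RCRC and 00RC are the cases where the real part sits on a mod 8 == 0
// boundary, i.e. addr is a multiple of sizeof(gr_complex) == 8.
bool real_on_mod8_boundary(std::uintptr_t addr)
{
  return addr % 8 == 0;
}
```

For example, with the aligned base 0xbfcd87d0 from Tim's message, an input four bytes past it (0xbfcd87d4) classifies as 0RCR and fails the mod 8 test.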
> 
> 
> 
> yes, see example below.
> 
> > If so, (1) where's the input data coming from, (2) what version of the
> > compiler are you using?
> 
> 
> 
> 1)
> In the example above, the data was allocated on the stack in the QA code
> with
>
> i_type      input[INPUT_LEN];    // i_type is gr_complex
>
> which will cause the QA code to fail, instead of
>
> i_type       *input = (i_type *)malloc16Align(INPUT_LEN * sizeof(i_type));
>
> which is what the QA code uses, and which makes it pass.
> 
> 2)
> I am using three different builds/versions of gcc on two different
> machines and getting the same results:
> 
> gcc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)
> gcc (GCC) 4.2.1 (Debian 4.2.1-3)
> gcc (GCC) 4.1.2.20070502 (Red Hat 4.1.2-12)
> 
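The difference between the passing and failing QA variants is only where the input comes from.  A plain malloc, or a plain stack array, gives no 16-byte guarantee in general, while an aligned allocator does.  The sketch below is a hypothetical stand-in for an allocator like malloc16Align, built on POSIX posix_memalign; it is not GNU Radio's implementation, which is not shown in this thread.

```cpp
#include <cassert>
#include <complex>
#include <cstdint>
#include <cstdlib>   // std::free, std::size_t
#include <new>       // std::bad_alloc
#include <stdlib.h>  // posix_memalign (POSIX)

typedef std::complex<float> gr_complex;

// Hypothetical aligned allocator: storage for n gr_complex values that
// starts on a 16-byte boundary.  Release the result with std::free().
gr_complex *alloc_input_16(std::size_t n)
{
  void *p = 0;
  if (posix_memalign(&p, 16, n * sizeof(gr_complex)) != 0)
    throw std::bad_alloc();
  return static_cast<gr_complex *>(p);
}
```

Since the returned pointer is 16-byte aligned and sizeof(gr_complex) is 8, every such input starts in the RCRC case.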
> > However, back to your first point, if we are using the 0RCR case, then
> > the code is completely wrong, and I don't see how it could ever pass
> > the QA tests (which it seems to).  On the other hand, there could be
> > some problem with how the float taps are mapped across the complex
> > input (it's been a long time since I looked at the code...)
> 
> 
> The QA tests are passing because they force the 16-byte alignment.

OK.  This is as expected.

In the production code, i.e., where you are seeing problems (was it
around gri_mmse_fir_interpolator?), does the alignment problem occur?

If so, I think we should fix the caller.  If the calling site is using
stack or heap allocated data, we should fix it there.  If it's using
input passed to it by "work" or "general_work", they are already
aligned.  In any case, we should add a check at the site of the call
to the SSE code that checks the alignment and raises an exception in
the bad cases.  Of course the SSE code could be modified to handle the
other two alignment cases, but I'd like to know the performance cost
of doing it that way before committing to that path.
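A check at the call site could look like the minimal sketch below.  The function name and signature are hypothetical, not GNU Radio API.  Following the cases discussed above, it accepts inputs whose real part is on a mod 8 == 0 boundary (RCRC, 00RC) and throws for the others (0RCR, 000R).

```cpp
#include <cassert>
#include <complex>
#include <cstdint>
#include <stdexcept>

typedef std::complex<float> gr_complex;

// Hypothetical guard to call just before handing input to the SSE code.
// Inputs that are a multiple of sizeof(gr_complex) (8 bytes) map to the
// RCRC or 00RC layouts; anything else is rejected with an exception.
void check_sse_input_alignment(const gr_complex *input)
{
  std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(input);
  if (addr % sizeof(gr_complex) != 0)
    throw std::invalid_argument("SSE FIR input not aligned to sizeof(gr_complex)");
}
```

An exception here flags the bad caller directly; the alternative of teaching the SSE code the other two alignment cases would still need the performance measurement mentioned above.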

Summary question: is there an alignment problem when called from the
non-QA code?  If so, where?

Eric



