octave-maintainers

Re: FFT slowup in 2.1.55


From: David Bateman
Subject: Re: FFT slowup in 2.1.55
Date: Mon, 1 Mar 2004 11:15:24 +0100
User-agent: Mutt/1.4.1i

According to John W. Eaton <address@hidden> (on 02/27/04):
> Doesn't the __attribute__ ((aligned(16))) qualifier apply to variables,
> not types?  It seems to me that attaching it to a typedef would have
> no effect.

Damn, you're right. I'd thought the typedef got me around this, but
it obviously doesn't. Still, the patch to oct-fftw.cc remains valid
and results in a speedup.

> In any case, the variable you want to align in the Array classes
> doesn't actually exist in the class.  We only have a pointer to it,
> and you don't really want the pointer to be aligned on a 16-byte
> boundary, you want the object that it points to to be aligned.  So I
> think you have to write an allocator that will do that for you.  So
> instead of writing
> 
>   explicit ArrayRep (int n) : 
>     data (new alignedT [n]), len (n), count (1) { }
> 
> I think it might work to write
> 
>   explicit ArrayRep (int n) : 
>     data (make_aligned_16_double_array (n)), len (n), count (1) { }
> 
> where "make_aligned_16_double_array" is a function that returns a
> pointer to double which has the elements of the allocated array
> aligned on a 16-byte boundary.  To do that, you will probably need to
> allocate a char array of the appropriate size (might require up to 15
> bytes of padding) then return a pointer to one of the first 15
> elements with a cast to make it a pointer to double instead of a
> pointer to char.
> 
> But the solution above won't quite work because ArrayRep is a template
> class.  So we need to do this only for some values of the template
> type parameter T (double and maybe Complex).  Some method of
> specializing these functions is needed.  Here is a simplified example
> that I think should work (I leave it up to you to determine how to
> compute the necessary offset so that the data pointed to by the pointer
> returned from the make_aligned_16_double_array function is actually on
> a 16-byte boundary).
> 
> #include <iostream>
> 
> template <class T>
> class
> Array
> {
> public:
> 
>   class ArrayRep
>   {
>   public:
> 
>     T *data;
>     int len;
> 
>     explicit ArrayRep (int n)
>       : data (new T [n]), len (n) { std::cerr << "generic" << std::endl; }
>   };
> 
>   ArrayRep *rep;
> 
>   explicit Array (int n)
>     : rep (new typename Array<T>::ArrayRep (n)) { }
> };
> 
> double *
> make_aligned_16_double_array (int n)
> {
>   // Do something magic to find the required offset (should be in the
>   // range [0, 15]).
>   int offset = 0;
> 
>   char *buf = new char [n * sizeof (double) + offset];
> 
>   return reinterpret_cast<double *> (&buf[offset]);
> }
> 
> template <>
> Array<double>::ArrayRep::ArrayRep (int n)
>   : data (make_aligned_16_double_array (n)), len (n)
> {
>   std::cerr << "double" << std::endl;
> }
> 
> int
> main (void)
> {
>   Array<int> int_ra (2);
>   Array<double> double_ra (2);
> }
> 
> Or am I missing a simpler solution?

I'd hoped not to have to do this since it's ugly. It seems to me that
there should be some form of compiler magic capable of ensuring the
alignment, but I'm damned if I can figure it out.

Ok, I'll look at implementing the above. One advantage, even with the
ugliness, is that it doesn't rely on the compiler, so there is no need
for any alignment checking in oct-fftw.cc.
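
For reference, here is a minimal sketch of how the offset computation
left open in the quoted example might look. It is only an illustration
under some assumptions: the free function name is hypothetical, storing
the original pointer just in front of the aligned block is one possible
design that nobody in this thread has proposed, and it assumes
<cstdint> (for std::uintptr_t) is available.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Allocate n doubles whose first element lies on a 16-byte boundary.
// The pointer returned by new[] is stored immediately before the
// aligned block so that it can be recovered when freeing.
double *
make_aligned_16_double_array (int n)
{
  assert (n >= 0);

  const std::size_t align = 16;

  // Room for the data, up to align - 1 bytes of padding, and one slot
  // for the original pointer.
  std::size_t nbytes = static_cast<std::size_t> (n) * sizeof (double)
                       + (align - 1) + sizeof (char *);

  char *buf = new char [nbytes];

  // Skip the slot for the stored pointer, then round up to the next
  // multiple of align.
  std::uintptr_t addr
    = reinterpret_cast<std::uintptr_t> (buf) + sizeof (char *);
  std::uintptr_t aligned
    = (addr + align - 1) & ~(static_cast<std::uintptr_t> (align) - 1);

  char *p = reinterpret_cast<char *> (aligned);

  // Remember where the real allocation starts.
  std::memcpy (p - sizeof (char *), &buf, sizeof (char *));

  return reinterpret_cast<double *> (p);
}

// Hypothetical counterpart: release an array obtained from
// make_aligned_16_double_array.
void
free_aligned_16_double_array (double *data)
{
  if (! data)
    return;

  char *p = reinterpret_cast<char *> (data);
  char *buf;
  std::memcpy (&buf, p - sizeof (char *), sizeof (char *));

  delete [] buf;
}
```

Note that with something like this, the specialized ArrayRep would also
need its destructor to call the matching free function rather than
delete [] on data, since data no longer points at the start of the
allocation.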

Cheers
David

-- 
David Bateman                                address@hidden
Motorola CRM                                 +33 1 69 35 48 04 (Ph) 
Parc Les Algorithmes, Commune de St Aubin    +33 1 69 35 77 01 (Fax) 
91193 Gif-Sur-Yvette FRANCE



