octave-maintainers

FFT slowup in 2.1.55


From: John W. Eaton
Subject: FFT slowup in 2.1.55
Date: Fri, 27 Feb 2004 12:02:49 -0600

On 27-Feb-2004, David Bateman <address@hidden> wrote:

| Ok, due to the alignment issues in FFTW 3.x I've found I can get
| into the case where I'm calling exactly the same FFT, but the 
| alignment changes each time I call the function. The planner in
| oct-fftw.cc then recreates the plan each time, which is slow.
| For small FFTs [I've tested fft(randn(64,45))], this results in 
| a significant slowup.
| 
| I've tried two lines of attack to fix this problem. The first is to
| try and force the 16-byte alignment in "class Array<T>" using
| 
| #ifdef HAVE_ATTRIB_ALIGN
|     typedef T alignedT __attribute__ ((aligned(16)));
| #else
|     typedef T alignedT;
| #endif
| 
| where HAVE_ATTRIB_ALIGN is a configure time option. I then replace the
| "new T" in ArrayRep with "new alignedT". This appears to only give me
| 8-byte alignment, but I've read in the gcc manual that ld might not be
| able to do better than 8-byte alignment, so it is not clear to me if
| 
| 1) The code I've done is correct, but the linker won't give me 16-byte
| alignment, or
| 2) I'm missing something else that also needs 16-byte alignment.
| 
| Does anyone have any ideas on this?
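To make the failure mode concrete, here is a minimal sketch of a single-entry plan cache in which the input alignment is part of the key. The class `plan_cache` and its members are hypothetical (this is not the code in oct-fftw.cc), and it only counts how many times a new plan would be created instead of calling FFTW. If successive calls of the same size hand it data whose address modulo 16 keeps changing, every call is a cache miss.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a single-entry FFT plan cache.  A real cache
// would hold an FFTW plan; this one only records when it would re-plan.
class plan_cache
{
public:

  // Returns true if a new plan would have to be created for this call.
  bool lookup (std::size_t n, const double *in)
  {
    // FFTW 3 plans may use SIMD code that assumes the alignment seen at
    // planning time, so the input's address modulo 16 is part of the key.
    std::size_t align = reinterpret_cast<std::uintptr_t> (in) % 16;

    if (have_plan && n == last_n && align == last_align)
      return false;

    have_plan = true;
    last_n = n;
    last_align = align;
    ++plans_created;
    return true;
  }

  int plans_created = 0;

private:

  bool have_plan = false;
  std::size_t last_n = 0;
  std::size_t last_align = 0;
};
```

Alternating between a 16-byte-aligned array and one that starts 8 bytes later forces a re-plan on every call, even though the transform size never changes.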

Doesn't the __attribute__ ((aligned (16))) qualifier apply to variables,
not types?  It seems to me that attaching it to a typedef would have
no effect.

In any case, the variable you want to align in the Array classes
doesn't actually exist in the class.  We only have a pointer to it,
and you don't really want the pointer to be aligned on a 16-byte
boundary; you want the object that it points to to be aligned.  So I
think you have to write an allocator that will do that for you.
Instead of writing

  explicit ArrayRep (int n) : 
    data (new alignedT [n]), len (n), count (1) { }

I think it might work to write

  explicit ArrayRep (int n) : 
    data (make_aligned_16_double_array (n)), len (n), count (1) { }

where "make_aligned_16_double_array" is a function that returns a
pointer to double whose elements are aligned on a 16-byte boundary.
To do that, you will probably need to allocate a char array of the
appropriate size (which might require up to 15 bytes of padding), then
return a pointer to one of its first 16 elements (an offset of 0
through 15), cast to a pointer to double instead of a pointer to char.
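A minimal sketch of that allocation step, assuming a variant of make_aligned_16_double_array with a hypothetical extra parameter, raw_out, that hands back the start of the char buffer. Only that original pointer can be passed to delete [], so the caller has to keep it.

```cpp
#include <cstddef>
#include <cstdint>

// Sketch: allocate room for n doubles plus up to 15 bytes of padding, then
// return a pointer into that buffer that lies on a 16-byte boundary.  The
// start of the char buffer is returned through raw_out (a hypothetical
// parameter), because that pointer is what must later go to delete [].
double *
make_aligned_16_double_array (std::size_t n, char *&raw_out)
{
  const std::size_t alignment = 16;

  char *buf = new char [n * sizeof (double) + alignment - 1];

  // Distance from buf up to the next multiple of 16; always in [0, 15].
  std::uintptr_t addr = reinterpret_cast<std::uintptr_t> (buf);
  std::size_t offset = (alignment - addr % alignment) % alignment;

  raw_out = buf;
  return reinterpret_cast<double *> (buf + offset);
}
```

A caller would write `char *raw; double *p = make_aligned_16_double_array (n, raw);` and later `delete [] raw;`.  Because the offset never exceeds the 15 bytes of padding, all n elements still fit inside the buffer.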

But the solution above won't quite work as written because ArrayRep is
a template class.  So we need to do this only for some values of the
template type parameter T (double and maybe Complex).  Some way of
specializing these functions is needed.  Here is a simplified example
that I think should work (I leave it up to you to determine how to
compute the necessary offset so that the data pointed to by the pointer
returned from the make_aligned_16_double_array function is actually on
a 16-byte boundary).

#include <iostream>

template <class T>
class
Array
{
public:

  class ArrayRep
  {
  public:

    T *data;
    int len;

    explicit ArrayRep (int n)
      : data (new T [n]), len (n) { std::cerr << "generic" << std::endl; }
  };

  ArrayRep *rep;

  explicit Array (int n)
    : rep (new typename Array<T>::ArrayRep (n)) { }
};

double *
make_aligned_16_double_array (int n)
{
  // Allocate enough room for up to 15 bytes of padding.
  char *buf = new char [n * sizeof (double) + 15];

  // Do something magic to find the required offset from the address
  // in buf (it should be in the range [0, 15]).
  int offset = 0;

  return reinterpret_cast<double *> (&buf[offset]);
}

template <>
Array<double>::ArrayRep::ArrayRep (int n)
  : data (make_aligned_16_double_array (n)), len (n)
{
  std::cerr << "double" << std::endl;
}

int
main (void)
{
  Array<int> int_ra (2);
  Array<double> double_ra (2);
}
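The example above never frees its memory, and once the constructor is specialized the destructor has to be specialized to match, since for double the pointer stored in data is not the one new [] returned.  One possible way to keep the two in sync, assuming a C++11 compiler, is to move allocation and release into a helper template whose explicit specialization for double does the padding and rounding.  The names rep_storage and raw are hypothetical and do not come from the Octave sources.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical allocation policy.  The generic version just uses new [],
// so the raw pointer and the data pointer are the same.
template <class T>
struct rep_storage
{
  static T *allocate (std::size_t n, void *&raw)
  {
    T *p = new T [n];
    raw = p;
    return p;
  }

  static void release (void *raw) { delete [] static_cast<T *> (raw); }
};

// Explicit specialization for double: over-allocate a char buffer by
// 15 bytes and return a pointer rounded up to a 16-byte boundary.
template <>
struct rep_storage<double>
{
  static double *allocate (std::size_t n, void *&raw)
  {
    char *buf = new char [n * sizeof (double) + 15];
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t> (buf);
    std::size_t offset = (16 - addr % 16) % 16;
    raw = buf;
    return reinterpret_cast<double *> (buf + offset);
  }

  static void release (void *raw) { delete [] static_cast<char *> (raw); }
};

template <class T>
class
Array
{
public:

  class ArrayRep
  {
  public:

    T *data;
    std::size_t len;
    void *raw;  // start of the underlying allocation, used to free it

    explicit ArrayRep (std::size_t n) : data (nullptr), len (n), raw (nullptr)
    {
      data = rep_storage<T>::allocate (n, raw);
    }

    ~ArrayRep (void) { rep_storage<T>::release (raw); }

    ArrayRep (const ArrayRep&) = delete;
    ArrayRep& operator = (const ArrayRep&) = delete;
  };

  ArrayRep *rep;

  explicit Array (std::size_t n) : rep (new ArrayRep (n)) { }

  ~Array (void) { delete rep; }

  Array (const Array&) = delete;
  Array& operator = (const Array&) = delete;
};
```

With this arrangement Array<int> behaves as before, while Array<double> gets aligned storage without touching ArrayRep's constructor; supporting Complex would mean one more specialization of rep_storage.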

Or am I missing a simpler solution?

jwe


