boost::array loop unrolling performance
From: per . nordlow
Subject: boost::array loop unrolling performance
Date: 4 Jul 2006 03:04:10 -0700
User-agent: G2/0.2
Hi, C++ Lovers!
I am using the boost::array template class trying to generalize my
handcrafted vector specialization for the dimensions 2, 3 and 4.
As performance is of great importance to me, I have written an initial
benchmark that tests how well g++ can unroll loops whose number of
iterations can be determined at compile time or upon entry to the loop.
The gcc switch "-funroll-loops" should do just that. The test program
calculates the dot product of two four-dimensional arrays of int 10
million times. The calculation is performed with a general and a
specialized version of the dot product, general_dot() and special_dot()
respectively, and looks as follows:
#include <cstddef>
#include <iostream>
#include <boost/array.hpp>
#include "../Timer.hpp"

// General version: loops over all N elements.
template <typename T, std::size_t N>
inline T general_dot(const boost::array<T, N> & a,
                     const boost::array<T, N> & b)
{
    T c = 0;
    for (std::size_t i = 0; i < N; i++)
        c += a[i] * b[i];
    return c;
}

// Specialized version: hand-unrolled for N == 4.
template <typename T>
inline T special_dot(const boost::array<T, 4> & a,
                     const boost::array<T, 4> & b)
{
    return (a[0] * b[0] +
            a[1] * b[1] +
            a[2] * b[2] +
            a[3] * b[3]);
}

// Prints an array as "[ a0 a1 ... ]".
template <typename T, std::size_t N>
std::ostream & operator << (std::ostream & os,
                            const boost::array<T, N> & a)
{
    os << '[';
    for (std::size_t i = 0; i < N; i++)
        os << ' ' << a[i];
    os << " ]";
    return os;
}

typedef int S; ///< Scalar type.

int main(int argc, char * argv[])
{
    typedef boost::array<S, 4> T;
    T a = {{ 1, 2, 3, 4 }}; // initialized so the dot products read defined values
    T b = a;
    Timer t;
    const unsigned int nloops = 10000000;

    S sum = 0;
    for (unsigned int i = 0; i < nloops; i++)
        sum += general_dot(a, b);
    std::cout << "general: " << t << std::endl;

    S tum = 0;
    for (unsigned int i = 0; i < nloops; i++)
        tum += special_dot(a, b);
    std::cout << "special: " << t << std::endl;

    if (sum == tum)
        std::cout << "Checksums are equal. OK" << std::endl;
    else
        std::cout << "Checksums are not equal. NOT OK" << std::endl;
    return 0;
}
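
Timer.hpp is a local header that is not included in this post. For anyone
who wants to repeat the measurement without it, here is a minimal sketch of
an equivalent timing helper built on std::chrono. It assumes a C++11 or later
compiler (so it will not build with g++-3.3.6), and the name time_loop_us is
made up for this example; it is not the interface of my Timer class.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not part of Timer.hpp): calls f() nloops times,
// accumulates the results into checksum, and returns the elapsed
// wall-clock time in microseconds.
template <typename S, typename F>
inline long long time_loop_us(unsigned int nloops, F f, S & checksum)
{
    typedef std::chrono::steady_clock Clock;

    const Clock::time_point start = Clock::now();
    S sum = 0;
    for (unsigned int i = 0; i < nloops; i++)
        sum += f();
    const Clock::time_point stop = Clock::now();

    checksum = sum; // handed back so the caller can compare checksums
    return std::chrono::duration_cast<std::chrono::microseconds>(
        stop - start).count();
}
```

It could be called as time_loop_us(nloops, [&] { return general_dot(a, b); }, sum).
Note that with constant inputs a newer optimizer may fold the whole loop away,
so the numbers should be read with some care.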
Compiling with g++-3.3.6 using the switches "-O3 -funroll-all-loops"
and running this on my Pentium 4 yields the following benchmark:
general: 60.965ms
special: 902us
Checksums are equal. OK
As we can see, the performance of general_dot() is terrible (roughly 68
times slower) compared to special_dot().
Is g++-3.3.6 really that bad at optimizing, or have I forgotten
something?
Do I have to switch to gcc version 4.0, 4.1 or 4.2 to make g++ compile
the instantiation of general_dot() to code with performance similar or
equal to that produced for special_dot()?
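
For comparison, one alternative that does not rely on the optimizer's loop
unroller at all is to let template recursion expand the sum at compile time.
The following is only a sketch under my own assumptions: the names
DotUnroller and unrolled_dot are hypothetical, it works on plain arrays to
stay self-contained, and I have not measured it with any of the compilers
mentioned above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: DotUnroller<T, N>::apply(a, b) expands to
// a[N-1]*b[N-1] + ... + a[0]*b[0] + T(), with no run-time loop.
template <typename T, std::size_t N>
struct DotUnroller
{
    static T apply(const T * a, const T * b)
    {
        return a[N - 1] * b[N - 1] + DotUnroller<T, N - 1>::apply(a, b);
    }
};

// Base case: an empty product sum, which ends the recursion.
template <typename T>
struct DotUnroller<T, 0>
{
    static T apply(const T *, const T *) { return T(); }
};

// Front end for built-in arrays; N is deduced from the argument type.
template <typename T, std::size_t N>
inline T unrolled_dot(const T (&a)[N], const T (&b)[N])
{
    return DotUnroller<T, N>::apply(a, b);
}
```

With a boost::array the same helper could be reached through its data()
member, e.g. DotUnroller<T, N>::apply(a.data(), b.data()). Whether the
compiler then inlines every level of the recursion is still up to the
optimizer.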
Many thanks in advance,
Per Nordlöw
Swedish Defence Research Agency