help-gplusplus

Re: boost::array loop unrolling performance


From: Bernd Strieder
Subject: Re: boost::array loop unrolling performance
Date: Tue, 04 Jul 2006 15:10:45 +0200
User-agent: KNode/0.10.2

Hello,

per.nordlow@gmail.com wrote:

I replaced

> #include "../Timer.hpp"

which I don't have, with:


#include <sys/time.h>
#include <iostream>

class Timer
{
public:
    Timer() : usecs(0)   // defined value even if read() is never called
        {
            reset();
        }
    void reset() {gettimeofday(&tv, 0);}
    void read()
        {
            struct timeval tvnow;
            gettimeofday(&tvnow, 0);
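            // 1000000LL forces 64-bit arithmetic, so long runs cannot
            // overflow on platforms where tv_sec is 32 bits wide.
            usecs = 1000000LL*(tvnow.tv_sec - tv.tv_sec)
                  + (tvnow.tv_usec - tv.tv_usec);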
        }
    operator long long()
        {
            return usecs;
        }
private:
    struct timeval tv;
    long long usecs;
};

Using gcc-4.0.3:

> g++ -v
gcc-Version 4.0.3
> g++ -O0 -o bala bala.cpp
> ./bala
general: 765409
special: 612126
Checksums are equal. OK
> g++ -O1 -o bala bala.cpp
> ./bala
general: 117713
special: 19934
Checksums are equal. OK
> g++ -O2 -o bala bala.cpp
> ./bala
general: 117883
special: 19896
Checksums are equal. OK
> g++ -O3 -o bala bala.cpp
> ./bala
general: 117617
special: 20068
Checksums are equal. OK

> g++ -v
> g++ -O3 -funroll-loops -o bala bala.cpp
> ./bala
general: 1398
special: 1386
Checksums are equal. OK

Using gcc-4.1.1:

> g++ -v
gcc-Version 4.1.1
> g++ -O0 -o bala bala.cpp
> ./bala
general: 719761
special: 613356
Checksums are equal. OK
> g++ -O1 -o bala bala.cpp
> ./bala
general: 118248
special: 3
Checksums are equal. OK
> g++ -O2 -o bala bala.cpp
> ./bala
general: 102042
special: 3
Checksums are equal. OK
> g++ -O3 -o bala bala.cpp
> ./bala
general: 73090
special: 3
Checksums are equal. OK

> g++ -O1 -funroll-loops -o bala bala.cpp
> ./bala
general: 15717
special: 3
Checksums are equal. OK
> g++ -O2 -funroll-loops -o bala bala.cpp
> ./bala
general: 9796
special: 4
Checksums are equal. OK
> g++ -O3 -funroll-loops -o bala bala.cpp
> ./bala
general: 9170
special: 4
Checksums are equal. OK

>   general: 60.965ms
>   special: 902us
>   Checksums are equal. OK
> 
> As we can see the performance of the general_dot() is terrible (~60
> times slower) compared to the special_dot().
> 
> Do I have to switch to gcc version 4.0, 4.1 or 4.2 to make g++ compile
> the instantiation of general_code() to a code having similar/equal
> performance compared to the one produced by special_code()?

This benchmark is probably not robust enough. Starting with -O1,
gcc-4.1.1 evidently recognizes that special_dot is called repeatedly
with the same arguments, so the whole outer loop is optimized away
(special: 3). The same could be done for the general version, too.
Possibly the optimizer gets stuck halfway through removing the loop in
some of the cases, especially with gcc-4.0.3 and -O3 -funroll-loops.
You should definitely use a benchmark without calculations the compiler
can recognize as superfluous.
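
As a rough sketch of what I mean (not the original bala.cpp, which I
am only guessing at): change one input element on every iteration and
feed every result into a checksum that is used afterwards. I assume
std::array and std::chrono here in place of boost::array and the Timer
class, and the names Vec4, Result, bench, time_general, time_special
and report are made up for illustration. general_dot and special_dot
are stand-ins for the functions of the same name in the thread.

```cpp
#include <array>
#include <chrono>
#include <cstddef>
#include <iostream>

typedef std::array<double, 4> Vec4;  // assumption: 4 doubles per vector

// Stand-in for general_dot(): generic loop over N elements.
template <std::size_t N>
double general_dot(const std::array<double, N>& a,
                   const std::array<double, N>& b)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < N; ++i)
        sum += a[i] * b[i];
    return sum;
}

// Stand-in for special_dot(): hand-unrolled for N == 4.
inline double special_dot(const Vec4& a, const Vec4& b)
{
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3];
}

struct Result
{
    double checksum;   // sum of all dot products
    long long usecs;   // elapsed wall-clock time in microseconds
};

// Times `iterations` calls of dot(a, b). One element of `a` changes on
// every pass, so successive calls do not see identical arguments, and
// every result is added to the checksum, so no call is dead code.
template <typename Dot>
Result bench(Dot dot, std::size_t iterations)
{
    Vec4 a = {{1.0, 2.0, 3.0, 4.0}};
    const Vec4 b = {{0.5, 0.25, 0.125, 0.0625}};
    double checksum = 0.0;

    const std::chrono::steady_clock::time_point start =
        std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        a[i % 4] = static_cast<double>(i % 7);  // vary the input
        checksum += dot(a, b);                  // consume the result
    }
    const std::chrono::steady_clock::time_point stop =
        std::chrono::steady_clock::now();

    Result r;
    r.checksum = checksum;
    r.usecs = static_cast<long long>(
        std::chrono::duration_cast<std::chrono::microseconds>(
            stop - start).count());
    return r;
}

Result time_general(std::size_t n)
{
    return bench([](const Vec4& a, const Vec4& b)
                 { return general_dot(a, b); }, n);
}

Result time_special(std::size_t n)
{
    return bench([](const Vec4& a, const Vec4& b)
                 { return special_dot(a, b); }, n);
}

// Prints both timings and returns true if the checksums agree.
bool report(std::size_t n)
{
    const Result g = time_general(n);
    const Result s = time_special(n);
    std::cout << "general: " << g.usecs << "\n"
              << "special: " << s.usecs << "\n"
              << (g.checksum == s.checksum ? "Checksums are equal. OK"
                                           : "Checksums differ!")
              << "\n";
    return g.checksum == s.checksum;
}
```

Call report(n) from main(), preferably with n taken from the command
line, since a sufficiently clever compiler might still fold the whole
loop when the iteration count is a compile-time constant. The inputs
are small integers and powers of two, so the sums are exact and the two
checksums can be compared with ==.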

AFAIK, optimization is the main reason for the huge changes in gcc over
the past years, and will be in the years to come. Perhaps it could help
to try other loop-related optimization options.

Bernd Strieder


