gcc 3.4.3 performance problem illustrated
From: Kenneth Massey
Subject: gcc 3.4.3 performance problem illustrated
Date: Sat, 30 Apr 2005 16:02:38 -0400
User-agent: Mozilla Thunderbird 1.0.2 (X11/20050429)
I have noticed significantly worse performance in some of my C++ programs
when they are compiled with gcc 3.4.3 rather than gcc 3.3.4. I have boiled
the problem down to one relatively short program that illustrates it.
It seems to be an issue of excessive cache misses in certain pointer lookup
operations in gcc 3.4.3 binaries. BTW, are there any tools to actually count
cache misses?
If anyone has a few minutes to compile and run the following code, I would be
interested to know whether you see the same problem. I'm running an AMD64
Athlon 3200 with 1024 KB of cache. I compiled with
    g++ -O3 -Wall -march=k8
Compiled with gcc 3.3.4, average run time: 2.0 seconds
Compiled with gcc 3.4.3, average run time: 2.9 seconds
I've noticed even more dramatic differences in larger programs that do
real work.
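For anyone reproducing the averages, here is a minimal sketch of one way to time a workload several times in one process and take the mean. The helper name average_seconds is hypothetical, and this is not necessarily how the figures above were obtained. It uses clock(), the same timer the program below uses.

```cpp
#include <cassert>
#include <time.h>

// Hypothetical helper (not part of the original post): calls `work`
// `runs` times and returns the mean CPU time per call in seconds,
// measured with clock().
template <typename Work>
double average_seconds(Work work, int runs) {
    assert(runs > 0);  // avoid dividing by zero below
    double total = 0.0;
    for (int r = 0; r < runs; r++) {
        clock_t before = clock();
        work();
        total += (double)(clock() - before) / CLOCKS_PER_SEC;
    }
    return total / runs;
}
```

Passing the benchmark's inner loops as `work` with a handful of runs would give one average per compiler. Running the whole binary several times from the shell is an equally reasonable approach.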
I would be interested in answers to the following questions:
1) Is this observed only on AMD64, or also on x86?
2) How does gcc 4.0.0 do?
3) Are there compiler options that would improve performance? (None that
   I've tried did.)
4) What changed between gcc 3.3 and 3.4 to cause this?
If you have any spare time, I think this is an interesting example and worth
someone's effort to figure out. My compiler expertise is not sufficient for
that, so I am asking for some help. Thanks.
Code:
// Run time is anywhere from 33 to 50% longer when compiled with gcc 3.4.3
// compared to 3.3.4.
// Compiled with g++ -O3 -Wall -march=k8 (same performance lag observed
// with -O2).
//
// Objects are created in a hierarchy of classes.
// When referenced, it seems that the pointer lookups
// must cause more cache misses in gcc 3.4.3 binaries.
#include <stdio.h>
#include <time.h>   // clock(), clock_t, CLOCKS_PER_SEC
#include <vector>

class mytype_A {
public:
    int id;
    mytype_A() : id(0) {}
};

class mytype_B {
public:
    mytype_A* A;
    mytype_B(mytype_A* p) : A(p) {}
};

class mytype_C {
public:
    mytype_B* B;
    mytype_C(mytype_B* p) : B(p) {}
};

class mytype_D {
public:
    // mytype_C* C[2];  // less performance difference if we use simple arrays
    std::vector<mytype_C*> C;
    int junk[3];  // affects performance (must cause cache misses)
public:
    mytype_D(mytype_A* a0, mytype_A* a1) {
        // C[0] = new mytype_C(new mytype_B(a0));
        // C[1] = new mytype_C(new mytype_B(a1));
        C.push_back(new mytype_C(new mytype_B(a0)));
        C.push_back(new mytype_C(new mytype_B(a1)));
    }
};

int main() {
    const int k = 5000;  // run time is not linear in k
    mytype_A* A[k + 1];  // k+1 entries, because indices 0..k are all used
    mytype_D* D[k];
    for (int i = 0; i <= k; i++)
        A[i] = new mytype_A();
    for (int i = 0; i < k; i++)
        D[i] = new mytype_D(A[i], A[k - i]);  // intentionally make some pointers farther apart
    clock_t before = clock();
    int k0 = 0;
    for (int i = 0; i < k; i++) {
        k0 = 0;
        for (int j = 0; j < k; j++) {  // run through the list of D's and follow the pointers
            mytype_D* d = D[j];
            if (d->C[0]->B->A->id) k0++;
            if (d->C[1]->B->A->id) k0++;
        }
    }
    printf("%d\n", k0);  // print k0 so the compiler cannot optimize the loops away
    printf("time: %f\n", (double)(clock() - before) / CLOCKS_PER_SEC);
    return 0;
}
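The comment in mytype_D says the gap shrinks when a plain array replaces std::vector. Here is a minimal self-contained sketch of that variant for side-by-side timing. The names mytype_D_array and count_nonzero_ids are hypothetical, the outer holders are std::vector rather than stack arrays, and the second entry is built from a1 on the assumption that this was the intent.

```cpp
#include <cassert>
#include <vector>

struct mytype_A { int id; mytype_A() : id(0) {} };
struct mytype_B { mytype_A* A; mytype_B(mytype_A* p) : A(p) {} };
struct mytype_C { mytype_B* B; mytype_C(mytype_B* p) : B(p) {} };

// Array-based variant of mytype_D: a fixed two-element array replaces
// the std::vector member.
struct mytype_D_array {
    mytype_C* C[2];
    int junk[3];  // same padding member as in the vector version
    mytype_D_array(mytype_A* a0, mytype_A* a1) {
        C[0] = new mytype_C(new mytype_B(a0));
        C[1] = new mytype_C(new mytype_B(a1));  // assumption: a1 was intended
    }
};

// Hypothetical helper: builds k objects, walks them `passes` times, and
// returns the count from the final pass (0 here, since every id is 0).
int count_nonzero_ids(int k, int passes) {
    assert(k > 0 && passes > 0);
    std::vector<mytype_A*> A(k + 1);
    std::vector<mytype_D_array*> D(k);
    for (int i = 0; i <= k; i++)
        A[i] = new mytype_A();
    for (int i = 0; i < k; i++)
        D[i] = new mytype_D_array(A[i], A[k - i]);
    int k0 = 0;
    for (int p = 0; p < passes; p++) {
        k0 = 0;
        for (int j = 0; j < k; j++) {
            mytype_D_array* d = D[j];
            if (d->C[0]->B->A->id) k0++;
            if (d->C[1]->B->A->id) k0++;
        }
    }
    return k0;  // objects are never freed, as in the original benchmark
}
```

Calling count_nonzero_ids(5000, 5000) between two clock() calls should follow the same loop structure as the vector-based program, so the two timings can be compared under each compiler.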
--
Kenneth Massey
http://www.masseyratings.com