freepooma-devel

Re: [pooma-dev] KCC versus icc (and gcc)


From: Tarjei Knapstad
Subject: Re: [pooma-dev] KCC versus icc (and gcc)
Date: 04 Mar 2003 11:30:55 +0100

On Wed, 2003-02-26 at 20:27, Richard Guenther wrote:
> Hi!
> 
> I remember problems with the inliner, i.e. it refused to inline
> some of the expression template machinery. You might want to search
> for an option letting you tune the inlining behavior or try profile
> directed optimizations. With standard -O3 icc is not always faster
> than gcc3.2.2 with -O3.
> 
Just thought I'd add a bit to that. A while back, some others and I wrote
a small benchmark to measure the performance cost of dynamic_cast
compared with static_cast (it needs the Boost libraries):

========== BEGIN CODE ================
#include <iostream>
#include <boost/timer.hpp>
using namespace std;

const int num=10000000;

class TestBase
{
public:
    virtual ~TestBase() {}
    virtual void f() {}
    void f2() {}
};

class Test1 : public TestBase
{
public:
    virtual ~Test1() {}
    virtual void f() {}
    void f2() {}
};

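// Ensure that it doesn't optimise away the reading of it in the loops.
// Note: this declares a pointer to a volatile TestBase; the pointer itself
// is not volatile (that would be "TestBase* volatile testBasePtr").
volatile TestBase* testBasePtr = new Test1(); 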

int main()
{
    boost::timer t1;

    for(unsigned int i = 0; i != num; ++i)
    {
        // Checked downcast: consults the run-time type information.
        Test1* test1 = const_cast<Test1*>(
            dynamic_cast<volatile Test1*>(testBasePtr));
        if (test1)
        {
            test1 -> f2();
        }
    }
    cout << "Elapsed t1 " << t1.elapsed() << " " << endl;

    boost::timer t2;

    for(unsigned int i = 0; i != num; ++i)
    {
        // Unchecked downcast: no run-time type check.
        Test1* test1 = const_cast<Test1*>(
            static_cast<volatile Test1*>(testBasePtr));
        if (test1)
        {
            test1 -> f2();
        }
    }
    cout << "Elapsed t2 " << t2.elapsed() << " " << endl;
    return 0;
}
=========== END CODE ==========

We ran this with optimizations turned on, using both gcc:

Elapsed t1 0.52 
Elapsed t2 0.01 

and Intel 7:

Elapsed t1 4.03 
Elapsed t2 0.12 

The results are quite staggering: the dynamic_cast loop (t1) took almost
eight times as long with icc as with gcc (4.03 s versus 0.52 s;
boost::timer reports seconds). We found similar differences between the
compilers when testing boost.lexical_cast.

The Intel compiler does offer a lot of advanced optimizations, though,
and I haven't had much time to experiment with them (I mainly use gcc
and have only tried icc briefly).

If you manage to make icc generate code closer to KCC's performance, I
would be interested in your findings.

Regards,
--
Tarjei
