freepooma-devel

RE: [pooma-dev] timers and performance measurement under Linux


From: Julian C. Cummings
Subject: RE: [pooma-dev] timers and performance measurement under Linux
Date: Mon, 6 Aug 2001 12:42:26 -0700

Jim,

On the gettimeofday stuff: I may have been thinking of checking for overflow
in the internal storage of elapsed time in our old Timer class.  The Timer
class had start() and stop() functions, so you had to accumulate the elapsed
time between many starts and stops.  In that case, you certainly need to
carry any overflow in the accumulated microseconds into the seconds as time
is accumulated.  If you are just returning the seconds and microseconds
values from a single call to gettimeofday, I don't think there is any issue
with overflows.
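
A minimal sketch of the accumulation case, to show where the carry is
needed. The class name AccumTimer and its members are hypothetical; this is
not the original Pooma r1 Timer code, only an illustration that assumes
seconds and microseconds are stored as separate integers.

```cpp
#include <sys/time.h>
#include <cassert>

// Hypothetical accumulating timer (not the Pooma r1 Timer class).
class AccumTimer {
public:
  AccumTimer() : sec_(0), usec_(0), running_(false) {}

  void start() {
    gettimeofday(&begin_, nullptr);
    running_ = true;
  }

  void stop() {
    assert(running_);  // stop() without a matching start() is a usage error
    timeval end;
    gettimeofday(&end, nullptr);
    // The microsecond difference may be negative; add() handles the borrow.
    add(end.tv_sec - begin_.tv_sec, end.tv_usec - begin_.tv_usec);
    running_ = false;
  }

  // Add one interval and renormalize so that 0 <= usec_ < 1000000.
  void add(long dsec, long dusec) {
    sec_ += dsec;
    usec_ += dusec;
    while (usec_ >= 1000000) { usec_ -= 1000000; ++sec_; }  // carry
    while (usec_ < 0)        { usec_ += 1000000; --sec_; }  // borrow
  }

  long wholeSeconds() const { return sec_; }
  long microseconds() const { return usec_; }
  double seconds() const { return sec_ + 1.e-6 * usec_; }

private:
  long sec_;       // accumulated whole seconds
  long usec_;      // accumulated microseconds, kept in [0, 1000000)
  bool running_;
  timeval begin_;  // time of the most recent start()
};
```

Adding 0.7 s and then 0.6 s through add() leaves 1 second and 300000
microseconds, which is the adjustment that a single gettimeofday call never
needs.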

Your performance results sound consistent with things I've seen in the past
with KCC on Linux.  Pooma code seems to have a hard time reproducing the
high peak in on-node performance due to the cache effect.  I'm not sure why
that is.  I've never tried using gprof with KCC code, but it should work.
I'd report it to KAI and see what they say.  I would think it would be worth
looking at profiling data from gcc code as well, just to make sure there is
nothing crazy going on.  Besides, I thought the performance gap had closed a
fair bit between KCC and gcc in our last round of Pooma performance testing.
Is there still a really huge difference with gcc 3.0?

Julian C.

  -----Original Message-----
  From: James Crotinger [mailto:address@hidden
  Sent: Monday, August 06, 2001 11:48 AM
  To: 'address@hidden'; James Crotinger; address@hidden
  Subject: RE: [pooma-dev] timers and performance measurement under Linux



    -----Original Message-----
    From: Julian C. Cummings [mailto:address@hidden
    Sent: Monday, August 06, 2001 12:21 PM
    To: James Crotinger; address@hidden
    Subject: RE: [pooma-dev] timers and performance measurement under Linux
    Julian: ---------------------------------------------------
    The gettimeofday() function is probably the best thing to use for
    wallclock time measurement.  This is what we used in the old Timer
    class in Pooma r1.  I haven't looked at your check-in yet, but
    hopefully you remembered to check for overflow in the microseconds
    counter and increment the seconds counter accordingly.  Other than
    that, I remember that code as being pretty simple.
    -----------------------------------------------------------

  I just did:

    return tv.tv_sec + 1.e-6 * tv.tv_usec;

  This mirrors what we are doing with clock_gettime. My interpretation of
  gettimeofday is that tv_usec should always be less than 1e6 - it is
  supposed to return the number of seconds and microseconds since 12:00 am
  Jan 1, 1970. I checked this under Linux - tv_usec resets to zero every
  time tv_sec is increased. So I don't see a reason to put our own
  (% 1000000) after it, and indeed if it were over 1000000 I'm not even
  sure how I'd interpret tv_sec.
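
  For reference, the one-line return above can be sketched as a complete
  function. The name wallClockSeconds is hypothetical (not the name used in
  the check-in), and the assert only documents the tv_usec range assumed
  in the text; it is not a required check.

```cpp
#include <sys/time.h>
#include <cassert>

// Hypothetical helper: wall-clock time in seconds since the epoch,
// taken from a single gettimeofday call.
double wallClockSeconds() {
  timeval tv;
  gettimeofday(&tv, nullptr);
  // Assumption stated in the text: tv_usec stays in [0, 1000000),
  // so no extra (% 1000000) is applied.
  assert(tv.tv_usec >= 0 && tv.tv_usec < 1000000);
  return tv.tv_sec + 1.e-6 * tv.tv_usec;
}
```

  An interval would then be measured as the difference of two calls, for
  example t1 - t0 around the loop being timed.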

    Julian: ---------------------------------------------------
    As for your comments on the PIII performance, I think what you are
    seeing is correct.  The out-of-cache performance is not very good.
    You will see closer to optimal performance only when the problem size
    is in-cache, and the caches are much smaller than what we were used to
    on the SGI boxes.  With an optimized C code kernel, you should be able
    to see the cache effect and stronger flops numbers for small problem
    sizes.  (But of course, it gets harder to measure accurately, too.)
    I'm not aware of any profiling tools from KAI, so I think prof/gprof
    is all there is, unless you know how to access Pentium hardware
    counters.
    -----------------------------------------------------------

  Oh, this number is definitely memory bandwidth limited - there are three
  to four loads and two stores every trip through the loop, which does four
  flops (two multiplies and two adds). I get a peak C performance of about
  390 MFlops for N = 60 or so. The peak POOMA II Brick performance is only
  115 MFlops at a slightly higher N and then it drops off very rapidly to
  about 30 MFlops.
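
  As a rough illustration of the bandwidth argument, the per-iteration
  counts above can be turned into the memory traffic that a given flop
  rate implies. The helper name impliedTrafficMBps is hypothetical, and
  the 8-byte word size used in the note below is an assumption (double
  precision); the benchmark kernel itself is not shown here.

```cpp
#include <cassert>

// Hypothetical helper: memory traffic in MB/s (1 MB = 1e6 bytes) implied by
// a measured flop rate, given per-iteration flop and load/store counts.
double impliedTrafficMBps(double mflops, int flopsPerIter,
                          int memOpsPerIter, int bytesPerWord) {
  assert(flopsPerIter > 0);
  double mIterPerSec = mflops / flopsPerIter;  // millions of iterations/s
  return mIterPerSec * memOpsPerIter * bytesPerWord;
}
```

  With four loads and two stores of 8-byte words per four flops,
  impliedTrafficMBps(390, 4, 6, 8) comes to 4680 MB/s, while
  impliedTrafficMBps(30, 4, 6, 8) comes to 360 MB/s.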

  I tried gprof with "KCC -pg" generated code this morning, and gprof
  crashed after about 10 minutes of crunching on the output of a run. Has
  anyone else out there seen this? I'm going to try compiling with gcc, but
  I'm not sure it generates good enough code for me to trust the profile
  results to guide me to the right optimizations.

    Jim

