freepooma-devel
RE: [pooma-dev] timers and performance measurement under Linux


From: James Crotinger
Subject: RE: [pooma-dev] timers and performance measurement under Linux
Date: Mon, 6 Aug 2001 12:47:36 -0600

 
-----Original Message-----
From: Julian C. Cummings [mailto:address@hidden]
Sent: Monday, August 06, 2001 12:21 PM
To: James Crotinger; address@hidden
Subject: RE: [pooma-dev] timers and performance measurement under Linux
Julian: ---------------------------------------------------
The gettimeofday() function is probably the best thing to use for wallclock
time measurement.  This is what we used in the old Timer class in Pooma r1.
I haven't looked at your check-in yet, but hopefully you remembered to check
for overflow in the microseconds counter and increment the seconds counter
accordingly.  Other than that, I remember that code as being pretty simple.   
-----------------------------------------------------------
 
I just did:
 
  return tv.tv_sec + 1.e-6 * tv.tv_usec;
 
This mirrors what we are doing with clock_gettime. My interpretation of gettimeofday is that tv_usec should always be less than 1e6 - it is supposed to return the number of seconds and microseconds since midnight UTC, Jan 1, 1970. I checked this under Linux - tv_usec resets to zero every time tv_sec is incremented. So I don't see a reason to put our own (% 1000000) after it, and indeed if it were 1000000 or more I'm not even sure how I'd interpret tv_sec.
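 
For reference, a minimal self-contained sketch of the conversion described above. The function name wallclock_seconds is hypothetical and is not taken from the POOMA Timer code; the assert simply encodes the expectation that gettimeofday() already keeps tv_usec in the range [0, 1000000), so no extra normalization is applied.

```cpp
#include <sys/time.h>  // gettimeofday(), struct timeval (POSIX)
#include <cassert>

// Hypothetical helper (name is an assumption, not the POOMA API):
// returns wallclock time in seconds since the Epoch as a double.
double wallclock_seconds()
{
  timeval tv;
  gettimeofday(&tv, nullptr);

  // Expectation discussed above: the microseconds field is already
  // normalized, so no "% 1000000" is needed.
  assert(tv.tv_usec >= 0 && tv.tv_usec < 1000000);

  return tv.tv_sec + 1.e-6 * tv.tv_usec;
}
```

An interval would then be measured by calling the helper before and after the code of interest and subtracting the two values. Since gettimeofday() reports the system's wallclock time, a clock adjustment during the run would show up in the difference.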
 
Julian: ---------------------------------------------------
As for your comments on the PIII performance, I think what you are seeing
is correct.  The out-of-cache performance is not very good.  You will see
closer to optimal performance only when the problem size is in-cache, and
the caches are much smaller than what we were used to on the SGI boxes.
With an optimized C code kernel, you should be able to see the cache effect
and stronger flops numbers for small problem sizes.  (But of course, it gets
harder to measure accurately, too.)  I'm not aware of any profiling tools from
KAI, so I think prof/gprof is all there is, unless you know how to access Pentium
hardware counters. 
-----------------------------------------------------------
 
Oh, this number is definitely memory bandwidth limited - there are three to four loads and two stores every trip through the loop, which does four flops (two multiplies and two adds). I get a peak C performance of about 390 MFlops for N = 60 or so. The peak POOMA II Brick performance is only 115 MFlops at a slightly higher N, and then it drops off very rapidly to about 30 MFlops.
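 
As a rough back-of-the-envelope check on the bandwidth argument, the sketch below converts a flop rate into the load/store traffic it implies, using the per-trip counts quoted above. The 8-byte element size is an assumption (the message does not state the data type), and the function names are hypothetical. It ignores cache-line granularity and any reuse between iterations, so it is only an estimate.

```cpp
#include <cassert>

// Bytes moved per loop trip, ASSUMING 8-byte (double) elements;
// the element type is not stated in the message.
double bytes_per_trip(int loads, int stores)
{
  return 8.0 * (loads + stores);
}

// Load/store traffic in MB/s implied by a measured rate in MFlops,
// given the number of flops performed per loop trip.
double implied_mbytes_per_sec(double mflops, int loads, int stores,
                              int flops_per_trip)
{
  double mtrips_per_sec = mflops / flops_per_trip;  // millions of trips/s
  return mtrips_per_sec * bytes_per_trip(loads, stores);
}
```

Under these assumptions, 390 MFlops corresponds to roughly 3900 to 4680 MB/s of load/store traffic (three or four loads plus two stores per trip), while 30 MFlops corresponds to roughly 300 to 360 MB/s.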
 
I tried gprof with "KCC -pg" generated code this morning, and gprof crashed after about 10 minutes of crunching on the output of a run. Has anyone else out there seen this? I'm going to try compiling with gcc, but I'm not sure it generates good enough code for me to trust the profile results to guide me to the right optimizations.
 
  Jim
 
