[Swarm-Support] method dispatch costs


From: Marcus G. Daniels
Subject: [Swarm-Support] method dispatch costs
Date: Sat, 02 Dec 2006 20:27:14 -0700
User-agent: Thunderbird 1.5.0.8 (X11/20061107)

Hi,

For those of you who want to make your Swarm models faster, here's the best-case scenario for message dispatch costs in Objective C relative to avoiding messages entirely. This example (code attached) involves only one target object, which makes it easy for the CPU to predict where branches will go. The main routine simply sends a billion messages to that object. Note that performance in general will degrade when there are many live objects, because branch targets change as agents, for example, move in and out of different Moore neighborhoods. I have not yet written a minimal test case that shows that degradation as a function of working set size and CPU architecture, but I've seen it in real models using Intel VTune on x86 Linux. Consider the difference between a bucket brigade with 20 people vs. a single person...

Anyway, because various folks have talked about moving Swarm to the Apple runtime, I've started by comparing the GNU runtime (what Swarm uses now) with the Apple runtime. I'm using the same FSF GCC 4.3 compiler in both cases, just switching between -fgnu-runtime and -fnext-runtime.


        Method   IMP     NonInlineFunction   InlineFunction   Macro
GNU     17.38    11.03   10.70               8.36             8.36
Apple   16.71    11.03   10.70               8.36             8.36
(real elapsed time in seconds on an idle 3 GHz Mac Pro running Tiger)

The Method column shows the run time of normal Objective C message sends. The IMP column shows the run time when the method is predispatched and called via a function pointer. NonInlineFunction and InlineFunction remove that indirection, in the former case by calling a real function, and in the latter case by allowing the compiler to insert the CPU instructions directly into the caller. Finally, Macro simply confirms that the compiler did indeed do the inlining by showing the speed of a functionally equivalent macro. There's no reason why anything but the first column should change between runtimes in this experiment, but as a sanity check I ran every variant with each runtime. I was a bit surprised that the Apple runtime didn't do better, given that it does more to cache dispatching work. So the best-case dispatch cost of either runtime isn't that bad compared to the theoretical best possible code without any dispatch: a factor of two.

Marcus

P.S. The Apple compiler does somewhat better than the FSF compiler, but that improvement is specific to Objective C (in non-Objective C code the FSF compiler does better). With the Apple compiler, the method dispatch case ran in 15 seconds (not shown in the table).

#include <stdlib.h>
#include <sys/time.h>
#include <stdio.h>
#include <objc/objc.h>
#include <objc/Object.h>

IMP msg1Imp;
SEL sel;

unsigned count;

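/* Work done per iteration: bump the global counter (default), or the
   counter of the object named obj when the #if is flipped to 0.  */
#if 1
#define INC count++
#else
#define INC obj->count++
#endif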

@interface Test: Object
{
  unsigned count;
}
- (unsigned)getCount;
- init;
@end

@implementation Test
static inline 
void func (Test *obj, SEL sel)
{
  INC;
}

- (unsigned)getCount
{
  return count;
}
- init
{
  count = 0;
  return [super init];
}
@end

@interface Test1: Test
- (void)msg;
@end
 
@implementation Test1
- (void)msg
{
   func (self, sel);
}
@end

@interface Test2: Test
- (void)msg;
@end

@implementation Test2
- (void)msg
{
   func (self, sel);
}
@end

id obj1;
id obj2;

id objs[2];

void
doTest ()
{
  unsigned i;
  struct timeval tv;

  gettimeofday (&tv, NULL);
  double start = (double) tv.tv_sec * 1.0e6 + (double) tv.tv_usec;
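  /* Exactly one branch below is enabled per build; each corresponds
     to a column of the results table.  */
  for (i = 0; i < 1e9; i++)
#if 0
    [obj1 msg];            /* Method: normal message send */
#elif 0
    msg1Imp (obj1, sel);   /* IMP: predispatched, via function pointer */
#elif 0
    func (obj1, sel);      /* NonInlineFunction or InlineFunction */
#elif 1
    INC;                   /* Macro */
#endif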
  gettimeofday (&tv, NULL);
  double end = (double) tv.tv_sec * 1.0e6 + (double) tv.tv_usec;

  printf ("%f\n", (end - start) / 1e6);
}

int
main (int argc, const char **argv)
{
  unsigned i;
  
  obj1 = [[Test1 alloc] init];
  obj2 = [[Test2 alloc] init];

  objs[0] = obj1;
  objs[1] = obj2;
 
  sel = @selector (msg);
  msg1Imp = [obj1 methodFor: sel];

  count = 0;
  doTest ();

  printf ("count: %u\n", count);

  [objs[0] free];
  [objs[1] free];
}
