[Swarm-Support] method dispatch costs

From:	Marcus G. Daniels
Subject:	[Swarm-Support] method dispatch costs
Date:	Sat, 02 Dec 2006 20:27:14 -0700
User-agent:	Thunderbird 1.5.0.8 (X11/20061107)

Hi,

For those of you who seek to make your Swarm models faster, here's the best-case scenario for message dispatch costs in Objective C relative to avoiding messages entirely.

This example (code attached) involves only one target object, which makes it easy for the CPU to guess where branches will go. The main routine simply sends a billion messages to that object.

Note that performance in general will degrade when there are many live objects, because branch targets change as agents, for example, move in and out of different Moore neighborhoods. I have not yet written a minimalistic test case that shows that degradation as a function of working set size and CPU architecture, but I've seen it in real models using Intel VTune on x86 Linux. Consider the difference between a bucket brigade with 20 people vs. a single person...
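As a starting point for such a test case, here is a minimal sketch in plain C rather than Objective C. It is not the attached benchmark: the `agent` struct, `step_a`, `step_b`, `init_agents` and `sweep` are all hypothetical names, and a function pointer stored in each agent stands in for a run-time method lookup. The idea is only to let the number of agents and the mix of call targets be varied; it makes no claim about what the timings will look like.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical agent type: each agent carries its own function pointer,
   standing in for an object whose method is resolved at run time. */
typedef struct agent agent;
typedef void (*step_fn) (agent *);

struct agent
{
  step_fn step;
  unsigned count;
};

static void step_a (agent *a) { a->count += 1; }
static void step_b (agent *a) { a->count += 2; }

/* Assign call targets.  With mixed == 0 every agent uses step_a, so the
   indirect branch always goes to the same place.  With mixed != 0 a small
   linear congruential generator picks step_a or step_b per agent, so the
   target can change from one call to the next. */
void
init_agents (agent *agents, size_t n, int mixed)
{
  unsigned state = 12345u;
  size_t i;

  assert (agents != NULL);
  for (i = 0; i < n; i++)
    {
      state = state * 1103515245u + 12345u;
      agents[i].step = (mixed && ((state >> 16) & 1u)) ? step_b : step_a;
      agents[i].count = 0;
    }
}

/* Make rounds * n indirect calls, then return the sum of all counters. */
unsigned long
sweep (agent *agents, size_t n, unsigned rounds)
{
  unsigned long total = 0;
  unsigned r;
  size_t i;

  assert (agents != NULL);
  for (r = 0; r < rounds; r++)
    for (i = 0; i < n; i++)
      agents[i].step (agents + i);
  for (i = 0; i < n; i++)
    total += agents[i].count;
  return total;
}
```

To turn this into a measurement, one would call `init_agents` once with `mixed` set to 0 and once with it set to 1, wrap each `sweep` call in the same `gettimeofday` bracketing the attached code uses, and repeat for different values of `n`.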

Anyway, because various folks have talked about moving Swarm to the Apple runtime, I've started by comparing the GNU runtime (what Swarm uses now) with the Apple version. I'm using the same FSF GCC 4.3 compiler in both cases, just switching between -fgnu-runtime and -fnext-runtime.



        Method  IMP    NonInlineFunction  InlineFunction  Macro
GNU     17.38   11.03  10.70              8.36            8.36
Apple   16.71   11.03  10.70              8.36            8.36
(real elapsed time in seconds on an idle 3 GHz Mac Pro running Tiger)
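For a rough sense of scale, the elapsed times can be converted into a slowdown factor and a per-call overhead. The sketch below is plain C; the helper names (`slowdown`, `overhead_ns`, `print_summary`) are made up for illustration, the timings are copied from the table, and the iteration count of one billion comes from the description of the main loop.

```c
#include <assert.h>
#include <stdio.h>

/* Ratio of a measured run to the baseline run. */
double
slowdown (double total_s, double baseline_s)
{
  assert (baseline_s > 0.0);
  return total_s / baseline_s;
}

/* Extra nanoseconds per iteration relative to the baseline run. */
double
overhead_ns (double total_s, double baseline_s, double iterations)
{
  assert (iterations > 0.0);
  return (total_s - baseline_s) / iterations * 1e9;
}

/* Print the derived figures for both runtimes, using the Macro column
   (8.36 s) as the "no dispatch" baseline. */
void
print_summary (void)
{
  const double iterations = 1e9;
  const double macro_s = 8.36;
  const double imp_s = 11.03;
  const double gnu_method_s = 17.38;
  const double apple_method_s = 16.71;

  printf ("GNU   method: %.2fx, %.2f ns/call over baseline\n",
          slowdown (gnu_method_s, macro_s),
          overhead_ns (gnu_method_s, macro_s, iterations));
  printf ("Apple method: %.2fx, %.2f ns/call over baseline\n",
          slowdown (apple_method_s, macro_s),
          overhead_ns (apple_method_s, macro_s, iterations));
  printf ("IMP         : %.2fx, %.2f ns/call over baseline\n",
          slowdown (imp_s, macro_s),
          overhead_ns (imp_s, macro_s, iterations));
}
```

With these inputs the message send comes out at roughly twice the baseline time, or about 9 ns of extra cost per send on this machine, and the predispatched call at a bit under 3 ns.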

The `Method' column shows the runtime of normal Objective C message sends. The IMP column shows the runtime when the method is predispatched and called via a function pointer. The NonInlineFunction and InlineFunction columns remove that indirection, in the former case calling a real function, and in the latter case allowing the compiler to physically insert the CPU instructions inline into the caller. Finally, Macro simply confirms that the compiler did indeed do the inlining by showing the speed of a functionally equivalent macro.

There's no reason why anything but the first column should change in this experiment, but as a sanity check I ran each configuration for each runtime/compiler combination. I was a bit surprised that the Apple runtime didn't do better, given that it does more to cache dispatching work.

So, the best-case logical cost of either runtime's dispatch isn't that bad compared to the theoretical best possible code without any dispatch at all. A factor of two.
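For readers who don't have an Objective C toolchain handy, the last four columns can be mimicked in plain C. This is a simplified sketch and not the attached code: all names are hypothetical, a real IMP takes `(id, SEL)` arguments rather than none, the `noinline` attribute assumes GCC or Clang, and a true message send (the first column) has no plain C equivalent, so it is left out. Each `run_*` function is the loop one would time.

```c
#include <assert.h>

static unsigned counter;

/* "Macro" column: the increment is pasted textually into the loop. */
#define INC_MACRO() (counter++)

/* "InlineFunction" column: the compiler may expand this in the caller. */
static inline void inc_inline (void) { counter++; }

/* "NonInlineFunction" column: a real call is forced.  The noinline
   attribute is a GCC/Clang extension (an assumption of this sketch). */
__attribute__ ((noinline)) static void inc_noinline (void) { counter++; }

/* "IMP" column: call through a function pointer resolved beforehand. */
typedef void (*inc_fn) (void);

unsigned
run_macro (unsigned n)
{
  unsigned i;
  counter = 0;
  for (i = 0; i < n; i++)
    INC_MACRO ();
  return counter;
}

unsigned
run_inline (unsigned n)
{
  unsigned i;
  counter = 0;
  for (i = 0; i < n; i++)
    inc_inline ();
  return counter;
}

unsigned
run_noinline (unsigned n)
{
  unsigned i;
  counter = 0;
  for (i = 0; i < n; i++)
    inc_noinline ();
  return counter;
}

unsigned
run_pointer (inc_fn fn, unsigned n)
{
  /* volatile keeps the compiler from turning this into a direct call. */
  inc_fn volatile target = fn;
  unsigned i;
  counter = 0;
  for (i = 0; i < n; i++)
    target ();
  return counter;
}

/* All four forms must do the same work; only their cost should differ. */
void
check_forms_agree (unsigned n)
{
  assert (run_macro (n) == n);
  assert (run_inline (n) == n);
  assert (run_noinline (n) == n);
  assert (run_pointer (inc_noinline, n) == n);
}
```

Calling `check_forms_agree` before timing anything plays the same role as the Macro column in the table: it confirms the variants are functionally equivalent, so any difference in elapsed time comes from the call mechanism alone.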


Marcus

P.S. The Apple compiler does somewhat better than the FSF compiler, but that improvement is specific to Objective C (in non-Objective C code the FSF compiler does better). With the Apple compiler, method dispatch took 15 seconds [not shown in table].

#include <stdlib.h>
#include <sys/time.h>
#include <stdio.h>
#include <objc/objc.h>
#include <objc/Object.h>

IMP msg1Imp;
SEL sel;

unsigned count;

#if 1
#define INC count++
#else
#define INC obj->count++
#endif

@interface Test: Object
{
  unsigned count;
}
- (unsigned)getCount;
- init;
@end

@implementation Test
static inline 
void func (Test *obj, SEL sel)
{
  INC;
}

- (unsigned)getCount
{
  return count;
}
- init
{
  count = 0;
  return [super init];
}
@end

@interface Test1: Test
- (void)msg;
@end
 
@implementation Test1
- (void)msg
{
   func (self, sel);
}
@end

@interface Test2: Test
- (void)msg;
@end

@implementation Test2
- (void)msg
{
   func (self, sel);
}
@end

id obj1;
id obj2;

id objs[2];

void
doTest ()
{
  unsigned i;
  struct timeval tv;

  gettimeofday (&tv, NULL);
  double start = (double) tv.tv_sec * 1.0e6 + (double) tv.tv_usec;
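  /* Exactly one branch below is enabled per build; each corresponds to
     a column of the results table. */
  for (i = 0; i < 1e9; i++)
#if 0
    [obj1 msg];            /* Method: normal message send */
#elif 0
    msg1Imp (obj1, sel);   /* IMP: predispatched call via function pointer */
#elif 0
    func (obj1, sel);      /* direct function call (inline or not, depending
                              on how func is declared) */
#elif 1
    INC;                   /* Macro */
#endif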
  gettimeofday (&tv, NULL);
  double end = (double) tv.tv_sec * 1.0e6 + (double) tv.tv_usec;

  printf ("%f\n", (end - start) / 1e6);
}

int
main (int argc, const char **argv)
{
  unsigned i;
  
  obj1 = [[Test1 alloc] init];
  obj2 = [[Test2 alloc] init];

  objs[0] = obj1;
  objs[1] = obj2;
 
  sel = @selector (msg);
  msg1Imp = [obj1 methodFor: sel];

  count = 0;
  doTest ();

  printf ("count: %u\n", count);

  [objs[0] free];
  [objs[1] free];
}
