[Swarm-Support] method dispatch costs
From: Marcus G. Daniels
Subject: [Swarm-Support] method dispatch costs
Date: Sat, 02 Dec 2006 20:27:14 -0700
User-agent: Thunderbird 1.5.0.8 (X11/20061107)
Hi,
For those of you who seek to make your Swarm models faster, here's the
best-case scenario for message dispatch costs in Objective-C, relative to
avoiding messages entirely.
This example (code attached) involves only one target object, which
makes it easy on the CPU to successfully guess where branches will go.
The main routine simply sends a billion messages to that object.
Note that performance will generally degrade when there are many live
objects, because branch targets change as agents, for example, move
in and out of different Moore neighborhoods. I have not yet written a
minimal test case that shows that degradation as a function of
working set size and CPU architecture, but I've seen it in real models
using Intel VTune on x86 Linux. Consider the difference between a
bucket brigade with 20 people vs. a single person...
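A minimal C sketch of what such a test case could look like follows. It is not the test case mentioned above (which has not been written), and it makes no claim about timings. The function names, the three stand-in agent kinds, and the pseudo-random target pick are all assumptions for illustration. It makes indirect calls that either always hit the same target or hit one of several, which is the situation an indirect-branch predictor faces when receivers vary:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-kind hit counters; one slot per stand-in agent kind. */
struct tally { unsigned hits[3]; };

typedef void (*step_fn)(struct tally *t);

/* Three distinct targets, standing in for methods of different classes. */
static void step_a(struct tally *t) { t->hits[0]++; }
static void step_b(struct tally *t) { t->hits[1]++; }
static void step_c(struct tally *t) { t->hits[2]++; }

/* Make `iterations` indirect calls, picking the target pseudo-randomly
   from `targets` so that the call site sees a changing branch target.
   Returns the total number of calls that actually ran. */
unsigned run_calls(const step_fn *targets, size_t ntargets, unsigned iterations)
{
    struct tally t = { { 0, 0, 0 } };
    unsigned state = 12345u;   /* arbitrary seed for a small LCG */
    unsigned i;

    assert(ntargets > 0);
    for (i = 0; i < iterations; i++) {
        state = state * 1664525u + 1013904223u;
        targets[(state >> 16) % ntargets](&t);
    }
    return t.hits[0] + t.hits[1] + t.hits[2];
}

/* One live target: the easy case for the branch predictor. */
unsigned run_single_target(unsigned iterations)
{
    static const step_fn one[] = { step_a };
    return run_calls(one, 1, iterations);
}

/* Several live targets: the call site's destination keeps changing. */
unsigned run_mixed_targets(unsigned iterations)
{
    static const step_fn many[] = { step_a, step_b, step_c };
    return run_calls(many, 3, iterations);
}
```

Timing run_single_target against run_mixed_targets with a large iteration count (and with more targets) would be one way to look for the degradation; how large the gap is will depend on the CPU.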
Anyway, because various folks have talked about moving Swarm to the
Apple runtime, I've started by comparing the GNU runtime (what Swarm
uses now), with the Apple version. I'm using the same FSF GCC 4.3
compiler in both cases, just switching between -fgnu-runtime and
-fnext-runtime.
Runtime  Method  IMP    NonInlineFunction  InlineFunction  Macro
GNU      17.38   11.03  10.70              8.36            8.36
Apple    16.71   11.03  10.70              8.36            8.36
(real elapsed time in seconds on an idle 3 GHz Mac Pro running Tiger)
The Method column shows the run time of normal Objective-C message
sends. The IMP column shows the run time when the method is
predispatched and called via a function pointer. The NonInlineFunction
and InlineFunction columns remove that indirection, in the former case
calling a real function, and in the latter case allowing the compiler to
insert the CPU instructions inline in the caller. Finally,
Macro simply confirms that the compiler did indeed do the inlining by
showing the speed of a functionally equivalent macro. There's no
reason why anything but the first column should change between runtimes,
but as a sanity check I ran every variant under both runtime
configurations. I was a bit surprised that the Apple
runtime didn't do better, given that it does more to cache dispatching work.
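For readers who think in plain C, the four non-message columns correspond roughly to the call styles below. This is a sketch, not the attached benchmark: the names are made up, `__attribute__((noinline))` is a GCC/Clang extension used here to force a real call, and the volatile function pointer merely stands in for a predispatched IMP. The Method column has no plain-C equivalent because it goes through the runtime's method lookup.

```c
#include <assert.h>

static unsigned counter;   /* plays the role of the global `count` */

/* "Macro" column: the work is pasted into the loop textually. */
#define INC_MACRO() (counter++)

/* "InlineFunction" column: the compiler may insert the body at the call site. */
static inline void inc_inline(void) { counter++; }

/* "NonInlineFunction" column: a real call (GCC/Clang extension, assumed here). */
__attribute__((noinline)) static void inc_noinline(void) { counter++; }

typedef void (*inc_fn)(void);

unsigned run_macro(unsigned n)
{
    unsigned i;
    counter = 0;
    for (i = 0; i < n; i++)
        INC_MACRO();
    return counter;
}

unsigned run_inline(unsigned n)
{
    unsigned i;
    counter = 0;
    for (i = 0; i < n; i++)
        inc_inline();
    return counter;
}

unsigned run_noinline(unsigned n)
{
    unsigned i;
    counter = 0;
    for (i = 0; i < n; i++)
        inc_noinline();
    return counter;
}

/* "IMP" column: call through a pointer looked up once, before the loop.
   volatile keeps the compiler from turning this into a direct call. */
unsigned run_pointer(unsigned n)
{
    inc_fn volatile fp = inc_noinline;
    unsigned i;
    counter = 0;
    for (i = 0; i < n; i++)
        fp();
    return counter;
}
```

All four variants do the same work and return the same count; only the way control reaches the increment differs, which is what the columns are meant to isolate.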
So, the best-case cost of either runtime's dispatch isn't that bad
compared to the theoretical best possible code without any dispatch:
a factor of two.
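To put the table in per-message terms, here is a minimal C sketch of the arithmetic, using only the numbers from the table and the billion iterations of the test loop. The helper names are made up for illustration.

```c
#include <assert.h>

/* Extra time per call, in nanoseconds, of a variant that took `variant_s`
   seconds compared with a baseline that took `baseline_s` seconds,
   when both made `calls` iterations. */
double overhead_ns_per_call(double variant_s, double baseline_s, double calls)
{
    assert(calls > 0.0);
    return (variant_s - baseline_s) / calls * 1e9;
}

/* How many times slower the variant is than the baseline. */
double slowdown(double variant_s, double baseline_s)
{
    assert(baseline_s > 0.0);
    return variant_s / baseline_s;
}
```

With the GNU numbers (17.38 s against the 8.36 s macro baseline) this gives about 9.02 ns of extra cost per message and a ratio of about 2.08; with the Apple numbers (16.71 s) about 8.35 ns and a ratio of about 2.00.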
Marcus
P.S. The Apple compiler does somewhat better than the FSF compiler, but
that improvement is specific to Objective-C (in non-Objective-C code the
FSF compiler does better). With the Apple compiler, the method dispatch
case ran in 15 seconds. [not shown in table]
#include <stdlib.h>
#include <sys/time.h>
#include <stdio.h>
#include <objc/objc.h>
#include <objc/Object.h>

IMP msg1Imp;     /* predispatched implementation of -msg */
SEL sel;         /* selector for -msg */
unsigned count;  /* global counter */

/* INC is the unit of work: increment the global counter, or (with #if 0)
   the receiver's instance variable. */
#if 1
#define INC count++
#else
#define INC obj->count++
#endif

@interface Test: Object
{
  unsigned count;
}
- (unsigned)getCount;
- init;
@end

@implementation Test
/* Plain C function with the same signature as a method implementation. */
static inline
void func (Test *obj, SEL sel)
{
  INC;
}

- (unsigned)getCount
{
  return count;
}

- init
{
  count = 0;
  return [super init];
}
@end

@interface Test1: Test
- (void)msg;
@end

@implementation Test1
- (void)msg
{
  func (self, sel);
}
@end

@interface Test2: Test
- (void)msg;
@end

@implementation Test2
- (void)msg
{
  func (self, sel);
}
@end

id obj1;
id obj2;
id objs[2];

void
doTest ()
{
  unsigned i;
  struct timeval tv;

  gettimeofday (&tv, NULL);
  double start = (double) tv.tv_sec * 1.0e6 + (double) tv.tv_usec;

  /* Enable exactly one branch below; it becomes the loop body. */
  for (i = 0; i < 1e9; i++)
#if 0
    [obj1 msg];            /* normal message send */
#elif 0
    msg1Imp (obj1, sel);   /* call through the predispatched IMP */
#elif 0
    func (obj1, sel);      /* direct call to func */
#elif 1
    INC;                   /* macro */
#endif

  gettimeofday (&tv, NULL);
  double end = (double) tv.tv_sec * 1.0e6 + (double) tv.tv_usec;

  /* Elapsed wall-clock time in seconds. */
  printf ("%f\n", (end - start) / 1e6);
}

int
main (int argc, const char **argv)
{
  unsigned i;

  obj1 = [[Test1 alloc] init];
  obj2 = [[Test2 alloc] init];
  objs[0] = obj1;
  objs[1] = obj2;
  sel = @selector (msg);
  msg1Imp = [obj1 methodFor: sel];   /* look up the implementation once */
  count = 0;
  doTest ();
  printf ("count: %u\n", count);
  [objs[0] free];
  [objs[1] free];
  return 0;
}