discuss-gnustep

Re: objective-c: how slow ?


From: Malmberg
Subject: Re: objective-c: how slow ?
Date: Wed, 05 Sep 2001 14:53:55 +0200

OK, I've done a bunch of benchmarking and testing and have found some
interesting results. In all cases, the method/function called ignores
its arguments and returns an integer. Function prologue and epilogue are
included. I've tested by running a loop of calls 100,000,000 times on a
PII 400 MHz with no load. Most of the time I've taken the assembly
generated by gcc and fixed the worst issues before testing (for whatever
reason, it likes to move the %esp value around needlessly). I'm
using gcc 2.95.2, but that shouldn't matter.
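
For anyone wanting to repeat this kind of measurement, here is a minimal
sketch of a timing loop in C. It is not the harness used for the numbers
in this post (those came from hand-tuned assembly); the names callee,
time_calls and cycles_per_iteration are made up for illustration, clock()
is a coarse timer, and the call goes through a volatile function pointer
only to stop the compiler from inlining or deleting it, so it is closer to
the "call using IMP" case than to a direct call.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical callee: ignores its arguments and returns an integer. */
static int callee(void *self, void *selector)
{
    (void)self;
    (void)selector;
    return 42;
}

/* Convert elapsed seconds into cycles per iteration at a given clock rate. */
static double cycles_per_iteration(double seconds, double hz,
                                   unsigned long iterations)
{
    assert(iterations > 0);
    return seconds * hz / (double)iterations;
}

/* Time `iterations` calls; returns elapsed CPU time in seconds. */
static double time_calls(unsigned long iterations)
{
    /* volatile pointer: the compiler must really perform each call */
    int (*volatile fn)(void *, void *) = callee;
    volatile int sink = 0;
    unsigned long i;
    clock_t start, end;

    start = clock();
    for (i = 0; i < iterations; i++)
        sink = fn(0, 0);
    end = clock();

    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

As a sanity check on the units: 100,000,000 iterations taking 2.03 seconds
on a 400 MHz machine works out to 8.12 cycles/iteration, so
cycles_per_iteration(time_calls(100000000UL), 400e6, 100000000UL) is the
kind of figure reported below (loop overhead included).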

So, in no particular order (speed in cycles/iteration, not necessarily
an integer due to overhead and pairing issues):

direct call, zero args                   8.12
direct call, one arg (self)              9.20
direct call, two args (self+selector)   11.04

c++ virtual call, one arg (self)        10.52
c++ virtual call, two args (self+dummy) 12.08

All of the following calls take two args (self+selector):

old obj-c lookup/call                   42.72
new obj-c lookup/call (like the patch)  21.68
even more optimized obj-c               18.36
really optimized obj-c (not safe)       17.12

dynamic inline cache (in c) hit         13.08
call using IMP                          11.48

'outline' cache, style 1:
level 1 hit                             13.92
level 1 hit, no nil test                12.60
level 1 miss (double nil test)          24.60
level 1 miss (proper)                   23.68

level 2 hit                             15.36
level 2 miss (double nil test)          25.12

level 3 hit                             18.08
level 3 miss (double nil test)          28.04

'outline' cache, style 2:
level 1 hit                             13.40
level 1 miss (double nil test)          24.16

level 2 hit                             14.96
level 2 miss (proper)                   25.12

inline cache:
level 1 hit                             ~13


Now some comments:

Inline caches are a bad idea because:
1. They would make the code larger.
2. They're hard to do in a thread-safe way without locking (probably
impossible).
3. Modifying the code means shared libraries can't be shared between
programs. This is a problem with the other cache-systems as well, but
that could be solved (at the loss of speed) by an extra level of
indirection.
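
To make the idea concrete, here is a rough C model of a per-call-site
cache, along the lines of the "dynamic inline cache (in c)" entry above.
It is only a sketch: the types and names (ic_class, ic_object, ic_cache,
ic_send, ic_slow_lookup) are invented for illustration and are not the
runtime's, each class is reduced to a single method, and the cache lives
in data next to the call rather than in patched code.

```c
#include <assert.h>
#include <stddef.h>

struct ic_object;
typedef int (*ic_method)(struct ic_object *self, const char *sel);

/* Hypothetical, heavily simplified class: one method per class. */
struct ic_class  { ic_method imp; };
struct ic_object { struct ic_class *isa; };

static unsigned long ic_slow_lookups;   /* counts trips through the slow path */

/* Stand-in for the runtime's full method lookup. */
static ic_method ic_slow_lookup(struct ic_class *cls, const char *sel)
{
    (void)sel;
    ic_slow_lookups++;
    return cls->imp;
}

/* Per-call-site cache: the last class seen and the method it resolved to. */
struct ic_cache { struct ic_class *cls; ic_method imp; };

static int ic_send(struct ic_cache *cache, struct ic_object *receiver,
                   const char *sel)
{
    assert(cache != NULL);
    if (receiver == NULL)
        return 0;                          /* nil receiver */
    if (receiver->isa != cache->cls) {     /* miss: look up and refill */
        /* Two separate stores: another thread could see the new imp
           paired with the old cls. This is the thread-safety problem. */
        cache->imp = ic_slow_lookup(receiver->isa, sel);
        cache->cls = receiver->isa;
    }
    return cache->imp(receiver, sel);      /* hit: one compare, one call */
}

static int ic_method_a(struct ic_object *self, const char *sel)
{ (void)self; (void)sel; return 1; }

static int ic_method_b(struct ic_object *self, const char *sel)
{ (void)self; (void)sel; return 2; }

/* Walks one call site through miss, hit, miss, nil; returns slow lookups. */
static unsigned long ic_demo(void)
{
    struct ic_class a = { ic_method_a }, b = { ic_method_b };
    struct ic_object x = { &a }, y = { &b };
    struct ic_cache site = { NULL, NULL };
    int r1, r2, r3, r4;

    ic_slow_lookups = 0;
    r1 = ic_send(&site, &x, "value");   /* miss: cache is empty */
    r2 = ic_send(&site, &x, "value");   /* hit */
    r3 = ic_send(&site, &y, "value");   /* miss: receiver class changed */
    r4 = ic_send(&site, NULL, "value"); /* nil receiver, no lookup */
    assert(r1 == 1 && r2 == 1 && r3 == 2 && r4 == 0);
    return ic_slow_lookups;
}
```

The refill path has to update two words (class and method) together,
which is where the locking concern in point 2 comes from.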

Even inlining the nil test didn't help performance (which is a bit
surprising). Having gcc load the class for the object and placing it in
a register before calling improves performance slightly, though.

With 'outline' caches, style 1, I'd create short stubs like:
   movl 4(%%esp),%%ecx          // get object from stack
   testl %%ecx,%%ecx            // check nil receiver
   jz _objc_nil_method
   movl (%%ecx),%%eax           // load class
   cmpl some_class,%%eax
   jnz objc_msg_lookup_call_no_nil_test
   jmp the_real_method

and patch the calls to these functions. The stubs would be packed
together in some memory allocated by the runtime. This could be made
thread safe without locking since jmp and call destinations can be
updated atomically with one movl (at least I think so).

Also, these stubs can be changed dynamically. If one of the stubs sees a
lot of misses going to a particular class, the 'jnz objc_...' could be
patched to jump to the cmpl of another stub. The jmp to the function
could also be patched if the implementation changes. However, it
wouldn't be possible to change the class for a stub.
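
The patching rules for style 1 can be modelled in C roughly as follows.
This is a sketch under assumptions, not runtime code: a stub is
represented as a struct instead of machine code, all s1_* names are
hypothetical, and each class has just one method. The on_miss field plays
the role of the patchable 'jnz' and target the patchable 'jmp'; cls is
never changed after the stub is created.

```c
#include <assert.h>
#include <stddef.h>

struct s1_object;
typedef int (*s1_method)(struct s1_object *self, const char *sel);

struct s1_class  { s1_method imp; };        /* hypothetical: one method */
struct s1_object { struct s1_class *isa; };

static unsigned long s1_full_lookups;

/* Stand-in for objc_msg_lookup_call_no_nil_test. */
static s1_method s1_full_lookup(struct s1_class *cls, const char *sel)
{
    (void)sel;
    s1_full_lookups++;
    return cls->imp;
}

/* One stub = one class compare. */
struct s1_stub {
    struct s1_class *cls;      /* "cmpl some_class": fixed for the stub's life */
    s1_method        target;   /* "jmp the_real_method": patchable */
    struct s1_stub  *on_miss;  /* "jnz ...": NULL = full lookup, else next stub */
};

static int s1_call(struct s1_stub *stub, struct s1_object *receiver,
                   const char *sel)
{
    struct s1_class *cls;

    assert(stub != NULL);
    if (receiver == NULL)
        return 0;              /* nil receiver */
    cls = receiver->isa;       /* loaded once, reused by chained compares */
    for (; stub != NULL; stub = stub->on_miss)
        if (stub->cls == cls)
            return stub->target(receiver, sel);
    return s1_full_lookup(cls, sel)(receiver, sel);
}

static int s1_method_a(struct s1_object *self, const char *sel)
{ (void)self; (void)sel; return 1; }

static int s1_method_b(struct s1_object *self, const char *sel)
{ (void)self; (void)sel; return 2; }

/* Shows both kinds of patching; returns the number of full lookups. */
static unsigned long s1_demo(void)
{
    struct s1_class a = { s1_method_a }, b = { s1_method_b };
    struct s1_object y = { &b };
    struct s1_stub stub_a = { &a, s1_method_a, NULL };
    struct s1_stub stub_b = { &b, s1_method_b, NULL };
    int before, after, repatched;

    s1_full_lookups = 0;
    before = s1_call(&stub_a, &y, "value");    /* miss, full lookup */
    stub_a.on_miss = &stub_b;                  /* patch the miss branch */
    after = s1_call(&stub_a, &y, "value");     /* miss, then hit in stub_b */
    stub_b.target = s1_method_a;               /* implementation changed */
    repatched = s1_call(&stub_a, &y, "value");
    assert(before == 2 && after == 2 && repatched == 1);
    return s1_full_lookups;
}
```

Each patchable field is a single pointer-sized word, which mirrors the
point above that a jmp or call destination can be rewritten with one
store.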

Note that in most of my testing of the caches, misses were sent to the
normal lookup/call function, so loading the class and checking for a nil
receiver is done twice, which seems to cost 1-1.5 cycles.

'Outline' caches, style 2, had stubs like:
   movl 4(%%esp),%%ecx
   testl %%ecx,%%ecx
   jz _objc_nil_method
   movl (%%ecx),%%eax
   cmpl some_class_1,%%eax
   jz the_real_method_1
   cmpl some_class_2,%%eax
   jz the_real_method_2
// etc
   jmp objc_msg_lookup_call_no_nil_test

This is slightly faster than style 1, but you can't change the chaining
of compares.
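
For comparison, a C model of a style 2 stub might look like the sketch
below. Again the s2_* names, the one-method classes and the fixed limit of
four entries are assumptions made for the example. The compares sit in a
single table that is walked in order, standing in for the cmpl/jz pairs
laid out back to back in the real stub.

```c
#include <assert.h>
#include <stddef.h>

struct s2_object;
typedef int (*s2_method)(struct s2_object *self, const char *sel);

struct s2_class  { s2_method imp; };        /* hypothetical: one method */
struct s2_object { struct s2_class *isa; };

static unsigned long s2_full_lookups;

/* Stand-in for objc_msg_lookup_call_no_nil_test. */
static s2_method s2_full_lookup(struct s2_class *cls, const char *sel)
{
    (void)sel;
    s2_full_lookups++;
    return cls->imp;
}

#define S2_MAX_ENTRIES 4                    /* arbitrary limit for the sketch */

struct s2_entry { struct s2_class *cls; s2_method target; };

/* One stub holds the whole chain; the order is fixed when it is built. */
struct s2_stub {
    size_t          count;
    struct s2_entry entries[S2_MAX_ENTRIES];
};

static int s2_call(const struct s2_stub *stub, struct s2_object *receiver,
                   const char *sel)
{
    struct s2_class *cls;
    size_t i;

    assert(stub != NULL && stub->count <= S2_MAX_ENTRIES);
    if (receiver == NULL)
        return 0;                           /* nil receiver */
    cls = receiver->isa;
    for (i = 0; i < stub->count; i++)       /* cmpl / jz, cmpl / jz, ... */
        if (stub->entries[i].cls == cls)
            return stub->entries[i].target(receiver, sel);
    return s2_full_lookup(cls, sel)(receiver, sel);
}

static int s2_method_a(struct s2_object *self, const char *sel)
{ (void)self; (void)sel; return 1; }

static int s2_method_b(struct s2_object *self, const char *sel)
{ (void)self; (void)sel; return 2; }

static int s2_method_c(struct s2_object *self, const char *sel)
{ (void)self; (void)sel; return 3; }

/* Level 1 hit, level 2 hit, then a miss; returns the full lookup count. */
static unsigned long s2_demo(void)
{
    struct s2_class a = { s2_method_a }, b = { s2_method_b }, c = { s2_method_c };
    struct s2_object x = { &a }, y = { &b }, z = { &c };
    struct s2_stub stub = { 2, { { &a, s2_method_a }, { &b, s2_method_b } } };
    int r1, r2, r3;

    s2_full_lookups = 0;
    r1 = s2_call(&stub, &x, "value");   /* level 1 hit */
    r2 = s2_call(&stub, &y, "value");   /* level 2 hit */
    r3 = s2_call(&stub, &z, "value");   /* miss: falls through to lookup */
    assert(r1 == 1 && r2 == 2 && r3 == 3);
    return s2_full_lookups;
}
```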

Since it turned out that the normal lookup can be optimized down to just
~7.5 cycles slower than a direct c call (~6 cycles might be possible), I
doubt (automatic) caching will ever help. Even in the best safe cache,
the third level is almost as slow as the normal lookup. And these were
static caches; keeping stats over method calls and updating calls would
slow them down even more.

I'll get to work on making the optimized lookup work properly in all
cases, which should be fast enough. 7-8 cycles slower than a c++ call
isn't bad considering the capabilities and safe nil receiver handling.

- Alexander Malmberg


