
Re: objective-c: how slow ?


From: Marko Mikulicic
Subject: Re: objective-c: how slow ?
Date: Sat, 01 Sep 2001 18:58:14 -0400
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.3) Gecko/20010801

Erik M. Buck wrote:

>> you would modify the compiler so that instead of compiling
>> [...]
>
> As you know, the above is not thread safe.  Furthermore, the above is likely
> SLOWER.  The variable assignments require memory writes.  The use of static
> variables probably means the memory will not be in the cache.

Yes, you are right.  This is because I prefer "self-modifying" code
(in Self it is not really self-modifying, because PIC updates are deferred).
Also, using a writable text segment you could "fix" method calls as direct
calls instead of indirect ones, which can aid the CPU's prefetch logic.
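To make it concrete, the data-based (non-self-modifying) variant being
discussed could expand at a call site to roughly the following.  This is only
a sketch: message: and the argument are placeholders, and it assumes the GNU
runtime's object_get_class() and objc_msg_lookup(); the self-modifying
alternative would patch the call instruction itself instead of updating
static data.

#include <objc/objc.h>
#include <objc/objc-api.h>

/* What a compiler-generated inline cache could expand to for one call
   site of [receiver message: arg].  The cache lives in static data, so
   it is not thread safe, exactly as Erik points out. */
static id
cached_send (id receiver, id arg)
{
  static Class ic_class = Nil;
  static IMP   ic_imp   = (IMP)0;

  if (object_get_class (receiver) != ic_class)
    {
      /* miss: full runtime lookup, then remember the class and the IMP */
      ic_class = object_get_class (receiver);
      ic_imp   = objc_msg_lookup (receiver, @selector(message:));
    }
  return ic_imp (receiver, @selector(message:), arg);
}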
> The comparison and branch is almost as expensive as the function call, and
> for the common case when the IMP for @selector(message) is in receiver's
> method cache, you are getting effectively the same optimization.  I doubt
> there is ANY benefit to the above code.  It probably will make things slower.

I tried it on a simple case and got these results:
- 11.31 secs for the full lookup
- 5.20  secs for the simple inline cache (Nicola's code)
- 4.20  secs for plain C
The body is a tight loop of 1e8 iterations; the method called simply returns
the argument incremented.
I know it's not a real benchmark; it's far too simplified because it doesn't
take into account the presence of the IC in the data cache.
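The test was something along these lines (a sketch with made-up names, not
the original code; it assumes the GNU runtime and the old Object root class):

#include <objc/Object.h>

@interface Adder : Object
- (int) bump: (int)x;
@end

@implementation Adder
- (int) bump: (int)x { return x + 1; }      /* returns the argument incremented */
@end

static int c_bump (int x) { return x + 1; } /* the plain-C comparison point */

int
main (void)
{
  Adder *a = [Adder new];
  long   i;
  int    v = 0;

  /* full lookup: the runtime resolves the IMP on every iteration */
  for (i = 0; i < 100000000L; i++)
    v = [a bump: v];

  /* plain C: an ordinary direct call, nothing to resolve */
  for (i = 0; i < 100000000L; i++)
    v = c_bump (v);

  /* the inline-cache variant replaces the message send with the
     cached-IMP sequence sketched earlier in this message */
  return 0;
}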

> It is not thread safe.  It requires more memory and more code.  It abuses
> processor memory cache.  It substantially duplicates what is already
> happening in objc_msg_send.

Various levels of cache have their purposes.  Duplicating is not always bad.

> One optimization would be to inline objc_msg_lookup().  Another optimization
> would be to use jmp rather than a function call to execute the IMP.  The
> NeXTstep runtime diddles the stack and jumps to the IMP.

This would be good, especially for methods with many arguments (or massive ones).

> It is slightly faster, but it is assembly language and therefore less
> portable.


> Finally, as you point out, when optimization is critical, the programmer can
> call the IMP directly.

You must know that the target will not change.  To correctly handle an object
of a different type inside a collection you will end up implementing an inline
cache by hand anyway.  This is simple to manage when the calls are directly in
the tight loop.
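Such a hand-coded inline cache over a heterogeneous collection would look
roughly like this (a sketch: process_all() and the doIt: selector are made-up
names; -methodForSelector: is the OpenStep way to get at the IMP):

#include <Foundation/Foundation.h>

/* Hand-coded inline cache in a tight loop over a collection whose
   elements may not all be of the same class.  The class check refills
   the cache whenever a different kind of object turns up. */
static void
process_all (NSArray *array, id extraArg)
{
  SEL      sel       = @selector(doIt:);
  Class    lastClass = Nil;
  IMP      lastImp   = (IMP)0;
  unsigned i, count  = [array count];

  for (i = 0; i < count; i++)
    {
      id obj = [array objectAtIndex: i];

      if ([obj class] != lastClass)
        {
          lastClass = [obj class];
          lastImp   = [obj methodForSelector: sel];
        }
      lastImp (obj, sel, extraArg);
    }
}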

I don't want this thread to become something like the everlasting fight
between GC and no GC, but let me draw a parallel with it:
 The C/C++/whatever programmer argues that manual memory allocation is best
because you have control over your memory management and GC is slow,
nondeterministic, ... but when the need arises he introduces something called
a "smart pointer", which is nothing else than a reference-counting GC, which
is probably (I don't want a dispute here about that) the worst GC around
these days.

In the cases where you can use IMPs directly, then use them.
But if you can't (like my situation with gstep-db) you could still code
inline caches by hand, but then you'd be limited by what the C compiler
can produce (it doesn't make sense to code it in asm).  The compiler itself,
however, could directly generate ICs or PICs (with the appropriate flags, of
course), and it would generate them better than you as an end user could,
because they would be optimized for a given architecture (something you need
not care about).
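As an illustration, a compiler-generated PIC could expand to something along
these lines (purely hypothetical codegen, sketched against the GNU runtime's
objc_msg_lookup() and object_get_class()):

#include <objc/objc.h>
#include <objc/objc-api.h>

/* Hypothetical expansion of one call site with a polymorphic inline
   cache: a small table of (class, IMP) pairs probed before falling back
   to the full runtime lookup.  Purely illustrative. */
#define PIC_ENTRIES 4

static Class pic_class[PIC_ENTRIES];
static IMP   pic_imp[PIC_ENTRIES];
static int   pic_next;

static IMP
pic_lookup (id receiver, SEL sel)
{
  Class c = object_get_class (receiver);
  int   i;

  for (i = 0; i < PIC_ENTRIES; i++)
    if (pic_class[i] == c)
      return pic_imp[i];                  /* hit: no runtime lookup */

  /* miss: do the full lookup, then overwrite the oldest slot */
  {
    IMP imp = objc_msg_lookup (receiver, sel);

    pic_class[pic_next] = c;
    pic_imp[pic_next]   = imp;
    pic_next = (pic_next + 1) % PIC_ENTRIES;
    return imp;
  }
}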

Objc is at the border: you don't notice the performance degradation because
you can cross over to the C side; when C is too tight you can cross again to
the Smalltalk side.  So you can have reasonable performance and flexibility
out of both worlds, but at the price of living in a hybrid framework, where
integers are not objects, nil has nil behavior, etc.

I agree that maybe the small performance gain is not worth the effort, unless
you change coding style, but then it is not OpenStep anymore.

Please read the Self performance-related papers
(http://www.sun.com/research/self/papers/papers.html) if you have time.
They managed to run a pure OO language only twice as slow as optimized C (on
tasks that are not strictly OO) while having bounds checking, GC, integers as
objects (automatic multi-precision on overflow), full debugging support
(dynamic deoptimization), multithreading, dynamic inheritance, and many other
features that are off topic here.  Many of these features are not interesting
to objc, but they are the reason why Self was twice as slow as C.  I think
objc could do better.

Marko



