chicken-users
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Chicken-users] Re: [gambit-list] Help With Memory


From: Marc Feeley
Subject: [Chicken-users] Re: [gambit-list] Help With Memory
Date: Fri, 26 Sep 2008 11:32:24 -0400


On 26-Sep-08, at 10:41 AM, Alex Shinn wrote:

Hi,

Marc Feeley <address@hidden> writes:

The point I am trying to make is that in a Scheme to C compiler
continuations can be implemented in other ways than Cheney on the MTA
to get a system with good performance for call/cc. Whether one system
is a few percent faster than the other on these benchmarks is quite
possibly due to other factors unrelated to the implementation of
continuations.

Indeed, those benchmarks are both highly influenced by the
speed of generic arithmetic, which Chicken is slow at.  If
you set the options for both implementations to use unsafe,
fixnum-only arithmetic, the computation amounts to
practically nothing, and all you're comparing is the speed
of call/cc.  In this case I find Chicken is roughly 1.4x
faster for ctak, and 2x faster for fibc.

You are comparing Chicken to Chicken using different modes right? When Chicken and Gambit are benchmarked in "r6rs-fixflo-unsafe" mode (which combines declarations for standard-bindings, fixnum specific operations and unsafe execution (no type checks)) the results I get are:

  ctak: Chicken is 1.03 times faster than Gambit
  fibc: Gambit is 1.01 times faster than Chicken

Given all the indeterminism in the processors (cache alignment, cache hits, etc) the execution times should be considered equal.

Chicken is a simple compiler with relatively few
optimizations.  The fact that it can nonetheless outperform
Gambit (which is otherwise faster in general) on these
benchmarks suggests that Cheney on the MTA gives you very
fast continuations.

The conclusion from my benchmarks is quite different. Chicken does not outperform Gambit on these benchmarks. There is so little other stuff happening than call/cc in these benchmarks that it would appear that the performance of call/cc in Chicken and Gambit is essentially the same (to within a few percent).

Another point I want to make is that Cheney on the MTA give you "free" call/cc only after paying a premium on other things, namely stack- like
behaving function calls and tail-calls.

Sure, to be clear I'm not claiming that Cheney on the MTA is
a superior architecture, just that it has fast
continuations.  Specifically, in answer to the original
question, you can't get notably faster code with manual CPS
than with call/cc in Chicken.  But as you say, it comes with
trade-offs, and I wouldn't be so rude as to recommend people
use Chicken on the Gambit list :)

I do think that with a good optimizing compiler, a lot of
the differences in strategies can be optimized away though.
For example Chicken already contracts self tail-calls so
that simple loops use goto - they're not "stack-like" - and
many more optimizations can help close the gap.

Only time will tell if all the optimizations required to match Gambit's current performance will be added to Chicken before the performance of Gambit is improved with new optimizations of its own!

But even if self tail-calls are handled better, stack-like tail-calls (in the Scheme source code) will suffer with Cheney on the MTA because you will not get stack-like behavior in the generated C code (at least in the general case). This will translate into additional GC pressure, a lower hit ratio for the caches, and a lower branch- prediction performance (note that the last point is shared with Gambit but not the first two). As you can see I am pessimistic about the performance that can be obtained with a Cheney on the MTA approach.

Marc





reply via email to

[Prev in Thread] Current Thread [Next in Thread]