Benchmarks are a lot of trouble. I spend most of my time these days
optimizing video processing code using SSE instructions. The cache is
sooo important. If you ever get a Pentium 4, you should get the VTune
product from Intel. It is worth the money. There is a free trial period,
which is often long enough to solve your problem.

Anyway, back to reality and your Pentium II and the mysterious speed of
mzscheme. To get another point of comparison, you should make an empty
loop in C. Keep in mind, though, that empty loops are bad for benchmarks,
because they leave so much opportunity for optimization. Some compilers
are smart enough to remove an empty loop completely. It is better to call
a known external function inside the loop. This function should be
supplied as a separately compiled object module, so the compiler can't
optimize it away, and it should have a fixed execution time.

Then you can get some idea of your loop overhead by running the loop for
various numbers of iterations, plotting the points, and taking the slope
of the line. The slope is the cost per iteration; the intercept is the
fixed startup cost.
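
The slope idea can be sketched roughly as follows. This is only an illustration in Python, not the C setup described above: work() is a hypothetical stand-in for the external fixed-cost function, and time_loop, fit_line and the iteration counts are names and values chosen for this example.

```python
import time


def work():
    """Hypothetical stand-in for the external fixed-cost function."""
    return None


def time_loop(n):
    """Return elapsed seconds for n calls of work() inside a plain loop."""
    start = time.perf_counter()
    for _ in range(n):
        work()
    return time.perf_counter() - start


def fit_line(xs, ys):
    """Ordinary least-squares line fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Time the same loop at several iteration counts (arbitrary choices).
counts = [100_000, 200_000, 400_000, 800_000]
times = [time_loop(n) for n in counts]

# Slope = seconds per iteration, intercept = fixed cost per measurement.
slope, intercept = fit_line(counts, times)
print(f"per-iteration cost: {slope * 1e9:.1f} ns")
print(f"fixed overhead:     {intercept * 1e6:.1f} us")
```

Each point here is a single measurement, so the fit is still sensitive to one disturbed run; repeating each count and keeping a robust value per point would tighten it.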

Everything needs to be repeated. A single long interrupt can destroy a
measurement. I generally run about 100 outer iterations of my benchmarks
and keep the median. It is important not to use the mean, because if you
plot the distribution, you will see that it has a very long tail.
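
A minimal sketch of the repeat-and-take-the-median approach, again in Python purely for illustration. measure_once and summarize are hypothetical helper names, the inner loop is an arbitrary workload, and the synthetic sample at the end is made-up data that only shows how one slow run moves the mean but not the median.

```python
import statistics
import time


def measure_once(n=100_000):
    """Time one run of an arbitrary inner loop of n iterations."""
    start = time.perf_counter()
    for _ in range(n):
        pass
    return time.perf_counter() - start


def summarize(samples):
    """Return (median, mean) of a list of timings."""
    return statistics.median(samples), statistics.mean(samples)


# About 100 outer iterations, as suggested above.
samples = [measure_once() for _ in range(100)]
median, mean = summarize(samples)
print(f"median: {median * 1e3:.3f} ms   mean: {mean * 1e3:.3f} ms")

# Made-up data: 99 clean runs plus one run hit by a long interrupt.
synthetic = [1.0] * 99 + [50.0]
syn_median, syn_mean = summarize(synthetic)
print(f"synthetic median: {syn_median}   synthetic mean: {syn_mean:.2f}")
```

In the synthetic case the single outlier pulls the mean well above the typical run, while the median stays at the typical value.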

Use the Pentium "read time stamp counter" (RDTSC) instruction if you have
it. This is the most accurate time source available on most systems.

Regards,
Bert Douglas

----- Original Message -----
Sent: Monday, November 24, 2003 8:04 AM
Subject: Re: [Chicken-users] benchmarks

On Mon, 24 Nov 2003 14:18:01 +0100, address@hidden <address@hidden> wrote:

> There are lies, damned lies and benchmarks, so ...
>
> I was trying to determine the function call overhead, so I wanted to
> measure how much it takes to call a million times a do-nothing function.
> The answer is ~6 seconds in the interpreter and 2 seconds in the
> compiler (on my old PII machine). So, compilation gives a 3x speedup.
> Now, what surprises me is that interpreted mzscheme is as fast as
> compiled chicken! How can it be? Here is my code (yes, I know the while
> could be coded better, but it does not matter in the issue of why
> mzscheme is so fast (or chicken so slow)):
>
> ; 6.539s interpreter
> ; 2.059s compiler
> ; 2.2s mzscheme
>
> (define-macro (while c . do)
>   `(let loop ()
>      (if (not ,c) 'done (begin ,@do (loop)))))
>
> (define (f) #f) ;; do nothing function
>
> (define N 1000000)
> (time
>  (define i 0)
>  (while (< i N)
>    (f)
>    (set! i (add1 i))))
>
> It could be that I am doing something stupid, it is my first benchmark!
> N.B. the -higienic option is on.

Have you used any optimization options? Here are a few timing results on
my machine (WinXP, 1400 Athlon):

c:\home>type bench.scm
(define-macro (while c . do)
  `(let loop ()
     (if (not ,c) 'done (begin ,@do (loop)))))

(define (f) #f) ;; do nothing function

(define N 1000000)
(time
 (define i 0)
 (while (< i N)
   (f)
   (set! i (add1 i))))

*** Interpreted:

c:\home>csi -script bench.scm
   2.143 seconds elapsed
    0.11 seconds in (major) GC
   43219 mutations
     116 minor GCs
      28 major GCs

*** No optimizations:

c:\home>csc bench.scm
bench.c
   Creating library bench.lib and object bench.exp

c:\home>bench
    0.61 seconds elapsed
       0 seconds in (major) GC
       0 mutations
    1387 minor GCs
       0 major GCs

*** Assume standard procedures are not redefined, and with traceback info
switched off:

c:\home>csc bench.scm -O2 -d0
bench.c
   Creating library bench.lib and object bench.exp

c:\home>bench
    0.23 seconds elapsed
       0 seconds in (major) GC
       0 mutations
     534 minor GCs
       0 major GCs

*** As -O2, with safety checks switched off, and in "block" mode (i.e.
globals are inaccessible from outside):

c:\home>csc bench.scm -O3 -d0 -b
bench.c
   Creating library bench.lib and object bench.exp

c:\home>bench
    0.15 seconds elapsed
       0 seconds in (major) GC
       0 mutations
     320 minor GCs
       0 major GCs

*** As -O3 -d0, fixnum arithmetic, interrupts switched off:

c:\home>csc bench.scm -Ob
bench.c
   Creating library bench.lib and object bench.exp

c:\home>bench
    0.02 seconds elapsed
       0 seconds in (major) GC
       0 mutations
       0 minor GCs
       0 major GCs

I don't think this is that bad...

>
> Generally speaking, how does chicken performance compare with other
> Scheme implementations? (I expect there is a document somewhere ...)
>

There were a few comparisons. I even had some graphical tables, but
keeping the information up-to-date was getting tedious. The code
generated by Chicken is in most cases a good deal faster than all
implementations based on byte-code interpreters, depending on
optimizations. Compared with other Scheme->C compilers, Chicken and PLT's
mzc are roughly in the same ballpark. Chicken appears to be faster than
Gambit on floating-point heavy code. Once you start using first-class
continuations heavily or are doing allocation-intensive stuff, Chicken
can also keep up with Bigloo and other high-performance implementations.
Note that libraries (like string-processing) are not so dependent on the
compiler. For example, Chicken's string-processing code is not
particularly fast, since it's written in Scheme and hasn't been tuned
that much.

cheers, felix

_______________________________________________
Chicken-users mailing list
address@hidden
http://mail.nongnu.org/mailman/listinfo/chicken-users