Re: [Chicken-users] benchmarks

From: Bert Douglas
Subject: Re: [Chicken-users] benchmarks
Date: Mon, 24 Nov 2003 09:46:23 -0600

Benchmarks are a lot of trouble. I spend most of my time these days optimizing video processing code using SSE instructions. The cache is so important. If you ever get a Pentium 4, you should get the VTune product from Intel. It is worth the money. There is a free trial period, which is often long enough to solve your problem.

Anyway, back to reality and your Pentium II and the mysterious speed of mzscheme. To get another point of comparison, you should make an empty loop in C. Also keep in mind that empty loops are bad for benchmarks, because they leave so much opportunity for optimization. Some compilers are smart enough to remove an empty loop completely. It is better to call a known external function inside the loop. This external function should be supplied as a separately compiled object module, so the compiler can't optimize the call away, and it should have a fixed execution time.

Then you can get some idea of your loop overhead by running the loop for various numbers of iterations, plotting time against iteration count, and taking the slope of the line. The slope is the cost per iteration; the intercept is the fixed startup cost.
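A minimal single-file sketch of the loop-and-slope idea in C. All names here (do_nothing, opaque_call, time_loop, fit_slope, loop_cost_per_iteration) are made up for illustration. Two things differ from the advice above and are assumptions of the sketch: the call goes through a volatile function pointer as a stand-in for a separately compiled object module, and the time source is the standard clock() function (CPU time), chosen only because it is portable.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the "known external function". In a real
   benchmark it would live in a separately compiled object file. */
static void do_nothing(void) {}

/* Calling through a volatile pointer forces a real call on every
   iteration, so the loop cannot be emptied or removed. */
static void (*volatile opaque_call)(void) = do_nothing;

/* CPU time in seconds consumed by `iterations` calls. */
double time_loop(long iterations) {
    clock_t start = clock();
    for (long i = 0; i < iterations; i++)
        opaque_call();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Least-squares slope of y against x, i.e. the fitted cost per iteration. */
double fit_slope(const double *x, const double *y, size_t n) {
    assert(n >= 2);
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (size_t i = 0; i < n; i++) {
        sx += x[i];
        sy += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }
    double denom = (double)n * sxx - sx * sx;
    assert(denom != 0.0); /* needs at least two distinct x values */
    return ((double)n * sxy - sx * sy) / denom;
}

/* Time the loop at several sizes and return seconds per iteration. */
double loop_cost_per_iteration(void) {
    enum { NPOINTS = 4 };
    const long counts[NPOINTS] = {1000000, 2000000, 4000000, 8000000};
    double x[NPOINTS], y[NPOINTS];
    for (int i = 0; i < NPOINTS; i++) {
        x[i] = (double)counts[i];
        y[i] = time_loop(counts[i]);
    }
    return fit_slope(x, y, NPOINTS);
}
```

Calling loop_cost_per_iteration() from a main function gives an estimate in seconds per iteration. Because only the slope is used, the fixed cost of starting and stopping the timer drops out of the result.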

Everything needs to be repeated. A single long interrupt can destroy a measurement. I generally run about 100 outer iterations of my benchmarks and keep the median. It is important not to use the mean, because if you plot the distribution, you will see that it has a very long tail, and a few slow outliers drag the mean upward.
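To illustrate why the median is the safer summary, here is a small C sketch. The function names (median, mean) are chosen for this example only, and the sample values in the note below are invented to show the effect of one interrupted run; they are not measurements.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Ordering function for qsort over doubles. */
static int compare_doubles(const void *a, const void *b) {
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of n samples. Sorts a private copy, so the caller's array
   is left untouched. */
double median(const double *samples, size_t n) {
    assert(n > 0);
    double *sorted = malloc(n * sizeof *sorted);
    assert(sorted != NULL);
    memcpy(sorted, samples, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, compare_doubles);
    double m = (n % 2 == 1)
                   ? sorted[n / 2]
                   : 0.5 * (sorted[n / 2 - 1] + sorted[n / 2]);
    free(sorted);
    return m;
}

/* Arithmetic mean, shown only for comparison with the median. */
double mean(const double *samples, size_t n) {
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    return sum / (double)n;
}
```

For the invented samples 1.0, 1.1, 0.9, 1.0 and 50.0 (the last standing for a run hit by a long interrupt), median returns 1.0 while mean returns 10.8, which describes none of the runs.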

Use the Pentium "read time stamp counter" instruction (RDTSC) if you have it. This is the most accurate time source available on most systems.
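One way to reach the time stamp counter from C, sketched under some assumptions: the compiler is GCC or Clang, which expose the instruction as the __rdtsc() intrinsic in <x86intrin.h>, and on non-x86 targets the sketch falls back to clock() purely so that it still compiles. The names read_ticks and ticks_for are hypothetical. The result is in counter ticks, not seconds, so it is best used to compare runs on the same machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h> /* GCC and Clang; MSVC declares __rdtsc in <intrin.h> */
#endif

/* Raw tick count: the time stamp counter on x86, clock() elsewhere. */
uint64_t read_ticks(void) {
#if defined(__x86_64__) || defined(__i386__)
    return (uint64_t)__rdtsc();
#else
    return (uint64_t)clock(); /* fallback so the sketch still builds */
#endif
}

/* Ticks spent in one call of fn, including the cost of reading the counter. */
uint64_t ticks_for(void (*fn)(void)) {
    assert(fn != NULL);
    uint64_t start = read_ticks();
    fn();
    return read_ticks() - start;
}
```

A single ticks_for reading is as exposed to interrupts as any other timer, so the repeat-and-take-the-median advice above still applies.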

Regards,

Bert Douglas

----- Original Message -----

From: Felix Winkelmann
To: chicken-users
Sent: Monday, November 24, 2003 8:04 AM
Subject: Re: [Chicken-users] benchmarks

On Mon, 24 Nov 2003 14:18:01 +0100, address@hidden
<address@hidden> wrote:

> There are lies, damned lies and benchmarks, so ...
>
> I was trying to determine the function call overhead, so I wanted to
> measure how long it takes to call a do-nothing function a million times.
> The answer is ~6 seconds in the interpreter and 2 seconds in the compiler
> (on my old PII machine). So, compilation gives a 3x speedup.
> Now, what surprises me is that interpreted mzscheme is as fast as
> compiled chicken! How can it be? Here is my code (yes, I know the while
> could be coded better, but it does not matter for the question of why
> mzscheme is so fast (or chicken so slow)):
>
>
> ; 6.539s interpreter
> ; 2.059s compiler
> ; 2.2s mzscheme
>
> (define-macro (while c . do)
>   `(let loop ()
>     (if (not ,c) 'done (begin ,@do (loop)))))
>
> (define (f) #f) ;; do nothing function
>
> (define N 1000000)
> (time
> (define i 0)
> (while (< i N)
>    (f)
>    (set! i (add1 i))))
>
>
> It could be that I am doing something stupid; it is my first benchmark!
> N.B. the -hygienic option is on.

Have you used any optimization options? Here are a few timing results
on my machine (WinXP, 1400 Athlon):

c:\home>type bench.scm
(define-macro (while c . do)
   `(let loop ()
     (if (not ,c) 'done (begin ,@do (loop)))))

(define (f) #f) ;; do nothing function

(define N 1000000)
(time
(define i 0)
(while (< i N)
    (f)
    (set! i (add1 i))))

*** Interpreted:

c:\home>csi -script bench.scm
    2.143 seconds elapsed
     0.11 seconds in (major) GC
    43219 mutations
      116 minor GCs
       28 major GCs

*** No optimizations:

c:\home>csc bench.scm
bench.c
    Creating library bench.lib and object bench.exp
bench
c:\home>bench
     0.61 seconds elapsed
        0 seconds in (major) GC
        0 mutations
     1387 minor GCs
        0 major GCs

*** Assume standard procedures are not redefined, and with
   traceback info switched off:

c:\home>csc bench.scm -O2 -d0
bench.c
    Creating library bench.lib and object bench.exp

c:\home>bench
      0.23 seconds elapsed
        0 seconds in (major) GC
        0 mutations
      534 minor GCs
        0 major GCs

*** As -O2, with safety checks switched off, and in "block" mode
   (i.e. globals are inaccessible from outside):

c:\home>csc bench.scm -O3 -d0 -b
bench.c
    Creating library bench.lib and object bench.exp

c:\home>bench
     0.15 seconds elapsed
        0 seconds in (major) GC
        0 mutations
      320 minor GCs
        0 major GCs

*** As -O3 -d0, fixnum arithmetic, interrupts switched off:

c:\home>csc bench.scm -Ob
bench.c
    Creating library bench.lib and object bench.exp

c:\home>bench
     0.02 seconds elapsed
        0 seconds in (major) GC
        0 mutations
        0 minor GCs
        0 major GCs

I don't think this is that bad...

>
> Generally speaking, how does chicken performance compare with other Scheme
> implementations? (I expect there is a document somewhere ...)
>

There were a few comparisons. I even had some graphical tables, but keeping
the information up-to-date was getting tedious. The code generated by Chicken
is in most cases a good deal faster than all byte-code interpreter based
implementations, depending on optimizations. Compared with other Scheme->C
compilers, Chicken and PLT's mzc are roughly in the same ballpark. Chicken
appears to be faster than Gambit on floating-point heavy code.
Once you start using first-class continuations heavily or are doing
allocation-intensive stuff, Chicken can also keep up with Bigloo and other
high-performance implementations.
Note that libraries (like string-processing) are not so dependent on the
compiler. For example, Chicken's string-processing code is not particularly
fast, since it's written in Scheme and hasn't been tuned that much.

cheers,
felix

_______________________________________________
Chicken-users mailing list
address@hidden
http://mail.nongnu.org/mailman/listinfo/chicken-users
