[Axiom-developer] Re: lisp speedups
From: Waldek Hebisch
Date: Tue, 20 Feb 2007 04:51:21 +0100 (CET)
> Another technique that significantly improves the speed of lisp
> code is the use of declarations. In general, a function call in
> lisp has to have a case statement of the possible types of each
> argument and needs to handle the possible types. However, if you
> tell the compiler what the argument types are and the return types
> are you can get significantly faster code. The best way to illustrate
> this is to use the DISASSEMBLE function which will show you the code
> that gets laid down by the lisp compiler. CMUCL and SBCL are very
> good at optimizing code. GCL also does an excellent job. That's what
> the .fn files are for in Axiom.
>
The program I posted uses declarations. AFAICS sbcl has enough
information about types: from the declarations I gave, it can infer
the rest (in my experience gcl needs many more declarations).
When I wrote that the code could be faster, I had also examined the
output of disassemble. The following snippet:
  normal-start
    (incf pos)
    (incf line-number)
    (if (>= pos end-buff)
        (return-from scan-for-chunks))
    (setf code (aref buff pos))
    (if (eql code start-tag-code-1)
        (go chunk-start-tag-1))
    (if (eql code newline-code)
        (go normal-start))
    (go normal)
corresponds to the following assembly code:
; 2AF4: L1: 488B4DD0 MOV RCX, [RBP-48]
; 2AF8: 4883C108 ADD RCX, 8
; 2AFC: 48894DD0 MOV [RBP-48], RCX
; 2B00: 488B55D0 MOV RDX, [RBP-48]
; 2B04: 488B45C8 MOV RAX, [RBP-56]
; 2B08: 488B48F9 MOV RCX, [RAX-7]
; 2B0C: 488B45C8 MOV RAX, [RBP-56]
; 2B10: 4839D1 CMP RCX, RDX
; 2B13: 0F86ED0B0000 JBE L48
; 2B19: 48C1FA03 SAR RDX, 3
; 2B1D: 488B45C8 MOV RAX, [RBP-56]
; 2B21: 480FB64C1001 MOVZX RCX, BYTE PTR [RAX+RDX+1]
; 2B27: 48C1E103 SHL RCX, 3
; 2B2B: 4883F950 CMP RCX, 80
; 2B2F: 0F840F090000 JEQ L34
; 2B35: EBBD JMP L1
Why is this code far from optimal? First, sbcl failed to allocate
variables to registers. FYI, I use an AMD64 machine, on which sbcl can
in principle use 14 general-purpose registers (there are 16 registers,
but two of them, the stack pointer RSP and the base pointer RBP, are
used by sbcl). The 'scan-for-chunks' function has 11 variables, so it
does not look very hard to allocate _all_ variables to registers. But
apparently sbcl not only failed to keep variables in registers, it also
reloads values that are still available in a register: for example, the
value stored to [RBP-48] at 2AFC is immediately reloaded into RDX although
it is still in RCX, and [RBP-56] is loaded into RAX three times. Second,
there are useless shifts (SAR RDX, 3 and SHL RCX, 3): the variable code
is always compared with (constant) integers, and pos is mostly used in
integer operations (it is also stored in lists).
As a compiler writer, I can understand why sbcl is unable to produce
better code, but I would not call it excellent.
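For anyone who wants to repeat this kind of inspection, here is a minimal sketch, assuming a hypothetical, much simplified scanner (count-newlines is not part of the posted program): it declares the buffer as a simple array of bytes, asks for (speed 3), and loops over the buffer much like scan-for-chunks does. Passing the function to the standard DISASSEMBLE function prints the generated machine code. Its exact output depends on the implementation and version, so whether the loop index and counter stay in registers, or get spilled to stack slots as above, has to be checked in that output.

```lisp
;; Hypothetical example, not code from the posted program.
(defun count-newlines (buff)
  "Return the number of newline bytes (code 10) in BUFF."
  ;; Tell the compiler the exact array type so AREF can be open-coded.
  (declare (type (simple-array (unsigned-byte 8) (*)) buff)
           (optimize (speed 3)))
  (let ((count 0))
    (declare (type fixnum count))
    ;; POS runs from 0 below (LENGTH BUFF); its type is inferred.
    (dotimes (pos (length buff) count)
      (when (eql (aref buff pos) 10)   ; 10 = ASCII newline
        (incf count)))))

;; Print the compiled code; the listing is implementation-specific.
(disassemble 'count-newlines)
```

A byte vector with three newline codes, e.g. one made by (make-array 6 :element-type '(unsigned-byte 8) :initial-contents '(97 10 98 10 10 99)), gives a count of 3.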
--
Waldek Hebisch
address@hidden