[-*- mode: indented-text -*-] Notes and ideas for implementing lexical scoping in Elisp ========================================================= Changes required ---------------- + define 16 new byte compiler opcodes, Blocalref + 0-7 and Blocalset + 0-7, for manipulating local variables (stored in a separate alloca'd vector in Fbyte_code, or perhaps at the beginning of the stack vector - it must be GCPRO'd). - It looks like opcodes 000-007 and 060-067 (48-55) can be used for this (they are presently unused in 19.x and 20.x - worth checking what XEmacs and Emacs 18.x use though). -> Emacs 18.x used fewer opcodes than 19.x (all of which are either still in use, or reserved if obsolete). -> XEmacs 20.4 uses 0266-0267 (182-183) which are free in Emacs 20.3, and has reused some of the obsolete opcodes, but otherwise is the same. => full list of unused opcodes (in octal): Emacs 20.3 XEmacs 20.4 Proposed use ---------- ----------- ------------ 000-007 000-007 Blocalref0-7 060-067 060-067 Blocalset0-7 0200 0200 Bextend 0251 0251 Bfunction 0262 0262 Bmakeindirect 0263 0263 0264 0264 0265 0265 0266-0267 (used) Blocalref and Blocalset are the lexical counterparts to Bvarref and Bvarset. Blocalalloc and Blocalfree are used to allocate and free binding slots for lexical variables that can escape into heap closures. Bfunction is used to construct closures, which can be heap or stack allocated. Bextend is reserved to allow future opcode expansion, by serving as a prefix for extended (multibyte) opcodes. - How to reference lexical variables in nested bindings though? Is it acceptable to "flatten" all bindings in a function? If so, what happens if multiple closures are generated that refer to an inner binding that is destroyed and recreated in between (ie. the inner binding is different for each closure, but the outer bindings are shared)? -> If no closures can be created (or at least none that can escape the scope of the innermost binding they reference), it is okay to flatten all bindings for a function (as C does). Nested functions still require some sort of "non-local" binding access (eg. using a display vector or stack). => The only bindings which cannot be migrated to the function level are those captured by a closure that escapes that binding's lexical scope; ie. bindings which need to be evacuated, and should therefore be in separate block(s) to be safe for space complexity (ie. to minimize garbage retention). Whether or not a given binding is allocated in a separate block is independent of the mechanism for accessing non-local bindings; ie. a closure should be able to reference both the main function-level block of an enclosing function, as well as any additional blocks it may allocate. What is necessary is for the compiler to fix the "depth" of each binding, so there is no ambiguity about how to reference each binding. (Obviously the depth of a given binding can be different at different points in the code, or for different nested functions.) Possible solutions: - include a display index in (at least some) localref/set instructions; each level of binding (within a function, as well as in nested functions) in principle allocates a new lexical environment block, and pushes a pointer to it on the display stack; on exit, the display stack is popped. Indexing is from the top of the stack. Generating a closure can evacuate some or all of the binding blocks into the heap. -> would only need to explicitly modify display stack when entering and exiting blocks whose bindings can escape. Otherwise, nested blocks can be collapsed and referenced from a single display entry. In fact, inner bindings that are not captured by closures can even be migrated out through enclosing bindings that are captured into the function-level environment block. -> obviously we want as many of the localref/set instructions as possible to not include an explicit display index, since we expect to be able to place most bindings in the function-level block (which should then be kept as index 0 and not be "buried" by inner blocks dynamically created for captured bindings). It is probably sufficient to use the first 7 opcodes of each type to index the function-level block (giving access to the first 256 bindings), and extend the 8th (which already takes two extra bytes to form a 16-bit index) to include a display index as well, making it up to three extra bytes. If the local function-level block is always index 0, then it is not quite clear where to locate inherited binding blocks (ie. when we have nested functions). Presumably it is easy enough to assign each enclosing block (both function-level and closure-specific blocks) successive indices followed by inner closure-specific blocks. -> The problem is that we want to represent the environment in a closure using a single env_ptr; therefore we need to arrange that the right blocks are chained together in the order expected, while (ideally) still keeping the main function block at index 0. The solution is probably to insert dynamically created envblocks in the current env_ptr chain after the head block. In the case that a closure already has several inherited blocks in it's env chain after the main function block, there is a choice between pushing inherited blocks further down the chain (changing their index) or adding the new local blocks after the inherited ones (whose number is at least fixed). The latter choice would allow for simpler opcodes to add and remove blocks (since they would always insert or remove immediately after the head block), while hopefully not overly complicating the compiler's book-keeping which must track the current depth of each block. Proposed solution: - All bytecode objects have an environment slot, which must be an untagged pointer (struct Lisp_Envblock *) because most envblocks will be stack allocated, and hence the slot must also be hidden from direct lisp access (eg. aref), and handled specially by GC. - When a bytecode object is invoked, env_ptr is set to the value of the environment slot before pushing the new envblock for the parameters on the front. The parameter envblock includes space for the main function-level bindings, either before or after the parameter values, whichever works out better. [After would seem appropriate.] -> Fbyte_code should keep its existing interface if possible, for backwards compatibility (some existing bytecode contains direct invocations of `byte-code', in lieu of constructing a proper bytecode object). So an internal interpret_byte_code function is needed, which takes the pre-constructed parameter envblock. => Since extant bytecode doesn't have lexical bindings, and new bytecode will be passed parameters as well as the closure environment using env_ptr, there is no need to change Fbyte_code's arguments. - If the invoked bytecode needs additional envblocks for bindings that may need to be evacuated, then it preallocates the blocks on the stack but doesn't link them into the env chain yet. Instead, it keeps private references to them. - When a binding not allocated to the main block comes into scope, the relevant extra block is pushed onto the env chain. When the binding goes out of scope, the env chain is popped. -> If using the indirect value method (see next point), then instead of pushing new envblocks on the chain, the relevant slot pointer is (re)set to a private local slot provided for the purpose (the local slot is "allocated" by incrementing an index). The index is decremented when the binding goes out of scope. Either the number of required local slots is specified in the bytecode object (and alloca'd) or else included in the required environment size (with slots then being allocated from the end of the envblock; parameters and bindings taking a fixed amount from the beginning). If the latter approach is taken, slots should be zeroed when freed to avoid retaining garbage. - To create a closure, the Bfunction opcode is invoked, which interprets a variable length "script" following the opcode. The script specifies how many env blocks need to be evacuated. (This leaves an interesting problem of how to stop tracing the closure's env chain, when it reaches a block that is not evacuated. If we have to evacuate the full chain, then we've gained nothing from using extra blocks.) -> One alternative is to use a local display vector which references the additional local blocks. When a closure is created, the relevant blocks are linked together and evacuated if necessary. I'm not entirely sure, but it may be possible to allocate bindings to blocks in such a way that it is safe to chain blocks to previously evacuated blocks referenced by other closures. In this scheme, the bytecode must then setup the display vector on invocation to reference the appropriate blocks from the env chain. -> A second, more radical, alternative is to distinguish those bindings which can be evacuated, and access them indirectly through a pointer stored in the relevant envblock. Thus each function has just one local envblock (envblocks are chained in the normal way). Nothing needs to happen when entering or exiting the scope of bindings that can escape; instead creating a closure evacuates the relevant bindings (if not already evacuated), updating the owning envblock pointers. -> Since indirect values can either be stack or heap allocated, we can't really use a mechanism like setting markbit on those slots that are value pointers (and in any case that would slow down access for all variables). Instead, it seems reasonable to set aside one bit from Blocalref/set7 (eg. one of the 8 nesting depth bits) to indicate that an indirection is required. At the same time, it would probably be worth while to use a couple of bits from Blocalref/set6 to specify the envblock nesting, with the remaining bits specifying the envblock slot index. This provides reasonably efficient access to bindings in the first three nested blocks, and still leaves enough bits for the slot index that Blocalref/set7 ops will almost never be needed to access direct bindings. => In fact, it would seem worth it to set aside a bit of the Blocalref/set6 instruction to control indirection. If the closure can escape, then a new envblock is (heap) allocated for it, containing pointers to all the (presumably evacuated) bindings it needs, and the bare bytecode object being closed over is cloned. If the closure cannot escape, then the the bare bytecode object embedded in the enclosing function is (re)used directly as the closure value, with the current env_ptr stored in it's environment slot. (This is okay, because only one distinct closure of the bytecode object can exist at a time; thus the bytecode object embedded in the parent code can be serially reused.) For garbage collection, it is necessary to distinguish those envblock entries which are actually pointers to the real data; either they can follow any immediate values (with a separate size field to indicate how many slots are immediate), or perhaps the markbit can be set on those slots which are pointers. -> Since parameter bindings might need to be indirected, which the caller will not know when it constructs the main envblock, the compiler will need to allocate "proxy" slots for such bindings (emitting code to initialize them in the function prolog). Thus it is not difficult for the compiler to locate indirect slots after direct slots, if that is useful. Evacuated values could be stored in special blocks (similar to cons blocks), with GC performed by setting markbits on the live values (even immediate ones) as they are scanned. Free cells are linked into a block free list. + will need at least one new opcode (Bfunction) for creating closures in byte code; a reasonable approach would be for the compiler to emit a small "script" following Bfunction which indicates which environment slots are needed by the closure. It is not quite clear how to optimize or choose the environment representation to minimize the number of bindings that will be unnecessarily captured. It might be reasonable for the compiler to place bindings that can be captured into separate environment blocks. -> Yet more opcodes will be needed to dynamically push and pop envblocks on the local environment chain, to hold those bindings which can be captured by closures. => The extra opcodes are still needed when using the indirected slot representation, since we need to be able to setup an arbitrary number of indirection pointers, and if necessary reset them to local slots when reestablishing another instance of a captured binding that can escape. + Handle the possibility that (interactive ...) form might follow declarations at the beginning of a lambda body. + Would like to define an additional 8 "fastcall" opcodes, to speed up function invocation. Could perhaps use 184-191. Will want to reserve at least one opcode as an "extended opcode" prefix, to provide room for future expansion. => Note that rewriting bytecode ops would not be allowed with purified strings, ie. when functions are preloaded, because purified strings must be treated as read-only. + extend definition of byte-compiled function objects to include size of local variable vector (ie. the primary function-level binding block). Might as well include an environment entry in all bytecode objects, rather than having to use separate closure objects. - extend Lisp reader to accept extra value(s) in bytecode vector (implicitly inserting a value of 0/nil for old bytecode definitions, which obviously will not have any Blocalset/ref opcodes). + to support lexical scoping of function arguments, remove specbind'ing from funcall_lambda, at least when calling new-style bytecode functions. Instead, Fbyte_code should receive its arguments as an array (as Ffuncall does) - it should copy them into the local variable array, relying on the byte compiler to emit bytecode to bind any parameters which are declared special. - Or perhaps new-style bytecode should use nil in the argument list for parameters which are lexically scoped - that way funcall_lambda can still bind special parameters (which will be all parameters for old-style bytecode, since none of the parameter names will be nil). Fbyte_code still receives the full arg list as a vector. - To support lexical scoping for interpreted code, funcall_lambda would need to check which parameters are declared special in the case of calling lambda expressions. Special parameters would be specbound as now, while normal parameters would be added to a newly created environment dictionary (added to the head of the environment list). -> correction: funcall_lambda would create a new environment consisting initially of just the parameter bindings. I believe it is correct to say that Ffuncall always hides the existing environment stack. When calling a closure, Ffuncall will obviously reinstate the saved environment before binding parameters. => Partly right; see next main point for more details. However, it is not true that funcall_lambda should always hide the current env; an exception is when a lambda expression occurs in the function position of a form being evalled. In this case, the body of the lambda expression should have access to lexically apparent bindings. + lexical scoping for interpreted code is achieved by creating an environment list, which works rather like dictstack in Postscript (except we start a new stack for each top-level lexical environment - ie. each top-level form). - Special forms like Flet push a new dictionary on the current dictstack populated with local bindings (while specbinding those variables which are declared special); variable lookups (eg. Feval or Fset) are done by searching the current dictstack for a binding before using the global value. -> To cater for the possibility that an inner scope declares a variable special which has already been given a lexically visible binding, it may be necessary to add an entry to the environment which indicates that variable is special. Either that, or every variable access must explicitly check whether the variable is special. => It is necessary to have an entry for bindings that are (locally) declared special (using "declare", as in CL) - the value for the binding would contain a distinguished value Qspecbound (analogous to Qunbound) which instructs the interpreter to reference the global value. -> Note that Flet must check for declare forms at the beginning of the body (first expanding macros as necessary!) before actually binding anything to determine whether any of the variables are declared special. => Other builtins that must also do this are: funcall_lambda (and anything else which does parameter bindings). -> I believe it will be necessary to specially tag defvar'd symbols somehow, since there doesn't currently seem to be any foolproof way to tell which symbols are special. (Byte compiler seems to do its check by remembering which variables have been defined in the current file, or require'd files.) => A bit in the symbol record can hopefully be used for this (and specbind/unbind extended to allow "binding" the specialness of symbols). => A symbol bit is needed, but note that CL prohibits a symbol "proclaimed" (ie. globally declared) special (eg. by defvar) from _ever_ being lexically bound, so there is no need to specbind a symbol's (global) specialness. Instead, the interpreter first checks the specialness of symbols before searching the environment for lexical bindings. (Local specialness, on the other hand, is overriden by inner lexical bindings; every new binding must be explicitly declared special if those semantics are required. This is handled by a Qspecbound entry in the environment.) -> Dictionaries created by Flet et al will normally be alloca'd, but with the possibility of being evicted/evacuated into the heap when necessary (ie. when Ffunction generates closures, or at least when a closure is about to "escape" its creating frame). - Multiple dictstacks can be achieved (I think) by specbinding a global envp which points to the current environment (ie. the top of the current dictstack). Feval, Ffuncall, Fapply etc will then specbind envp, eg. to nil or to the saved env if invoking a closure. -> Note that the current Feval will be renamed to internal_eval (or something) - the various special forms which currently call Feval will therefore usually call internal_eval instead, as will the new Feval (after first specbinding envp to nil). => On second thought, I think Feval can stay as is, with other special forms calling it directly. Lisp code that calls it explicitly will get an empty environment automatically, because Feval will be invoked via Ffuncall, which always hides the current environment. => Rather than coerce specbind into binding envp, it is more appropriate to store the current value in the backtrace struct for a new function (since that is when a new stack is started). Suitable access to hidden environments can be arranged for the benefit of the Lisp-level debugger (at the moment, there is no access for buried specbindings anyway - that would be good to have available to the debugger). + to implement lexical closures efficiently we want to avoid requiring Ffunction to always evacuate alloca'd environment blocks into the heap if possible, since we expect most uses of closures are immediate - ie. the closure is immediately passed to funcall or apply in the same block where it is created - effectively restricting the lexical bindings to dynamic extent. (Of course, many closures will not even reference their environment at all.) -> Note: in the absence of proper closure analysis, it will be necessary to evacuate the entire current dictstack, since it is not possible to tell which bindings might actually be referenced in closed over lamda expressions (which could be arbitrarily nested, just to make things interesting). Of course, Ffunction doesn't evacuate if its arg is a symbol. (CL specifies that Ffunction returns the function binding if passed a symbol, but elisp just returns the symbol, to delay autoloading I think.) Possible implementation: - Flet will always add new alloca'd environment block to env chain - Ffunction will use pointer to alloca'd block in closure -> Fset (and others, eg. Feval, Ffset) will have to detect when a closure reference is being stored in a variable/function binding (or array slot etc), and cause the environment block (chain) to then be evacuated if still pointing to stack memory. Also, Flet will need to check whether a closure is being returned as its value. -> Checks would also be required by functions that allocate, eg. Fcons, since there are lots of ways the closure could "escape" the enclosing Flet. => Because of the difficulty and run-time expense of detecting when closure's escape, it is probably best for Ffunction to always evacuate the environment - after all, this only happens in interpreted code (we expect the compiler to be more intelligent about identifying which variables need to be closed over, and whether the resulting closure can escape). -> What happens when multiple closures are created over the same env? In interpreted code, presumably the first call to Ffunction evacuates the env chain, and subsequent calls just use the reference to the now heap-based env. -> What happens with nested calls to Ffunction? Perhaps this just happens to work correctly without doing anything special: the outer call to Ffunction captures the entire env chain (when interpreted, at least); when later funcalled, that env will again be "active" (possible with additional bindings on the front) when an inner call to Ffunction happens. It is only the compiler that has to perform closure analysis to "shake out" the environment if possible. + When calling bytecode, we would ideally like to merge the argument bindings with the function's basic lexical bindings (ie. most bindings in the function, since we expect to flatten most of them to the outer binding scope). This maximizes the use of the quick access versions of Blocalref/set opcodes (see discussion about opcodes). To do this either requires copying the arg vector into space allocated in the main function binding block, or arranging for the arg vector itself to have the extra space needed for the function's basic local bindings. -> A hybrid seems reasonable for an initial implementation: the arg vector is copied into a fresh envblock, but the envblock includes space for the requested number of slots. General Observations -------------------- + when modifying bytecomp.el, remember that cl*.el contains code to help compile CL special forms. + Some .el files use plain lambda expressions as functional values, eg. for functions like mapcar. The current bytecode compiler appears to treat them as if they had been wrapped in (function ...); at least, it compiles the lambda expression into a bytecode object. This raises some interesting questions when closures are introduced: should plain lambda expressions in arg lists "evaluate" to closures? If so (which would help for backward compatibility with existing code), shouldn't the interpreter do the same? Currently, I think such forms will (now) be evaluated in a NULL lexical environment. => It turns out that `lambda' names a macro that expands into a call to `function', so all is well after all. - The compiler also treats '(lambda ...) like #'(lambda ...). That is problematic (and possibly going a bit too far) - unfortunately, there is existing code that relies on it, which will break under strict interpretation of lexical scoping rules. -> Should we declare that such code is broken? The compiler could easily warn when it encounters '(lambda ...), but the interpreter can't really do that, so maybe we do have to make this a special case as well. -> Packages known to affected (in 19.34): allout, ange-ftp, byte-opt, most of cal-*, calendar, complete, disass, gulp, help, hideif, icomplete, lisp-mnt, -> Packages known to use '(lambda ...), though apparently not affected: avoid, byte-run (commented out), bytecomp, completion, diary-lib, ediff*, emerge, facemenu, faces, files, forms, gnus-gl, gnus-kill, gnus-msg, gnus-score, gnus-srvr, gnus-uu, gnus, holidays, imenu, info, lmenu (?), lunar, make-mode, + Packages known to rely on dynamic binding, that will need suitable declarations: find-file, forms (possibly), gnus-gl (possibly), rmail (rmail-set-message-counters), + gomoku.el has two lambda expressions with a duplicated parameter name! So does mailalias, + What to do about buffer-local variables? In particular, variables that are not permanently buffer-local, but declared so dynamically using make-local-variable or make-variable-buffer-local. -> Presumably such declarations must be treated as proclamations, but it will probably be necessary for actual `proclaim' statements to be added so the compiler can know which variables have special bindings. -> Packages known to reference non-permanent buffer-local vars: -> If a "declaim" (?) function is provided, to allow un-proclaiming the specialness of symbols, it must refuse to do so for permanent (builtin) buffer-local variables, nil, t etc. => I believe most if not all packages that make buffer-local variables already defvar the variables; this is certainly the correct thing for them to do. So it is not strictly necessary for make-local-variable or make-variable-buffer-local to implicitly proclaim the symbol special, but to avoid needless code breakage, it is reasonable for the interpreter to do that. The compiler will issue a warning if the symbol wasn't already defvar'd. -> A related problem is the handling of platform-specific DEFVAR_* declarations (eg. the Vw32_* variables) - these need to be declared or proclaimed on all platforms in order that byte-code compiled on another platform (eg. by the FSF) uses dynamic binding. (Hopefully a compiler warning about unused local variables will be issued in most cases when this hasn't been done.) => As well as issuing a warning about unreferenced local variables that aren't known to be special, it should assume the variable is special and specbind it anyway. The warning would suggest it should be proclaimed. To handle platform specific DEFVARs, makedoc should generate a list of all known DEFVARs for the byte compiler. => A better solution might be to require that all files be (re)compiled locally on installation. It should be possible to compile everything from scratch anyway (which probably will necessitate adding proper require's to all sorts of packages). + Flet and funcall_lambda must not only check for special declarations when creating bindings, but must add Qspecbound entries for symbols that are not being bound, but are declared locally special. => Done. [03-Dec-98] No it isn't (maybe this code was lost). + Noticed some minor speed improvements that could be made to the interpreter: - various functions call Fcar then later Fcdr on the same value -> Fcar signals an error if the value is not a CONS cell, so Fcdr could be inlined to XCONS(val)->cdr to avoid function call. => Not quite right; Fcdr also accepts nil without complaint, which is relevant in a few places (eg. evalling args for subrs). In any case, any efficiency gain is likely to be miniscule. - functions like Fcar don't directly signal an error; they call wrong_type_argument which has some mocklisp compatibility stuff that can coerce numbers to strings and vv, or signals an error. [nothing wrong with that though - no efficiency lost in Fcar et al.] - not sure why Flet et al include "QUIT;" in the loop that binds variables - surely this is unnecessary (since Feval does a QUIT anyway, and if not evalling, the time spent is minimal, though theoretically unbounded)? + Possible (minor) speed improvements to Fbyte_code: - don't zero stack slots at start; instead, update the gcpro length field based on stackp before doing Ffuncall or anything else which can cause GC - unbind_to, Fcondition_case, internal_catch, Findent_to (calls Finsert_char which runs before/after change fns), Fread_char (may execute macros, timer code, etc), Fdelete_region, and maybe others I've missed. -> This also helps make bytecode "safe for space complexity", ie. stops old references in unused stack slots from retaining garbage. For the same reason, it may be appropriate for the compiler to generate code to nil out local bindings that are no longer needed, if the environment block might be long-lived. - only check for relocation of bytestr after executing opcodes that we know could have caused GC. - can we skip the SYMBOLP check on Bvarref? Surely the compiler ensures that vectorp indices refer to symbols? (If really paranoid, presumably we could implement a bytecode checker that is run when bytecode object is created.) -> We could skip it, but of course it won't matter much when most variables are lexically scoped. + A possible future optimization is for callers of Ffuncall to make the argument vector large enough to hold maxargs if subrp, or numargs+numlocals if compiledp. (In the latter case, the argument vector would need to in fact be a properly formatted Lisp_Envblock. It could well work out better if Lisp_Envblock contained a pointer to a plain C vector, so bytecode wouldn't need to leave space on the value stack for the envblock header.) The benefit from doing this would be to avoid making a copy of the arg vector. -> The problem is that the actual function called at runtime might not match the number of arguments etc that the compiler thought it would. + not strictly related to lexical scoping, but when trying to switch Lisp_Object representation to have bottom 2 bits (say) as tag bits (mark bits stored to the side), will need to detect places in C code where Lisp_Object's are treated as ints, and `make_number' and XINT/XFASTINT aren't used for handling numbers. => Undef NO_UNION_TYPE in lisp.h, so that Lisp_Object is a struct (actually union). This should go a long way to tracking down sloppiness in handling lisp values in C. => Make XFASTINT (and perhaps XINT too) expand to (0, value) to prevent them being used on the LHS. + consider supporting the dynamic-extent declaration from CL, which says that a variable is to be stack-allocated, even if referenced by closures (I guess - check the CLHS). Functions to test interpreter and bytecode interpreter/compiler =============================================================== Simple tests of interpreter semantics ------------------------------------- ;;; Eval in interaction buffer ;;; Define some simple closure generators (defun make-adder-1 (addend) (lambda (val) (+ val addend))) (defun make-adder-2 (addend) (let ((a addend)) (lambda (val) (+ val a)))) (setq adder-1 (make-adder-1 3)) (setq adder-2 (make-adder-2 4)) (setq adder-3 (let ((a 5)) (lambda (val) (+ val a)))) ;;; Test that the closures work (funcall adder-1 10) (funcall adder-2 10) (funcall adder-3 10) (setq adder-1 nil) (setq adder-2 nil) (setq adder-3 nil) ;;; Test that gc of interpreted closures works (garbage-collect) ;;; Create a matched pair of getter and setter methods (defun make-safe-place (initval) (let ((place initval)) (cons (lambda () place) (lambda (newval) (setq place newval))))) (setq place-1 (make-safe-place 1)) (setq place-2 (make-safe-place 2)) (funcall (car place-1)) (funcall (car place-2)) (funcall (cdr place-1) 15) (funcall (car place-1)) (funcall (car place-2)) (funcall (cdr place-2) 29) (funcall (car place-1)) (funcall (car place-2)) Manually compiled bytecode objects ---------------------------------- Example A --------- (defun simple-fn (a) (let ((b 2)) (+ a b))) ;;; The actual bytecode object from the traditional compiler: #[ ;; arglist (a) ;; bytecode "\302 \\)\207" ;; \302 \030 \011 \010 \134 \051 \207 ;; constants [b a 2] ;; stack size 2 ] ;;; Disassembly of the bytecode: byte code for simple-fn: args: (a) 0 constant 2 ;; \302 1 varbind b ;; \030 2 varref a ;; \011 3 varref b ;; \010 4 plus ;; \134 5 unbind 1 ;; \051 6 return ;; \207 ;;; Disassembly of the new bytecode with lexical scoping: byte code for simple-fn: args: (a) 0 constant 2 ;; \300 1 localset b ;; \061 2 localref a ;; \000 3 localref b ;; \001 4 plus ;; \134 5 return ;; \207 (defalias 'simple-fn-lexical #[ ;; arg info (min max . numlocals) (1 1 . 1) ;; bytecode "\300\061\000\001\134\207" ;; constants [2] ;; stack size 2 ]) Example B --------- (defun add-log-iso8601-time-zone (time) (let* ((utc-offset (or (car (current-time-zone time)) 0)) (sign (if (< utc-offset 0) ?- ?+)) (sec (abs utc-offset)) (ss (% sec 60)) (min (/ sec 60)) (mm (% min 60)) (hh (/ min 60))) (format (cond ((not (zerop ss)) "%c%02d:%02d:%02d") ((not (zerop mm)) "%c%02d:%02d") (t "%c%02d")) sign hh mm ss))) ;;; The actual bytecode object from the traditional compiler: #[ ;; arglist (time) ;; bytecode "address@hidden !\211\313\246 \313\245\211\313\246 \313\245\314 \307U\2046\000\315\202B\000\307U\204A\000\316\202B\000\317\012 %.\207" ;; constants [time utc-offset sign sec ss min current-time-zone 0 45 43 abs 60 format "%c%02d:%02d:%02d" "%c%02d:%02d" "%c%02d" mm hh] ;; required stack size 7 ;; docstring (not present) ;; interactive spec (not present) ] ;;; Disassembly of the bytecode: byte code for add-log-iso8601-time-zone: args: (time) 0 constant current-time-zone ;; \306 1 varref time ;; \010 2 call 1 ;; \041 3 car ;; \100 4 goto-if-not-nil-else-pop 1 ;; \206 \010 \000 7 constant 0 ;; \307 8:1 dup ;; \211 9 varbind utc-offset ;; \031 10 constant 0 ;; \307 11 lss ;; \127 12 goto-if-nil 2 ;; \203 \023 \000 15 constant 45 ;; \310 16 goto 3 ;; \202 \024 \000 19:2 constant 43 ;; \311 20:3 varbind sign ;; \032 21 constant abs ;; \312 22 varref utc-offset ;; \011 23 call 1 ;; \041 24 dup ;; \211 25 varbind sec ;; \033 26 constant 60 ;; \313 27 rem ;; \246 28 varbind ss ;; \034 29 varref sec ;; \013 30 constant 60 ;; \313 31 quo ;; \245 32 dup ;; \211 33 varbind min ;; \035 34 constant 60 ;; \313 35 rem ;; \246 36 varbind mm ;; \036 \020 38 varref min ;; \015 39 constant 60 ;; \313 40 quo ;; \245 41 varbind hh ;; \036 \012 43 constant format ;; \314 44 varref ss ;; \014 45 constant 0 ;; \307 46 eqlsign ;; \125 47 goto-if-not-nil 4 ;; \204 \066 \000 50 constant "%c%02d:%02d:%02d" ;; \315 51 goto 6 ;; \202 \102 \000 54:4 varref mm ;; \016 \020 56 constant 0 ;; \307 57 eqlsign ;; \125 58 goto-if-not-nil 5 ;; \204 \101 \000 61 constant "%c%02d:%02d" ;; \316 62 goto 6 ;; \202 \102 \000 65:5 constant "%c%02d" ;; \317 66:6 varref sign ;; \012 67 varref hh ;; \016 \021 69 varref mm ;; \016 \020 71 varref ss ;; \014 72 call 5 ;; \045 73 unbind 7 ;; \056 \007 75 return ;; \207 ;;; Disassembly of the new bytecode with lexical scoping: Locals: 0 time ;; arg 0 1 utc-offset 2 sign 3 sec 4 ss 5 min 6 mm 7 hh Constants: 0 current-time-zone 1 0 2 ?- ;; 45 3 ?+ ;; 43 4 abs 5 60 6 format 7 "%c%02d:%02d:%02d" 8 "%c%02d:%02d" 9 "%c%02d" byte code for add-log-iso8601-time-zone: args: (time) 0 constant current-time-zone ;; \300 1 localref time ;; \000 2 call 1 ;; \041 3 car ;; \100 4 goto-if-not-nil-else-pop 1 ;; \206 \010 \000 7 constant 0 ;; \301 8:1 dup ;; \211 9 localset utc-offset ;; \061 10 constant 0 ;; \301 11 lss ;; \127 12 goto-if-nil 2 ;; \203 \023 \000 15 constant 45 ;; \302 16 goto 3 ;; \202 \024 \000 19:2 constant 43 ;; \303 20:3 localset sign ;; \062 21 constant abs ;; \304 22 localref utc-offset ;; \001 23 call 1 ;; \041 24 dup ;; \211 25 localset sec ;; \063 26 constant 60 ;; \305 27 rem ;; \246 28 localset ss ;; \064 29 localref sec ;; \003 30 constant 60 ;; \305 31 quo ;; \245 32 dup ;; \211 33 localset min ;; \065 34 constant 60 ;; \305 35 rem ;; \246 36 localset mm ;; \066 \006 38 localref min ;; \005 39 constant 60 ;; \305 40 quo ;; \245 41 localset hh ;; \066 \007 43 constant format ;; \306 44 localref ss ;; \004 45 constant 0 ;; \301 46 eqlsign ;; \125 47 goto-if-not-nil 4 ;; \204 \066 \000 50 constant "%c%02d:%02d:%02d" ;; \307 51 goto 6 ;; \202 \102 \000 54:4 localref mm ;; \006 \006 56 constant 0 ;; \301 57 eqlsign ;; \125 58 goto-if-not-nil 5 ;; \204 \101 \000 61 constant "%c%02d:%02d" ;; \310 62 goto 6 ;; \202 \102 \000 65:5 constant "%c%02d" ;; \311 66:6 localref sign ;; \002 67 localref hh ;; \006 \007 69 localref mm ;; \006 \006 71 localref ss ;; \004 72 call 5 ;; \045 73 return ;; \207 (defalias 'add-log-iso8601-time-zone-lexical #[ ;; arglist (min max . numlocals) (1 1 . 7) ;; bytecode "\300\000\041\100\206\010\000\301\211\061\301\127\203\023\000\302\202\024\000\303\062\304\001\041\211\063\305\246\064\003\305\245\211\065\305\246\066\006\005\305\245\066\007\306\004\301\125\204\066\000\307\202\102\000\006\006\301\125\204\101\000\310\202\102\000\311\002\006\007\006\006\004\045\207" ;; constants [current-time-zone 0 45 43 abs 60 format "%c%02d:%02d:%02d" "%c%02d:%02d" "%c%02d"] ;; required stack size 7 ;; docstring (not present) ;; interactive spec (not present) ]) Example C --------- (defun map-add (list addend) (mapcar (lambda (val) (+ val addend)) list)) #[ (list addend) "\301\302\"\207" [ list mapcar #[ (val) "\010\011\134\134\207" [val addend] 2 ] ] 3 ] byte code for map-add: args: (list addend) 0 constant mapcar 1 constant args: (val) 0 varref val 1 varref addend 2 plus 3 return 2 varref list 3 call 2 4 return ;;; Disassembly of the new bytecode with lexical scoping: byte code for map-add: args: (list addend) locals: constants: [mapcar] 0 constant mapcar ;; \300 1 constant ;; \301 args: (val) locals: constants: 0 localref val ;; \000 1 envref addend ;; \006 \101 3 plus ;; \134 4 return ;; \207 2 make-stack-closure ;; \251 \000 4 localref list ;; \000 5 call 2 ;; \042 6 return ;; \207 ;; New bytecode (defalias 'map-app-lexical #[ (2 2 . 0) "\300\301\251\000\000\042\207" [ mapcar #[ (1 1 . 0) "\000\006\101\134\207" [] 2 ] ] 3 ]) Example D --------- (defun make-adder-1 (addend) (lambda (val) (+ val addend))) #[ (addend) "\300\207" [ #[(val) "\010\011\134\207" [val addend] 2 ] ] 1 ] byte code for make-adder-1: args: (addend) 0 constant ;; \300 args: (val) 0 varref val ;; \010 1 varref addend ;; \011 2 plus ;; \134 3 return ;; \207 1 return ;; \207 ;;; Disassembly of the new bytecode with lexical scoping: byte code for make-adder-1: args: (addend) 0 makeindirect addend ;; \262 \000 \000 3 constant ;; \300 args: (val) 0 localref val ;; \000 1 ienvref addend ;; \007 \003 \000 \000 5 plus ;; \134 6 return ;; \207 4 make-full-closure 1 ;; \251 \002 \001 \000 addend ;; \000 \000 \000 10 return ;; \207 (defalias 'make-adder-1-lexical #[ (1 1 . 0) "\262\000\000\300\251\002\001\000\000\000\000\207" [ #[ (1 1 . 0) "\000\007\003\000\000\134\207" [] 2] ] 1]) Example E --------- (defun make-safe-place (initval) (let ((place initval)) (cons (lambda () place) (lambda (newval) (setq place newval))))) ;;; Traditional bytecode for make-safe-place #[ (initval) "\300\301B\207" [ #[nil "\207" [place] 1] #[(newval) "\211\207" [newval place] 2] ] 2] ;;; Disassembly for make-safe-place byte code for make-safe-place: args: (initval) 0 constant args: nil 0 varref place 1 return 1 constant args: (newval) 0 varref newval 1 dup 2 varset place 3 return 2 cons 3 return ;;; Disassembly for make-safe-place-lexical byte code for make-safe-place-lexical: args: (initval) 0 localref initval ;; \000 1 localset place ;; \061 2 makeindirect place ;; \262 \001 \000 5 constant ;; \300 args: nil 0 ienvref place ;; \007 \003 \000 \000 4 return ;; \207 6 make-full-closure 1 ;; \251 \002 \001 \000 10 place ;; \000 \001 \000 13 constant ;; \301 args: (newval) 0 localref newval ;; \000 1 dup ;; \211 2 ienvset place ;; \067 \003 \000 \000 6 return ;; \207 14 make-full-closure 1 ;; \251 \002 \001 \000 18 place ;; \000 \001 \000 21 cons ;; \102 22 return ;; \207 (defalias 'make-safe-place-lexical #[ (1 1 . 1) "\000\061\262\001\000\300\251\002\001\000\000\001\000\301\251\002\001\000\000\001\000\102\207" [ #[(0 0 . 0) "\007\003\000\000\207" [] 1] #[(1 1 . 0) "\000\211\067\003\000\000\207" [] 2] ] 2]) Example F ---------