[Guile-commits] GNU Guile branch, master, updated. v2.1.0-605-g69dc826

guile-commits
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Guile-commits] GNU Guile branch, master, updated. v2.1.0-605-g69dc826

From:	Andy Wingo
Subject:	[Guile-commits] GNU Guile branch, master, updated. v2.1.0-605-g69dc826
Date:	Thu, 16 Jan 2014 14:54:37 +0000
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU Guile".

http://git.savannah.gnu.org/cgit/guile.git/commit/?id=69dc8268c278e0693b8240bf09a92e575dd1017b

The branch, master has been updated
       via  69dc8268c278e0693b8240bf09a92e575dd1017b (commit)
       via  507c58b28e94a49e247086dceca7d88109ca8de1 (commit)
      from  fb484fefebdd9ebcf43169b391f704069b3d4b09 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit 69dc8268c278e0693b8240bf09a92e575dd1017b
Author: Andy Wingo <address@hidden>
Date:   Thu Jan 16 12:05:47 2014 +0100

    Finish documenting the new compiler
    
    * doc/ref/compiler.texi (An Introduction to CPS): Reword.
      (Compiling CPS): New sub-sub-section.
      (Bytecode): New sub-section.

commit 507c58b28e94a49e247086dceca7d88109ca8de1
Author: Andy Wingo <address@hidden>
Date:   Sun Jan 12 23:02:34 2014 +0100

    Fix CPS doc typos
    
    * doc/ref/compiler.texi (CPS in Guile): Fix a couple typos.

-----------------------------------------------------------------------

Summary of changes:
 doc/ref/compiler.texi |  273 ++++++++++++++++++++++++++++++++++++++-----------
 1 files changed, 212 insertions(+), 61 deletions(-)

diff --git a/doc/ref/compiler.texi b/doc/ref/compiler.texi
index 4e24e8b..845d5a8 100644
--- a/doc/ref/compiler.texi
+++ b/doc/ref/compiler.texi
@@ -534,12 +534,13 @@ compiler.
 * An Introduction to CPS::
 * CPS in Guile::
 * Building CPS::
+* Compiling CPS::
 @end menu
 
 @node An Introduction to CPS
 @subsubsection An Introduction to CPS
 
-As an example, consider the following Scheme expression:
+Consider the following Scheme expression:
 
 @lisp
 (begin
@@ -548,9 +549,8 @@ As an example, consider the following Scheme expression:
   (newline))
 @end lisp
 
-Let us identify all of the sub-expressions in this expression.  We give
-them unique labels, like @var{k1}, and annotate the original source
-code:
+Let us identify all of the sub-expressions in this expression,
+annotating them with unique labels:
 
 @lisp
 (begin
@@ -565,17 +565,18 @@ code:
   k6
 @end lisp
 
-These labels also identify continuations.  For example, the continuation
-of @code{k7} is @code{k6}.  This is because after evaluating the value
-of @code{newline}, performed by the expression labelled @code{k7}, we
+Each of these labels identifies a point in a program.  One label may be
+the continuation of another label.  For example, the continuation of
address@hidden is @code{k6}.  This is because after evaluating the value of
address@hidden, performed by the expression labelled @code{k7}, we
 continue to apply it in @code{k6}.
 
-Which label has @code{k0} as its continuation?  It is either @code{k1}
-or @code{k2}.  Scheme does not have a fixed order of evaluation of
-arguments, although it does guarantee that they are evaluated in some
-order.  However, continuation-passing style makes evaluation order
-explicit.  In Guile, this choice is made by the higher-level language
-compilers.
+Which expression has @code{k0} as its continuation?  It is either the
+expression labelled @code{k1} or the expression labelled @code{k2}.
+Scheme does not have a fixed order of evaluation of arguments, though it
+does guarantee that they are evaluated in some order.  Unlike general
+Scheme, continuation-passing style makes evaluation order explicit.  In
+Guile, this choice is made by the higher-level language compilers.
 
 Let us assume a left-to-right evaluation order.  In that case the
 continuation of @code{k1} is @code{k2}, and the continuation of
@@ -584,7 +585,7 @@ continuation of @code{k1} is @code{k2}, and the 
continuation of
 With this example established, we are ready to give an example of CPS in
 Scheme:
 
address@hidden
address@hidden
 (lambda (ktail)
   (let ((k1 (lambda ()
               (let ((k2 (lambda (proc)
@@ -603,45 +604,22 @@ Scheme:
                           (proc ktail))))
                 (k6 newline)))))
     (k1))
address@hidden lisp
address@hidden smalllisp
 
 Holy code explosion, Batman!  What's with all the lambdas?  Indeed, CPS
 is by nature much more verbose than ``direct-style'' intermediate
-languages like Tree-IL.  At the same time, CPS is more simple than full
-Scheme, in the same way that a Turing machine is more simple than
-Scheme, although they are semantically equivalent.
+languages like Tree-IL.  At the same time, CPS is simpler than full
+Scheme, because it makes things more explicit.
 
 In the original program, the expression labelled @code{k0} is in effect
-context.  Any values it returns are ignored.  This is reflected in CPS
-by noting that its continuation, @code{k4}, takes any number of values
-and ignores them.  Compare this to @code{k2}, which takes a single
-value; in this way we can say that @code{k1} is in a ``value'' context.
-Likewise @code{k6} is in tail context with respect to the expression as
-a whole, because its continuation is the tail continuation,
address@hidden  CPS makes these details manifest, and gives them names.
-
address@hidden Compiling CPS
-
-In CPS, there are no nested expressions.  Indeed, CPS even removes the
-concept of a stack.  All applications in CPS are in tail context.  For
-that reason, applications in CPS are jumps, not calls.  The @code{(k1)}
-above is nothing more than a @code{goto}.  @code{(k3 42)} is a
address@hidden with a value.  In this way, CPS bridges the gap between the
-lambda calculus and machine instruction sequences.
-
-On the side of machine instructions, Guile does still have a stack, and
-the @code{lambda} forms shown above do not actually result in one
-closure being allocated per subexpression at run-time.  Lambda
-expressions introduced by a CPS transformation can always be allocated
-as labels or basic blocks within a function.  In fact, we make a
-syntactic distinction between closures and continuations in the CPS
-language, and attempt to transform closures to continuations (basic
-blocks) where possible, via the @dfn{contification} optimization pass.
-
-Values bound by continuations are allocated to stack slots in a
-function's frame.  The compiler from CPS only allocates slots to values
-that are actually live; it's possible to have a value in scope but not
-allocated to a slot.
+context.  Any values it returns are ignored.  In Scheme, this fact is
+implicit.  In CPS, we can see it explicitly by noting that its
+continuation, @code{k4}, takes any number of values and ignores them.
+Compare this to @code{k2}, which takes a single value; in this way we
+can say that @code{k1} is in a ``value'' context.  Likewise @code{k6} is
+in tail context with respect to the expression as a whole, because its
+continuation is the tail continuation, @code{ktail}.  CPS makes these
+details manifest, and gives them names.
 
 @node CPS in Guile
 @subsubsection CPS in Guile
@@ -678,7 +656,7 @@ expressions.
 Declare the mutually recursive set of functions denoted by @var{names},
 @var{syms}, and @var{funs} within the sub-term @var{body}.  @var{names}
 and @var{syms} are lists of symbols, and @var{funs} is a list of
address@hidden values.  @var{syms} are globally unique.
address@hidden values.  @var{syms} are globally unique.
 @end deftp
 
 Here is an inventory of the kinds of expressions in Guile's CPS
@@ -785,7 +763,7 @@ calling function, if any.
 
 @deftp {CPS Continuation} $kreceive arity k
 Receive values on the stack.  Parse them according to @var{arity}, and
-then proceed with the parsed values to the @var{$kargs} continuation
+then proceed with the parsed values to the @code{$kargs} continuation
 labelled @var{k}.  As a limitation specific to @code{$kreceive},
 @var{arity} may only contain required and rest arguments.
 @end deftp
@@ -894,30 +872,203 @@ the syntax of @code{build-cps-term}, 
@code{build-cps-exp}, or
 @code{build-cps-cont}, respectively.
 @end deffn
 
address@hidden Compiling CPS
address@hidden Compiling CPS
+
+Compiling CPS in Guile has three phases: conversion, optimization, and
+code generation.
+
+CPS conversion is the process of taking a higher-level language and
+compiling it to CPS.  Source languages can do this directly, or they can
+convert to Tree-IL (which is probably easier) and let Tree-IL convert to
+CPS later.  Going through Tree-IL has the advantage of running Tree-IL
+optimization passes, like partial evaluation.  Also, the compiler from
+Tree-IL to CPS handles assignment conversion, in which assigned local
+variables (in Tree-IL, locals that are @code{<lexical-set>}) are
+converted to being boxed values on the heap.  @xref{Variables and the
+VM}.
+
+After CPS conversion, Guile runs some optimization passes.  The major
+optimization performed on CPS is contification, in which functions that
+are always called with the same continuation are incorporated directly
+into a function's body.  This opens up space for more optimizations, and
+turns procedure calls into @code{goto}.  It can also make loops out of
+recursive function nests.
+
+At the time of this writing (2014), most high-level optimization in
+Guile is done on Tree-IL.  We would like to rewrite many of these passes
+to operate on CPS instead, as it is easier to reason about CPS.
+
+The rest of the optimization passes are really cleanups and
+canonicalizations.  CPS spans the gap between high-level languages and
+low-level bytecodes, which allows much of the compilation process to be
+expressed as source-to-source transformations.  Such is the case for
+closure conversion, in which references to variables that are free in a
+function are converted to closure references, and in which functions are
+converted to closures.  There are a few more passes to ensure that the
+only primcalls left in the term are those that have a corresponding
+instruction in the virtual machine, and that their continuations expect
+the right number of values.
+
+Finally, the backend of the CPS compiler emits bytecode for each
+function, one by one.  To do so, it determines the set of live variables
+at all points in the function.  Using this liveness information, it
+allocates stack slots to each variable, such that a variable can live in
+one slot for the duration of its lifetime, without shuffling.  (Of
+course, variables with disjoint lifetimes can share a slot.)  Finally
+the backend emits code, typically just one VM instruction, for each
+continuation in the function.
+
+
 @node Bytecode
 @subsection Bytecode
 
address@hidden File Format}.
+As mentioned before, Guile compiles all code to bytecode, and that
+bytecode is contained in ELF images.  @xref{Object File Format}, for
+more on Guile's use of ELF.
+
+To produce a bytecode image, Guile provides an assembler and a linker.
+
+The assembler, defined in the @code{(system vm assembler)} module, has a
+relatively straightforward imperative interface.  It provides a
address@hidden function to instantiate an assembler and a set of
address@hidden@var{inst}} procedures to emit instructions of each kind.
+
+The @address@hidden procedures are actually generated at
+compile-time from a machine-readable description of the VM.  With a few
+exceptions for certain operand types, each operand of an emit procedure
+corresponds to an operand of the corresponding instruction.
+
+Consider @code{vector-length}, from @pxref{Miscellaneous Instructions}.
+It is documented as:
+
address@hidden Instruction {} vector-length u12:@var{dst} u12:@var{src}
address@hidden deftypefn
+
+Therefore the emit procedure has the form:
+
address@hidden {Scheme Procedure} emit-vector-length asm dst src
address@hidden deffn
+
+All emit procedure take the assembler as their first argument, and
+return no useful values.
+
+The argument types depend on the operand types.  @xref{Instruction Set}.
+Most are integers within a restricted range, though labels are generally
+expressed as opaque symbols.
+
+There are a few macro-instructions as well.
+
address@hidden {Scheme Procedure} emit-label asm label
+Define a label at the current program point.
address@hidden deffn
+
address@hidden {Scheme Procedure} emit-source asm source
+Associate @var{source} with the current program point.
address@hidden deffn
+
address@hidden {Scheme Procedure} emit-cache-current-module! asm module scope
address@hidden {Scheme Procedure} emit-cached-toplevel-box asm dst scope sym 
bound?
address@hidden {Scheme Procedure} emit-cached-module-box asm dst module-name 
sym public? bound?
+Macro-instructions to implement caching of top-level variables.  The
+first takes the current module, in the slot @var{module}, and associates
+it with a cache location identified by @var{scope}.  The second takes a
address@hidden, and resolves the variable.  @xref{Top-Level Environment
+Instructions}.  The last does not need a cached module, rather taking
+the module name directly.
address@hidden deffn
 
-TODO: document (system vm loader)
address@hidden {Scheme Procedure} emit-load-constant asm dst constant
+Load the Scheme datum @var{constant} into @var{dst}.
address@hidden deffn
+
address@hidden {Scheme Procedure} emit-begin-program asm label properties
address@hidden {Scheme Procedure} emit-end-program asm
+Delimit the bounds of a procedure, with the given @var{label} and the
+metadata @var{properties}.
address@hidden deffn
+
address@hidden {Scheme Procedure} emit-load-static-procedure asm dst label
+Load a procedure with the given @var{label} into local @var{dst}.  This
+macro-instruction should only be used with procedures without free
+variables -- procedures that are not closures.
address@hidden deffn
+
address@hidden {Scheme Procedure} emit-begin-standard-arity asm req nlocals 
alternate
address@hidden {Scheme Procedure} emit-begin-opt-arity asm req opt rest nlocals 
alternate
address@hidden {Scheme Procedure} emit-begin-kw-arity asm req opt rest 
kw-indices allow-other-keys? nlocals alternate
address@hidden {Scheme Procedure} emit-end-arity asm
+Delimit a clause of a procedure.
address@hidden deffn
+
address@hidden {Scheme Procedure} emit-br-if-symbol asm slot invert? label
address@hidden {Scheme Procedure} emit-br-if-variable asm slot invert? label
address@hidden {Scheme Procedure} emit-br-if-vector asm slot invert? label
address@hidden {Scheme Procedure} emit-br-if-string asm slot invert? label
address@hidden {Scheme Procedure} emit-br-if-bytevector asm slot invert? label
address@hidden {Scheme Procedure} emit-br-if-bitvector asm slot invert? label
+TC7-specific test-and-branch instructions.  The TC7 is a 7-bit code that
+is part of a heap object's type.  @xref{The SCM Type in Guile}.  Also,
address@hidden Instructions}.
address@hidden deffn
+
+The linker is a complicated beast.  Hackers interested in how it works
+would do well do read Ian Lance Taylor's series of articles on linkers.
+Searching the internet should find them easily.  From the user's
+perspective, there is only one knob to control: whether the resulting
+image will be written out to a file or not.  If the user passes
address@hidden:to-file? #t} as part of the compiler options (@pxref{The Scheme
+Compiler}), the linker will align the resulting segments on page
+boundaries, and otherwise not.
+
address@hidden {Scheme Procedure} link-assembly asm #:page-aligned?=#t
+Link an ELF image, and return the bytevector.  If @var{page-aligned?} is
+true, Guile will align the segments with different permissions on
+page-sized boundaries, in order to maximize code sharing between
+different processes.  Otherwise, padding is minimized, to minimize
+address space consumption.
address@hidden deffn
+
+To write an image to disk, just use @code{put-bytevector} from
address@hidden(ice-9 binary-ports)}.
+
+Compiling object code to the fake language, @code{value}, is performed
+via loading objcode into a program, then executing that thunk with
+respect to the compilation environment. Normally the environment
+propagates through the compiler transparently, but users may specify the
+compilation environment manually as well, as a module.  Procedures to
+load images can be found in the @code{(system vm loader)} module:
+
address@hidden
+(use-modules (system vm loader))
address@hidden lisp
 
 @deffn {Scheme Variable} load-thunk-from-file file
 @deffnx {C Function} scm_load_thunk_from_file (file)
 Load object code from a file named @var{file}. The file will be mapped
 into memory via @code{mmap}, so this is a very fast operation.
address@hidden deffn
 
-On disk, object code is embedded in ELF, a flexible container format
-created for use in UNIX systems.  Guile has its own ELF linker and
-loader, so it uses the ELF format on all systems.
address@hidden {Scheme Variable} load-thunk-from-memory bv
address@hidden {C Function} scm_load_thunk_from_memory (bv)
+Load object code from a bytevector.  The data will be copied out of the
+bytevector in order to ensure proper alignment of embedded Scheme
+values.
 @end deffn
 
-TODO: document load-thunk-from-memory
+Additionally there are procedures to find the ELF image for a given
+pointer, or to list all mapped ELF images:
 
-Compiling object code to the fake language, @code{value}, is performed
-via loading objcode into a program, then executing that thunk with
-respect to the compilation environment. Normally the environment
-propagates through the compiler transparently, but users may specify
-the compilation environment manually as well, as a module.
address@hidden {Scheme Variable} find-mapped-elf-image ptr
+Given the integer value @var{ptr}, find and return the ELF image that
+contains that pointer, as a bytevector.  If no image is found, return
address@hidden  This routine is mostly used by debuggers and other
+introspective tools.
address@hidden deffn
+
address@hidden {Scheme Variable} all-mapped-elf-images
+Return all mapped ELF images, as a list of bytevectors.
address@hidden deffn
 
 
 @node Writing New High-Level Languages


hooks/post-receive
-- 
GNU Guile
[Prev in Thread]
Current Thread
[Next in Thread]
[Guile-commits] GNU Guile branch, master, updated. v2.1.0-605-g69dc826, Andy Wingo <=
Prev by Date: [Guile-commits] Success: Hydra job gnu:guile-2-0:build_without_threads.x86_64-linux
Next by Date: [Guile-commits] GNU Guile branch, stable-2.0, updated. v2.0.9-159-gd278dac
Previous by thread: [Guile-commits] Success: Hydra job gnu:guile-2-0:build_without_threads.x86_64-linux
Next by thread: [Guile-commits] GNU Guile branch, stable-2.0, updated. v2.0.9-159-gd278dac
Index(es):
- Date
- Thread