chicken-users
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Chicken-users] Chicken Gazette - Issue 5


From: Alex Shinn
Subject: [Chicken-users] Chicken Gazette - Issue 5
Date: Tue, 28 Sep 2010 00:07:41 +0900
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (darwin)

     _/_/_/  _/        _/            _/
  _/        _/_/_/          _/_/_/  _/  _/      _/_/    _/_/_/
 _/        _/    _/  _/  _/        _/_/      _/_/_/_/  _/    _/
_/        _/    _/  _/  _/        _/  _/    _/        _/    _/
 _/_/_/  _/    _/  _/    _/_/_/  _/    _/    _/_/_/  _/    _/

--[ Issue 5 ]-------------------------------------- G A Z E T T E
                               brought to you by the Chicken Team


== 0. Introduction

Welcome to issue 5 of the Chicken Gazette! We're tentatively switching the
Gazette publication to Monday this week, to give people something to read after
coming back from the weekend.

== 1. The Hatching Farm - New Eggs & The Egg Repository

This week Mario Domenech Goulart
(http://wiki.call-cc.org/users/mario-domenech-goulart) released a new egg
called accents-substitute (http://wiki.call-cc.org/eggref/4/accents-substitute)
to replaced accented Latin characters with either HTML entities or their
non-accented ASCII equivalents, for when you need to work with ASCII-only text.

Ivan Raikov (http://wiki.call-cc.org/users/ivan-raikov) has been busy,
and in addition to many egg updates has released a new egg called cis
(http://wiki.call-cc.org/egg/cis) (compact integer sets) as a possible
alternate to last week's featured egg iset (http://wiki.call-cc.org/egg/iset).
It's less efficient in terms of time and space, but has a simpler
implementation for when performance doesn't matter.

Our fearless leader Felix (http://wiki.call-cc.org/users/felix-winkelmann)
also added a new egg system (http://wiki.call-cc.org/eggref/4/system), inspired
by the CL defsystem macro. `system` lets you define groups of files and their
dependencies which can be loaded or compiled, and even re-loaded or compiled
keeping track of modified files. Use it for a `make` alternative in your .setup
files, or for rapid development in the repl!

== 2. The Core - Bleeding Edge Development

It's been another busy week for core development:

Overflow-detection for basic arithmetic ops (`+`, `-`, `*` and `/`) has been
changed to use bit-twiddling instead of "parallel flonum" computations, since
64-bit IEEE doubles have not enough precision to hold the full range of fixnums
on 64-bit systems

A serious compiler bug related to inlining was fixed (found with much help by
Sven Hartrumpf (http://wiki.call-cc.org/users/sven-hartrumpf)), and several
other bugs reported by Kon Lovett (http://wiki.call-cc.org/users/kon-lovett)
were fixed.

A new foreign type `pointer-vectors` (vectors of native unboxed pointers) was
added, with an API in lolevel.

A simpler alternative to `er-macro-transformer`, `ir-macro-transformer`
(implicit renaming macros) was added by Peter Bex
(http://wiki.call-cc.org/users/peter-bex). See ticket #394 on trac.

But the biggest change: irregex is now the official regex API, and has full
library unit status, regex unit is removed and available as an egg (should
be fully backwards compatible, as long as `(use regex) (import irregex)`
idiom is used; dependencies on regex unit not in egg repo, though), upgraded
irregex version, many upstream bug-fixes and optimizations, with many
thanks to Peter Bex (http://wiki.call-cc.org/users/peter-bex) and Alex Shinn
(http://wiki.call-cc.org/users/alex-shinn).

And thanks to Felix for help with the summary!

== 3. Chicken Talk

The exciting news on the mailing list this week was a performance boost
(http://lists.nongnu.org/archive/html/chicken-users/2010-09/msg00074.html)
mentioned by Mario Domenech Goulart
(http://wiki.call-cc.org/users/mario-domenech-goulart) where the awful
(http://wiki.call-cc.org/eggref/4/awful) web framework ran a benchmark almost
7x faster. This is likely due to a new GC improvement by Felix.

Taylor Venable brought up an issue
(http://lists.nongnu.org/archive/html/chicken-users/2010-09/msg00068.html) in
the new coops (http://wiki.call-cc.org/eggref/4/coops) object system involving
class redefinition and `define-method` on a generic not first provided with
`define-generic`. It turns out Chicken will do an implicit `define-generic`
for you as a convenience, but it's probably best to define each generic once
explicitly. Also be aware that redefining a class will create a completely new
class which instances of the old class will not belong to.

Richard Hollos reported an error
(http://lists.nongnu.org/archive/html/chicken-users/2010-09/msg00066.html)
compiling on AMD64 Linux, which turned out to be just a chicken and egg
problem.

Markus Klotzbuecher provided a patch
(http://lists.nongnu.org/archive/html/chicken-users/2010-09/msg00075.html) for
the `cairo` egg, showing activity in the graphical library front.

Finally Felix just announced
(http://lists.nongnu.org/archive/html/chicken-users/2010-09/msg00083.html)
coops (http://wiki.call-cc.org/eggref/4/coops) version 1.0! If you've been
using `tinyclos`, give `coops` a try.

== 4. Omelette Recipes - Tips and Tricks

We've got a wide variety of options for parsing in Chicken, from the built-in
`read` and extensions thereon, to regular expressions, to a plethora of
both specific and general purpose parsing libraries. This week, I want
to take a look at one of the general purpose libraries, the packrat egg
(http://wiki.call-cc.org/egg/packrat) by Tony Garnock-Jones.

A packrat parser is a parser for Parsing Expression Grammars (PEGs), which is
essentially a recursive decent parser with backtracking plus memoization. What
does all that mean? Well, a recursive decent parser just looks ahead a fixed
number of tokens and dispatches accordingly. Scheme's own s-expressions are a
simple example of this, since a datum can be determined by just the first few
characters. Since you just determine the rule once in advance and then recurse,
this lends itself to very simple parsers, and indeed `read` is very easy to
implement in Scheme.

Adding backtracking removes the restriction on fixing the rule in advance,
allowing more complex grammars resembling those of typical parser generators.
With backtracking we usually call the library a parser combinator, and
there are already a number of examples available including Taylor Campbell's
parscheme (http://mumble.net/~campbell/darcs/parscheme/). The disadvantage of a
naive backtracking approach is that it may require exponential time to parse. A
packrat parser just adds memoization, which guarantees a linear time parse.

The rules for PEGs look similar to Context Free Grammars (CFGs), but are
unambiguous by virtue of being ordered - the leftmost matching rule is always
chosen. PEGs also allow `and` and `not` rules, which allow them to parse
languages that can't be described by CFGs.

The documentation for packrat
(http://bugs.call-cc.org/export/20226/release/4/packrat/doc/packrat.pdf) is not
too user-friendly - among other things doesn't include any examples of parsing
from actual text. This is because `packrat` is written to work over abstract
streams of tokens, not just text, but text is usually what people want to
work with when looking at parsers. So we'll start by writing a version of the
expression parser example to work on text.

      ;; Start with the base textual generator from the documentation:
      (define (generator filename port)
        (let ((ateof #f)
              (pos (top-parse-position filename)))
          (lambda ()
            (if ateof
                (values pos #f)
                (let ((x (read-char port)))
                  (if (eof-object? x)
                      (begin
                        (set! ateof #t)
                        (values pos #f))
                      (let ((old-pos pos))
                        (set! pos (update-parse-position pos x))
                        (values old-pos (cons x x)))))))))

You can just use this as-is without worrying about the details for now.

      ;; Now define a character-oriented version of the expression parser:
      (define expr-parser
        (packrat-parser expr
          (expr ((a <- mulexp '#\+ b <- mulexp) (+ a b)) ((a <- mulexp) a))
          (mulexp ((a <- simple '#\* b <- simple) (* a b)) ((a <- simple) a))
          (simple ((a <- num) a) (('#\( a <- expr '#\)) a))
          (num ((a <- digits) (string->number (list->string a))))
          (digits ((a <- digit b <- digits) (cons a b)) ((a <- digit) (list a)))
          (digit ((a <- '#\0) a) ((a <- '#\1) a) ((a <- '#\2) a)
                 ((a <- '#\3) a) ((a <- '#\4) a) ((a <- '#\5) a)
                 ((a <- '#\6) a) ((a <- '#\7) a) ((a <- '#\8) a)
                 ((a <- '#\9) a))))

Here we describe our grammar as returning an `expr` (the initial rule).
Following this is a list of definitions for the non-terminals `expr`, `mulexp`,
`simple`, `num`, `digits` and `digit`, which may recursively refer to each
other.

Each definition has one or more clauses which contain a sequence and an action.
The action is normal Scheme code. The sequence is a list of other non-terminals
referred to by name, and non-terminals referred to by quoting them. You can
also bind a value to be bound in the action by writing `<id> <- <pattern>`.

So the simplest (though longest) rule is for `digit` which simply matches any
digit character, binds is, and returns it. These are collected by the `digits`
rule which conses a list of one or more digits. Notice that `digits` first
tries to bind more than one digit by recursively matching itself, defaulting to
just a single digit only when this fails - the rules are tried in left-to-right
order, only backtracking on failure. Finally, `num` takes the result of
`digits` and constructs a number from it.

The other rules are from the original documentation, and are fairly
straightforward parser rules. We've chosen to compute the values directly,
using `(+ a b)` and `(* a b)`, but could just have easily returned the results
as s-expressions, e.g. with ``(+ ,a ,b)` giving the user an AST to work with.

Now to put together the parser and generator, let's write a simple utility
function:

      ;; Utility function to parse from strings or ports,
      ;; and return the semantic results on success or #f on failure:
      (define (expr-parse x)
        (let ((g (base-generator->results
                  (generator #f (if (string? x) (open-input-string x) x)))))
          (let ((res (expr-parser g)))
            (and (parse-result-successful? res)
                 (parse-result-semantic-value res)))))
    
      ;; Some examples:
      (expr-parse "2") => 2
      (expr-parse "2+2") => 4
      (expr-parse "2+2*7") => 16
      (expr-parse "(2+2)*7") => 28
      (expr-parse "3*4+5*6") => 42

We pass a dummy filename and a port to the generator, and pass the result
of this to `base-generator->results`, then call the parser on it. The parser
results are available in `parse-result-semantic-value`.

Note there's no need for a separate lexer, as the rules are matching directly
on characters. You can use a separate lexer if you want, though, by providing a
higher-level generator than the one above.

For a more complex example see the packrat alternative in the `uri-generic`
egg.

The egg is still rather primitive, and is missing some useful features such
as charsets (which would simplify our "digit" rule greatly), and arbitrary
predicates, but is definitely something that can be built on. It would be
especially nice if rules could be composed - maybe someone can add that!

== 5. About the Chicken Gazette

The Gazette is produced weekly by a volunteer from the Chicken community.
The latest issue can be found at http://gazette.call-cc.org or you can
follow it in your feed reader at http://gazette.call-cc.org/feed.atom.
If you'd like to write an issue, check out the instructions
(http://bugs.call-cc.org/browser/gazette/README.txt) and come and find us in
#chicken on Freenode!

[ --- End of this issue --- ]



reply via email to

[Prev in Thread] Current Thread [Next in Thread]