[Chicken-users] Chicken Gazette

chicken-users
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Chicken-users] Chicken Gazette - Issue 15

From:	Moritz Heidkamp
Subject:	[Chicken-users] Chicken Gazette - Issue 15
Date:	Tue, 07 Dec 2010 15:15:02 +0100
     _/_/_/  _/        _/            _/
  _/        _/_/_/          _/_/_/  _/  _/      _/_/    _/_/_/
 _/        _/    _/  _/  _/        _/_/      _/_/_/_/  _/    _/
_/        _/    _/  _/  _/        _/  _/    _/        _/    _/
 _/_/_/  _/    _/  _/    _/_/_/  _/    _/    _/_/_/  _/    _/

--[ Issue 15 ]------------------------------------- G A Z E T T E
                               brought to you by the Chicken Team


== 0. Introduction

Welcome to issue 15 of the Chicken Gazette.

== 1. The Hatching Farm

  * In his everlasting quest to provide the best documentation tooling  
    of all, Jim Ursetto created yet another chicken-doc spin-off.       
    Luckily it is not called `yacdso`. Please welcome manual-labor!     
  * Ivan Raikov went on and fixed an edge case problem in the npdiff    
    egg, leading to the release of version 1.14. While he was at it he  
    also released version 1.11 of format-textdiff fixing some bugs in   
    the context diff handler.                                           
  * Peter Lane updated his dataset-utils and classifiers eggs. If       
    machine learning or data mining is your thing you might want to     
    check those out!                                                    
  * Felix Winkelmann released version 0.2 of the handy                  
    progress-indicators extension.                                      
  * Lastly, our daring Peter Bex wrestled with openssl once again. The  
    new release 1.5 should hopefully fix the remaining issues we had    
    with the Chicken Wiki.                   

== 2. Core development

Chicken's `master` branch has seen two bug fix commits this week.  One
of them fixed a problem with `thread-sleep!` which failed when it was
passed an inexact/flonum sleep-time. Thanks to Karel Miklav for
reporting this! The other one fixed a bug which caused all kinds of
trouble when `cons` got redefined. This was noticed by David Steiner
when he tried following the code examples in SICP using Chicken.

The `experimental` branch has been busy experimenting: In the aftermath
of the jbogenturfa'i debugging session (see below), Felix decided
to remove the lambda-lifter entirely since he figured it was rather
ineffective at lifting lambdas. The scrutinizer on the other hand got
some dents flattened and is here to stay. Finally, the aforementioned
manual-labor is the official (experimental) manual generator now!

== 3. Chicken Talk

Readers of issue 11 may remember the T-DOSE picture gallery
(http://www.call-cc.org/pictures/t-dose2010/index.html). Be informed
that it now contains a few additional pictures, some of which are
quite okay!

The chicken-users list was dominated by Alan Post's ongoing effort to
create a Lojban parser this week, spanning across two threads. In the
first of them he details his progress on optimizing the code's
compilation while in the second he reports on problems he encountered
when running it for the first time. As it turned out, they were caused
by using `equal?` on a list containing circular data resulting in an
endless recursion and finally in a stack overflow. A change request is
imminent to ameliorate this situation.

Christian Kellermann kindly announced the all new Gazette authorship
organisation mode.  Check it out if you are interested in contributing
or helping with a future Gazette issue!

== 4. Omelette Recipes

Today we're going to look at the amb egg. This egg implements the
`amb` operator, much-loved as an example of the use of continuations
to alter control flow in exciting ways, unusual evaluation orders, and
a mind-altering image of a world where computers exploit parallel
universes or quantum mechanics to explore multiple branches of a
recursion.

However, `amb` is sometimes a useful tool; there's been a few cases
where I've wished it was available in projects I've done in other
languages, and I'm going to simplify one of them into an example.

But first, let's look at what `amb` does (and a little about how it
works). Basically, `amb` is a form (it can be a macro or a procedure,
and the difference in effect is immaterial for our purposes in this
recipe) that has a variable number of arguments and, in principle,
returns them all in separate "threads"; it can be thought of as a bit
like POSIX's `fork()`, except lightweight. However, the intent is not
to exploit parallelism (most implementations of `amb` do not provide
any kind of pre-emption; each "thread" runs until it terminates, then
lets another run), but to explore different possible control flows. As
such, there is an `amb-assert` form that, if its argument is #f, kills
the current "thread" and runs another.

So every time your program performs an `amb`, multiple threads pop into
existence; and whenever it performs an `amb-assert`, some threads are
culled. The principle is, whenever you have a point in your program
where a choice must be made, you can use `amb` to have the program
split up and explore every possible choice; and when it turns out,
further down the line, to have been the wrong choice, you can kill it,
freeing the CPU to explore another choice.

This is usually demonstrated with a program that solves a logic puzzle
of the form "Peter, Jim, Moritz and Felix are hacking on the Chicken
core. Peter is working only on source files written in Scheme. Moritz
works only on `posix.scm`, whoever works on the expander also works on
the compiler, you can't work on the scheduler without also working on
`csi`, ...and so on... So, who introduced bug #232?". Which is all well
and good for an undergraduate programming exercise, but who encounters
such problems in real life?

Well, the general class of problem is a "tree search problem". The
idea is you have some situation that can be modelled as a tree of
possibilities (potentially infinite!), with solutions dotted around the
tree; and our goal is to find perhaps all the solutions, perhaps just
any solution, or perhaps the solution nearest to the root of the tree.
Practical examples include finding a record in a B-Tree (where the
tree structure actually corresponds to physical index blocks on disk),
solving logic puzzles (where the tree structure is a purely logical
tree of possible situations, some of which may yield solutions), or
choosing the best move to make in a game (where the tree represents
possible states of the game, and the moves that can move between
states).

These kinds of algorithms are normally written as recursive functions,
which take whatever represents a position on the tree as an argument,
check to see if there's a solution there (and handle it appropriately
if so, which may involve terminating the search if we've found enough
solutions), then work out all the positions below this one in the
tree and recursively call the function on each in turn. This forces
a "depth-first" search, as the recursion will only bottom out if a
solution is found (that terminates the search entirely) or a subtree is
totally exhausted of options. Going for a "breadth-first" search, where
every child of the current position is checked for solutions before
starting to recurse down into the children of the children, is a little
harder to implement; rather than just writing a recursive function
that explores the tree, one must keep track of a queue of subtrees that
will need exploring in future, pushing newly-found subtrees onto the
back of the queue rather than exploring them as soon as they're found.
However, breadth-first searches will find the solutions nearest to the
top of the tree first, and will not flounder if they encounter infinite
subtrees with no solutions in, so they are often more attractive than
depth-first.

However, rather than writing a depth-first recursive search function,
or a breadth-first search function using a queue, you can also
implement these search problems as a simple function using `amb`. One
benefit of this is that one can swap the `amb` implementation used
from depth-first to breadth-first to choose the best search strategy,
without changing your program - but that's a side benefit. The real
benefit is, of course, clearer, more maintainable code - which makes
the `amb` egg worth its memory usage in leaked diplomatic cables!

Let's look at a real example from my sordid past. I once wrote an
app (in C, alas) to help people navigate the UK's complex welfare
system, which basically lets people apply to receive money back
from the government if they meet certain criteria. There's a lot of
different benefits, each of which with different eligibility rules,
and complex rules to work out how much benefit you're entitled to
based on your circumstances. It's a nightmare, and exactly the sort of
thing computers are good at. So I wrote a program to work it out. Now,
working out eligibility for a benefit, and how much can be claimed,
involves asking the user a lot of questions, which is tiresome. Many
of these questions are shared between different benefits (or different
branches of the complex flowchart you have to follow to work out some
of the hairier benefits), and also, many of these questions need only
be asked if certain paths are explored (questions about your child
need only be asked if you're looking into benefits that are meant
to help with the costs of raising children - which are only worth
exploring at all if you actually have children; and the ones relating
to pregnancy, well, there's no point in asking about any of them if
the answer to "What is your biological gender?" was "Male".) We want
to ask the minimum number of questions we can - so we shouldn't ask
the same question twice, and we shouldn't ask questions unless we need
the answers. The first problem can be solved by keeping a database
of answers, and only asking the question if the desired information
isn't already in the database; the second problem has to be solved by
organising the control flow of the process to ask questions that can
eliminate entire branches of enquiry up-front. Which implies a tree
search!

As it happens, in C, I wrote a horrible nest of tangled-looking code
that basically followed all the rules to work out what benefits you
were entitled to. It worked, but it was hard to maintain - and the
benefits rules change frequently. Sometimes the order of questions
being asked or calculations being performed mattered (you needed to
ask things up front to eliminate possibilities) and sometimes it was
irrelevant, as I had to put them in some order, and only my comments
in the code made it clear which was which. A big part of the problem
was that I had to arrange the computation around the asking of the
questions; this was fine when we could ask a single question that
would choose the path to take (if the client is male, don't ask about
pregnancy), but sometimes we had to work out several benefits that
were available, working out each case in turn (which was complex, as
claiming some benefits altered your eligibility for others) to see
which one would work out best for the client - rejecting any they
turned out to not be eligible for. The resulting code was messy indeed.

But enough prose - let's get to an example. With some code!

I can't remember the details of the UK welfare system as of the
late 1990s, and it was incredibly tedious anyway. So we'll work on a
fictitious over-simplified example, to get to the heart of the matter
easily.

Let's say we have these benefits available:

  * Low Income Credit, which is payable to people who earn less than    
    £10,000 a year, and whose partner (if the have one) earns less     
    than £15,000 a year. You get given £(30,000 - total income of     
    person/couple) / 10 per year, split into monthly payments.          
  * Housing Benefit, which is payable to people or couples who earn     
    less than £20,000 a year between them, or £15,000 if they have    
    children, and who live in rented accomodation, and are not claiming 
    Low Income Credit. You get £2,500 a year, in monthly payments, if  
    you have children or earn less than £15,000; £2,000 a year if you 
    do not have children and earn more than £15,000.                   
  * Carer's Allowance, which is payable to people whose partners are    
    so ill that they need help with basic household tasks. The partner  
    must not be earning any income for this to be claimed. It's a       
    flat £1,000 a year. However, being an "allowance" rather than a    
    "credit" or a "benefit", it counts as income so may affect other    
    benefits.                                                           
  * Family Credit, which is available to the parent of a child (as long 
    as the other parent is not also claiming it for the same child).    
    It's a flat £1,000 a year, unless you (and your partner, if you    
    have one) together earn more than £30,000 a year.                  

Now, on to the code. To avoid asking the same question more than once,
we can keep a global alist of asked questions:

     (use srfi-1)
    
     (define *asked-questions* '())
    
     (define (ask question)
       (let ((existing (assoc question *asked-questions*)))
         (if existing
         (cdr existing)
         (begin
           (write question) (newline)
           (let ((answer (read-line)))
             (set! *asked-questions*
                   (cons (cons question answer)
                         *asked-questions*))
             answer)))))

As we use an actual global variable, this state will be preserved even
if we backtrack with `amb`.

We can wrap that in a few functions that ask useful questions:

     (define (income allowances)
       (+ allowances (string->number
          (ask "What is your income, not including any allowances? Enter 0 for 
none"))))
    
     (define (has-partner?)
       (string=? (ask "Do you have a partner?") "yes"))
    
     (define (partner-is-ill?)
       (if (has-partner?)
           (string=? (ask "Does your partner need help around the house?") 
"yes")
           #f))
    
     (define (renting?)
       (string=? (ask "Do you rent your home?") "yes"))
    
     (define (partner-income)
       (if (has-partner?)
           (string->number
        (ask "What is your partner's income, including any allowances? Enter 0 
for none"))
           0))
    
     (define (family-income allowances)
       (+ (income allowances) (partner-income)))
    
     (define (num-children)
       (string->number (ask "How many children do you have?")))

Now, clearly, "effective income" is the actual earnings of the person
or couple, plus any "allowances" they receive, and what other benefits
are being claimed may affect the computation of a benefit. So we will
phrase our functions to work out the benefits as having an argument
which is a record giving details of what else they're claiming. We can
then write our basic benefit calculators as fairly direct translations
of the rules, using `amb-assert` from the amb egg to reject any invalid
benefits:

     (use amb)
    
     (define-record benefits
       claimed allowances)
    
     (define (claim-benefit existing-benefits benefit amount allowance?)
       (make-benefits
        (cons (cons benefit amount)
          (benefits-claimed existing-benefits))
        (if allowance?
        (+ amount (benefits-allowances existing-benefits))
        (benefits-allowances existing-benefits))))
    
     (define (claiming? existing-benefits benefit)
       (assq benefit (benefits-claimed existing-benefits)))
    
     (define (compute-lic other-benefits)
       (amb-assert (< (income (benefits-allowances other-benefits)) 10000))
       (amb-assert (not (claiming? other-benefits 'hb)))
       (if (has-partner?)
           (amb-assert (< (partner-income) 15000)))
       (claim-benefit
        other-benefits
        'lic (/ (- 30000 (family-income (benefits-allowances other-benefits))) 
10) #f))
    
     (define (compute-hb other-benefits)
       (if (zero? (num-children))
           (amb-assert (< (family-income (benefits-allowances other-benefits)) 
20000))
           (amb-assert (< (family-income (benefits-allowances other-benefits)) 
25000)))
       (amb-assert (renting?))
       (amb-assert (not (claiming? other-benefits 'lic)))
       (claim-benefit
        other-benefits
        'hb (if (and (zero? (num-children)) (> (family-income) 15000))
            2000
            2500) #f))
    
     (define (compute-ca other-benefits)
       (amb-assert (zero? (partner-income)))
       (amb-assert (partner-is-ill?))
       (claim-benefit
        other-benefits
        'ca 1000 #t))
    
     (define (compute-fc other-benefits)
       (amb-assert (not (zero? (num-children))))
       (amb-assert (< (family-income (benefits-allowances other-benefits)) 
30000))
       (claim-benefit
        other-benefits
        'fc 1000 #f))

Having done that, we can try out all the possibilities using `amb` to
decide whether we try to claim each benefit or not:

     (define (compute-benefits)
       (let* ((initial-state (make-benefits '() 0))
          ;; Compute allowances
          (claim-ca (amb #t #f))
          (after-ca (if claim-ca (compute-ca initial-state) initial-state))
          ;; Compute other benefits
          (claim-fc (amb #t #f))
          (after-fc (if claim-fc (compute-fc after-ca) after-ca))
          (claim-hb (amb #t #f))
          (after-hb (if claim-hb (compute-hb after-fc) after-fc))
          (claim-lic (amb #t #f))
          (after-lic (if claim-lic (compute-lic after-hb) after-hb)))
         after-lic))

What does `compute-benefits` actually do? For each benefit, it "splits
the world" using `amb` into two versions - one where we try and claim
this benefit, and one where we don't. We compute Carer's Allowance
first, as it is an allowance which might affect later calculations;
we can't compute any income-dependent benefits until we've decided if
we're claiming this one or not, so it has to go first. The order of the
others was picked arbitrarily.

`compute-benefits` then returns the final state of affairs as a
benefits record, but it will return several times with different
possible final states. We need to find them all, and pick the one with
the highest benefits earned. The way to do that is with `amb-collect`,
which takes an expression and makes a list of all the results that
expression returns in different threads:

     (define (total-benefits benefits)
       (fold (lambda (claim sum-so-far)
           (+ (cdr claim) sum-so-far))
         0
         (benefits-claimed benefits)))
    
     (define (best-benefits-to-claim)
       (let ((options
          (sort
           (amb-collect (compute-benefits))
           (lambda (a b)
             (< (total-benefits b)
                (total-benefits a))))))
         (display "Your best option is to claim the following: ")
         (write (benefits-claimed (car options)))
         (newline)))

Running `(best-benefits-to-claim)` will ask you some questions (if it's
the first time you've run it, at any rate) and then suggest how much to
claim of which benefits:

    #;22> (best-benefits-to-claim)
    "Do you have a partner?"
    #;22> no
    "How many children do you have?"
    #;22> 2
    "What is your income, not including any allowances? Enter 0 for none"
    #;22> 15000
    "Do you rent your home?"
    #;22> yes
    Your best option is to claim the following: ((hb . 2500) (fc . 1000))

You can try again if you clear the `*asked-questions*` store:

    #;25> (set! *asked-questions* '())
    #;26> (best-benefits-to-claim)
    "Do you have a partner?"
    #;26> yes
    "What is your partner's income, including any allowances? Enter 0 for none"
    #;26> 0
    "Does your partner need help around the house?"
    #;26> yes
    "How many children do you have?"
    #;26> 2
    "What is your income, not including any allowances? Enter 0 for none"
    #;26> 14500
    "Do you rent your home?"
    #;26> yes
    Your best option is to claim the following: ((hb . 2500) (fc . 1000) (ca . 
1000))

The resulting code is highly extensible. The entire complexities
of choosing which benefits to go for are reduced to listing
the requirements of each benefit inside its definition, using
`amb-assert`s, then stacking up the benefits in the `let*` inside
`compute-benefits`. If `compute-benefits` became much more complex,
I'd be inclined to put the benefits functions into two lists (one for
allowances, one for the rest, to ensure allowances are covered first)
and iterate through them.

This example is a bit simplified and contrived - imagine each benefit
has tens to hundreds of computation steps, many of which involve
asking a question, and many of which depend on the results of previous
calculations or assumptions; and imagine there are fifteen different
benefits of varying complexity levels. And then imagine that the rules
for each benefit change periodically, so you need to minimize the
amount of duplication of information in your formulation of the rules.

Aren't you glad to be a smug Lisp weenie?

== 5. About the Chicken Gazette

The Gazette is produced weekly by a volunteer from the Chicken
community. The latest issue can be found at http://gazette.call-cc.org
or you can follow it in your feed reader at
http://gazette.call-cc.org/feed.atom. If you'd like to write an issue,
consult http://wiki.call-cc.org/gazette for the schedule and
instructions!

[ --- End of this issue --- ]
[Prev in Thread]
Current Thread
[Next in Thread]
[Chicken-users] Chicken Gazette - Issue 15, Moritz Heidkamp <=
- Re: [Chicken-users] Chicken Gazette - Issue 15, Alan Post, 2010/12/07
Prev by Date: [Chicken-users] chicken script with many CLI arguments
Next by Date: Re: [Chicken-users] Chicken Gazette - Issue 15
Previous by thread: [Chicken-users] chicken script with many CLI arguments
Next by thread: Re: [Chicken-users] Chicken Gazette - Issue 15
Index(es):
- Date
- Thread