emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: correct way to do state machines in elisp?


From: Jonathan Walther
Subject: Re: correct way to do state machines in elisp?
Date: Sat, 24 Aug 2002 12:34:40 -0700
User-agent: Mutt/1.4i

Thank you!  The enclosed explanation is very helpful.

Jonathan

On Sat, Aug 24, 2002 at 09:29:12PM +0200, Oliver Scholz wrote:
It seems that seems that my follow-up did not get through to the
mailing-list. So I send it to you privately.

Jonathan Walther <address@hidden> writes:
[...]
As I was writing it, I couldn't help thinking, there must be a way to
write the same thing in LISP, with 1/3 the lines of code.  The largest
guzzler of lines of code I think is the state machine that parses every
RFC compliant IRC message into a suitable structure.

Can anyone give me hints about how the same thing could be done better
in elisp?  I'm reluctant to use regular expressions for this.
[...]

I once asked for advice on state machines in comp.lang.lisp and got a
very helpful answer by Paul Foley. Now I asked for and got his
permission to forward his mail. The first part explains the basics of
state machines for a clueless non-programmer without CS-background
like me. But later on there are some examples in Lisp (CL, actually).

I hope this helps.


To: Oliver Scholz <address@hidden>
Subject: Re: [newbie] state machines
Content-Type: text/plain; charset=US-ASCII
From: Paul Foley <address@hidden>
Date: 12 Jun 2002 03:23:28 +1200

On Tue, 11 Jun 2002 14:16:23 +0200, Oliver Scholz wrote:

I found state machines sometimes mentioned in usenet articles or on
the web and I understand that there are different kinds (?) of them:
finite and infinite, deterministic and non-deterministic.  I don't
really understand this issue, but I want to learn more about this.

A state machine is a "machine" (usually simulated by a computer
program) that has "states".  It takes some input, figures out what
state it should go into next, based on the input(s), and goes into
that state, then repeats.

A deterministic state machine is one in which the "next state"
function is deterministic -- i.e., for any given input and current
state, there's only one possible next state.

A non-deterministic state machine is one in which the "next state"
function is non-deterministic -- i.e., there's more than one choice
for next state, for at least one combination of input and current
state.

A finite state machine is one that has a finite number of states.
[Which is all of them -- there are no infinite state machines]

A very simple example is a traffic light: it has 3 states: green,
orange, and red, and a single-bit "change colour now" input.  When
it's in the "green" state, the input changes it to the "orange" state;
when it's in the "orange" state, the input changes it to the "red"
state, and when it's in the "red" state, the input changes it to the
"green" state.

A computer-related example with which you're probably familiar is the
simple regular expression [real regexps are DFSMs (deterministic
finite state machines); the things called "regexp" in languages like
Perl are not actually regexps -- they're a more powerful class of
languages].  Let's say you have the regexp "a(bx|cy)*d": that means it
matches input consisting of an "a" followed by any number of "bx" or
"cy" pairs, in any order, followed by a "d".  The state machine has 5
states: let's number them starting from 1 -- it starts off in state 1.
In state 1, an input of "a" moves it to state 2; any other input is an
error (i.e., doesn't match the regexp).  In state 2, if the input is a
"b" it moves to state 3, and if it's a "c" is moves to state 4, and if
it's a "d" it moves to state 5 (anything else is an error).  In state
3, an input of "x" moves it to state 2.  In state 4, an input of "y"
moves it to state 2.  If you reach state 5, you're done.

You can draw a picture of it like this:

                  x
            ,-----------.
            |           |
            |    b      |
            |  ,-->[3]--'
        a   v / d
 -->[1]--->[2]---->[[5]]
            ^ \
            |  `-->[4]--.
            |    c      |
            |      y    |
            `-----------'

[n] represents a state, and the arrows represent transitions; the
letters on the arrows tell you what the input has to be for that
transition to trigger.  [[n]] represents a "final" state (there can
possibly be more than one, but for a regexp you can always draw it
with only one final state)

A NDFSM (non-deterministic ...) has some states that look like

        x
      ,--->[B]
     /
  [A]
     \  x
      `--->[C]

with the same letter on two or more transitions -- you can't tell
which one is the 'right one' [if you take the top one, to state B, and
later hit an "error" condition (where there's no suitable transition
and you're not in a final state), you have to back up and try the
lower one, because it might have led you to a final state without
error]

My notion is that state machines are useful whenever I have to parse
input from a stream. I don't know if this statement is correct. My

They're useful for everything you do on a computer -- a computer is
really just a finite state machine (with a very large number of
potential states).

I want to learn a) the (fundamentals of) the concept, b) when to use
them, c) how to construct them in a proper way and d) how to implement
them in Lisp.

There are several ways to implement them.  For reasonably small
numbers of states, you can write a TAGBODY with the states as tags,
using GO to change states.  Or you can use a variable for the state
number and a CASE statement dispatching on that variable for the code,
or you can write the states and transitions as data, and have a
function that walks over that data -- that's better for a large number
of states...

For example, the "a(bx|cy)*d" machine could be written

 (defun match-expr-1 (stream &aux input (state 1))
   (loop
     (setq input (read-char stream nil nil))
     (case state
       (1 (ecase input
            (#\a (setq state 2)))
       (2 (ecase input
            (#\b (setq state 3))
            (#\c (setq state 4))
            (#\d (return t))))   ; i.e., state 5 -- done
       (3 (ecase input
            (#\x (setq state 2))))
       (4 (ecase input
            (#\y (setq state 2))))))))

or

 (defstruct state
   number
   transitions
   final-p)

 (defstruct transition
   input
   next-state)

 (defun run-fsm (stream state)
   (loop
     (if (state-final-p state)
         (return t)
         (let ((transition (find (read-char stream nil nil)
                                 (state-transitions state)
                                 :key #'transition-input)))
           (if (null transition)
               (return nil)
               (setq state (transition-next-state transition)))))))


 (defun add-transition (input from-state to-state)
   (push (make-transition :input input :next-state to-state)
         (state-transitions from-state)))

 (defun match-expr-2 (stream)
   (let ((state1 (make-state :number 1))
         (state2 (make-state :number 2))
         (state3 (make-state :number 3))
         (state4 (make-state :number 4))
         (state5 (make-state :number 5 :final-p t)))
     (add-transition #\a state1 state2)
     (add-transition #\b state2 state3)
     (add-transition #\c state2 state4)
     (add-transition #\d state2 state5)
     (add-transition #\x state3 state2)
     (add-transition #\y state4 state2)
     (run-fsm stream state1)))

or whatever [of course, there's no need to rebuild all the states
every time, in the last example]

A similar solution -- and the usual way of implementing it in languages
like C -- is to use a table.  Assume a, b, c, d, x and y are the only
valid inputs, for simplicity; then a table like

State  "a"   "b"   "c"  "d"  "x"  "y"
  1     2    NIL   NIL  NIL  NIL  NIL
  2    NIL    3     4    T   NIL  NIL
  3    NIL   NIL   NIL  NIL   2   NIL
  4    NIL   NIL   NIL  NIL  NIL   2

describes the machine (a number means go to that state, T and NIL mean
return T or NIL, respectively), so you could write

 (defun read-input (stream)
   (ecase (read-char stream nil nil)
     (#\a 0) (#\b 1) (#\c 2) (#\d 3) (#\x 4) (#\y 5)))

 (defun run-fsm2 (stream table &aux (state 0))
   (loop
     (let ((value (aref table state (read-input stream))))
       (if (integerp value)
           (setq state value)
           (return value)))))

 (defun match-expr-3 (stream)
   (run-fsm2 #2A(( 1  nil nil nil nil nil)
                 (nil  2   3   t  nil nil)
                 (nil nil nil nil  1  nil)
                 (nil nil nil nil nil  1))))

[Note the renumbering of states to count from 0 instead of 1]

Does that help?

--
Oh dear god.  In case you weren't aware, "ad hominem" is not latin for
"the user of this technique is a fine debater."
                                                     -- Thomas F. Burdick
(setq reply-to
 (concatenate 'string "Paul Foley " "<mycroft" '(#\@) "actrix.gen.nz>"))




   -- Oliver

--
Oliver Scholz               7 Fructidor an 210 de la R?volution
Taunusstr. 25               Libert?, Egalit?, Fraternit?!
60329 Frankfurt a. M.       http://www.jungdemokratenhessen.de
Tel. (069) 97 40 99 42      http://www.jdjl.org


--
                    Geek House Productions, Ltd.

 Providing Unix & Internet Contracting and Consulting,
 QA Testing, Technical Documentation, Systems Design & Implementation,
 General Programming, E-commerce, Web & Mail Services since 1998

Phone:   604-435-1205
Email:   address@hidden
Webpage: http://reactor-core.org
Address: 2459 E 41st Ave, Vancouver, BC  V5R2W2

Attachment: pgpBNmmp_QiPD.pgp
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]