help-flex

Re: returning two tokens


From: Hans Aberg
Subject: Re: returning two tokens
Date: Wed, 3 Dec 2003 19:11:09 +0100

At 16:50 +0100 2003/12/03, Buday Gergely wrote:
>I have to return two tokens in response to a single character. How can I
>write this nicely using flex?

Some time ago I suggested that one should be able to let the Flex lexer
split, for use with the Bison GLR parser. I think it is on the to-do list,
but it is not something one can currently do.

The main problem here is that you do not say why you want such a split.
I have encountered certain context-sensitive situations that might in the
future be handled by a lexer split, but currently must be handled by
special methods. Here is an example:

In a proof verification system I am writing, I allow the following
syntaxes for axiomatic set notation:
    {x| A}       -- Set of all x such that A.
    {x, y, ...}  -- Set with elements x, y, ...
where x is an identifier and A an expression (x can also be the first
token of an expression, so it need not be immediately followed by a ",").
Here I have decided that if "x" is immediately followed by a "|", it is
the definition of a new local name; otherwise, the lookup table value of
"x" is used.

To resolve this, I use the Flex context switches, called "start
conditions", together with a pipe. Whenever the lexer rescans, it first
checks whether the pipe is empty. If it is not, the first token in it is
removed and returned; if it is empty, scanning proceeds as normal.
%%
  /* Runs on every entry to yylex(): return a pending piped token first. */
  if (current_token != 0) {
    int tok = current_token;
    current_token = 0;
    return tok;
  }
  ...
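The single `current_token` slot holds at most one pending token. If a rule ever has to yield more than one extra token, the same idea extends to a small FIFO. The C sketch below is one way to do that, not the author's code: `PIPE_CAP`, `pipe_push`, `pipe_pop` and `pipe_empty` are hypothetical names, and the capacity of 8 is an arbitrary assumption.

```c
#include <assert.h>

/* Hypothetical token pipe: a small FIFO that generalizes the single
   current_token slot, so one rule can queue more than one token. */
#define PIPE_CAP 8

static int pipe_buf[PIPE_CAP];
static int pipe_head = 0;  /* index of the oldest queued token */
static int pipe_len = 0;   /* number of tokens currently queued */

static int pipe_empty(void) { return pipe_len == 0; }

/* Append a token at the back of the pipe. */
static void pipe_push(int tok) {
  assert(pipe_len < PIPE_CAP);  /* assumed never to overflow */
  pipe_buf[(pipe_head + pipe_len) % PIPE_CAP] = tok;
  pipe_len++;
}

/* Remove and return the token at the front of the pipe. */
static int pipe_pop(void) {
  assert(pipe_len > 0);
  int tok = pipe_buf[pipe_head];
  pipe_head = (pipe_head + 1) % PIPE_CAP;
  pipe_len--;
  return tok;
}

/* Possible use in the .l file (TOK_FIRST and TOK_SECOND are hypothetical):

   %%
     if (!pipe_empty()) return pipe_pop();
   ":"   { pipe_push(TOK_SECOND); return TOK_FIRST; }
*/
```

Because the check runs each time yylex() is entered, before any rule is matched, queued tokens are delivered in the order they were pushed and ahead of any new input. This is what lets a single input character produce two tokens.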

When "{" is encountered, I enter a special start condition and set a
flag. When the next token "x" has been found, I put it into the pipe, set a
new start condition, and rescan for the next token without returning. If
that token is "|", I know that the first case above applies, so I put "|"
into the pipe, restore the initial start condition, and return "x".
Otherwise, "x" should just get its lookup table value, which is returned
after the initial start condition has been restored.
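To see which token streams this scheme produces, it can be simulated in plain C without flex. This is a sketch under stated assumptions, not the author's implementation. A hypothetical `enum state` stands in for Flex start conditions, the `T_*` token codes are hypothetical, identifiers are assumed to be alphanumeric, and instead of rescanning it peeks ahead at the next non-blank character.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical token codes; T_EOF (0) doubles as "pipe is empty". */
enum { T_EOF, T_LBRACE, T_RBRACE, T_COMMA, T_BAR, T_NEW_NAME, T_LOOKUP, T_OTHER };

/* Stand-ins for Flex start conditions. */
enum state { ST_INITIAL, ST_AFTER_BRACE };

static enum state st = ST_INITIAL;
static int pending = T_EOF;  /* one-slot pipe, like current_token */

/* Return the next token from *p, advancing *p past the consumed input. */
static int next_token(const char **p) {
  assert(p && *p);
  if (pending != T_EOF) {  /* deliver a queued token first */
    int tok = pending;
    pending = T_EOF;
    return tok;
  }
  while (isspace((unsigned char)**p)) (*p)++;
  char c = **p;
  if (c == '\0') return T_EOF;
  (*p)++;
  if (isalpha((unsigned char)c)) {
    while (isalnum((unsigned char)**p)) (*p)++;  /* rest of the identifier */
    if (st == ST_AFTER_BRACE) {                  /* identifier right after "{" */
      st = ST_INITIAL;
      const char *q = *p;                        /* peek at the next token */
      while (isspace((unsigned char)*q)) q++;
      if (*q == '|') {                           /* "{x|": new local name */
        *p = q + 1;                              /* consume the "|" ... */
        pending = T_BAR;                         /* ... and queue it in the pipe */
        return T_NEW_NAME;
      }
    }
    return T_LOOKUP;  /* otherwise use the lookup table value of x */
  }
  st = (c == '{') ? ST_AFTER_BRACE : ST_INITIAL;
  switch (c) {
    case '{': return T_LBRACE;
    case '}': return T_RBRACE;
    case ',': return T_COMMA;
    case '|': return T_BAR;
    default:  return T_OTHER;
  }
}
```

For the input `{x| A}` this yields T_LBRACE, T_NEW_NAME, T_BAR, T_LOOKUP, T_RBRACE. For `{x, y}` it yields T_LBRACE, T_LOOKUP, T_COMMA, T_LOOKUP, T_RBRACE. The parser therefore sees a different token for "x" in the two cases and needs no extra lookahead to tell them apart.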

This technique requires some amount of work, but the payoff is a simpler
and more descriptive parser grammar, with a better chance of being LALR(1),
as required by Bison.

  Hans Aberg





