help-bison
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Non-greedy wildcard possible? (Long)


From: Hans Aberg
Subject: Re: Non-greedy wildcard possible? (Long)
Date: Fri, 21 May 2004 21:17:52 +0200

At 00:55 +0200 2004/05/21, Magnus Lie Hetland wrote:
>> So then you must tweak your LL(1) parser in order to filter out legal
>> tokens, because otherwise you would only need to feed the Bison parser with
>> the same legal tokens. What exactly kind of tweaks that you use in the
>> LL(1) setup do not work in the LALR(1) setup?
>
>As long as I only use a lookahead of 1, there should be no problem.
>Then the next token will always be the first occurrence of a legal
>token in the input. So, as I said, as long as I can get an LR(1) (or
>LALR(1)) parser to behave this way (either by implementing it myself
>or by somehow tweaking Bison/Flex) there shouldn't be a problem.

But why don't you let the Flex lexer to strip out the illegal tokens? Why
do you want Bison to look at the next token, and not merely Flex?

You should be aware of that LALR(1) is more complicated, because sometimes
the lookahead token is not used at all, if the parsing does not require it.
So therefore, Flex/Bison handshakes can be tricky, because the Bison parser
may not look for a new token when one wants it to say set context variables
for the lexer.

>Bison would work as well as any other parser if I managed to create a
>standard CFG after adding the wildcards (as described above). If I
>disallow empty rules and insist that each production is terminated by
>a token/terminal, that shouldn't be a problem. Otherwise, it probably
>wouldn't be too easy...

Since I skeptical to this wildcard thing, we have to look for different
ways around it.

>> but you say here you have found an LL(1) way to do it. What kind of
>> tweak would you need in the LARL(1) parser?
>
>(1) Only ask for one token of lookahead before deciding what to do (so
>you don't risk asking for a token that "belongs" to a different
>context/production) and (2) make the lexer return the first occurrence
>of a legal token (as defined by the lookahead set).

As I said above, the Bison parser may not look at a new token at all, so
(1) does not work directly.

But the funny thing is that a similar problem has been discussed before in
this or the Bug-Bison list: A Frank Heckenbach <address@hidden> is
implementing a Pascal compiler where one can throw in debugging commands at
will. This sounds like it is similar to your problem.

I was able to work out a description how to implement this feature, though
I will have to strain my mind to remember it (see below). The real problem
is finding someone willing to do implement that as a feature in Bison. If
it is right, perhaps you and Frank Heckenbach can do it?

>I thought this would be easy with Flex/Bison, but it seems Bison
>somehow asks for more than one lookahead token, which would foul it
>up. (This is the "easy" solution -- the "hard" version of the problem
>would be to have a full CFG-specified terminator expression, rather
>than a single token. I've abandoned that, though.)

No, the Bison parser never asks for more than one token; that's is why it
is LALR(k), where k = 1 and not > 1. But it may ask for zero tokens, which
may foul things up, if one wants to use it to extract the next legal token.

Anyway, I think the thread where a similar problem was discussed was
"Finding out when a token is consumed" in this list, between the dates
2003/04/30 and 2003/05/15. The solution might have been described in the
Bug-Bison list, thread "Featuritis: Token actions", between the dates
2003/07/03 and 2003/07/26.

The basic idea, though is that the parser, before its action switch
statement, contains a portion (in my C++ version; the C version is slightly
different):
  // Backup.
yybackup:
  // Try to make a decision without lookahead:
  n_ = pact_[state_];
  if (n_ == pact_ninf_)
    goto yydefault;

  // Read a lookahead token.
  if (lookahead_ == empty_) {
    YYCDEBUG << "Reading a token: ";
    lex_();
  }
This is the portion where the Bison parser decides whether it needs to look
for a new token. So this place is where you should tweak for special
token-lookahead features.

Once you have decided what kinds of tweaks you need, it should not be
difficult to write your own skeleton file.

  Hans Aberg






reply via email to

[Prev in Thread] Current Thread [Next in Thread]