help-bison
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Strange behavior


From: Laurent Deniau
Subject: Re: Strange behavior
Date: Fri, 30 Apr 2004 17:22:18 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5) Gecko/20031107 Debian/1.5-3

Hans Aberg wrote:
At 09:41 +0200 2004/03/30, Laurent Deniau wrote:
>> I think your question was answered by Tim Van Holder: Due to limitation in
 >> lookahead in the LALR(1) parser generator algorithm, one has to fiddle
 >> around with the grammar to find a form that works.
 >
 >Well, I was maybe not very clear in my question but basically I was
 >asking why two equivalent rules are not considered as equivalent by
 >bison. In both cases, no more lookahead is needed, but it is true that
 >more states are created in the second case. But from my point of view it
 >is a completely different point. By the way, this happened only for one
 >optional rule over more than 20 used and if I try to reproduce it for a
 >smaller grammar, the problem vanishes. And looking at the verbose output
 >shows clearly that the conflits are not justified, and the behavior is
 >correct since the optional rules are standing alone. Otherwise I would
 >have post my question to help-bison, not bug-bison.

The thing is that Bison is using LALR(1), which is a cutdown of LR(1) by
achieved compressing certain states. It may fail when LR(1) succeeds (there
are examples of this in the Bison manual, sec. 5.7, "Mysterious
reduce/reduce conflicts"). Also, when an error occurs, LALR(1) may enter a
few more actions relative LR(1) before the error is detected (but it will
not scan for more tokens). In addition (in EBNF), a* b* and empty | a+ | b+
| a+ b+ are equivalent used in grammars, but not when actions are added (as
in Bison).

If you do not get more clarifications in the Bison lists, you may try your
question in the newsgroup comp.compilers, where there are more people good
at working grammars by hand.

 >>>>This is not really a Bison question, but a question of implementing a
 >>>>language using the LALR(1) parsing algorithm. As your language is
 >>>>well-known, it might be worthwhile asking around in the newsgroup
>>>>comp.compilers, or in some of the C newsgroups to see if somebody already
 >>>>has done it. Some C compilers might already have it (GNU C?).
 >>>
 >>>My question has nothing to do with the C99 grammar. It was just to fix
 >>>the background, nothing more.
 >>
 >>
>> So therefore, if the implemented language is known, it can be worth to look
 >> around for tricks that may be required to implement it.
 >
 >I do not understand your point. What do you mean by the language is
 >known? The norm of the C99 provides a "descriptive" grammar of the
 >language but it is neither normative (only informative) neither it
 >works. You have to modify it deeply to make it working with a LALR(1)
 >parser because it has contexts.

Right. For every pair (language, parsing algorithm), one will have to
develop a special techniques. Since C99 has been out for a while, and
LALR(1) is quite common, it seems likely that somebody has made an
implementation. Check if GCC has a .y file (it may not though).

The problem is that GCC is not based on the C99 gammar (or variant of) and it is not compliant with the C99 grammar. Some small torture code is not parsed at all (syntax error).

 >But again, this has nothing to do with
 >my question. My question is about bison.

Bison just applies LALR(1). I do not think it is likely it is buggy there.
But try another LALR(1) parser generator and see if you get the same error
messages.

So, most likely, your question is about coping with LALR(1). The simplest
way is to fiddle around with it to get a working version, and then forget
about it.

That is what I did. The default shift/reduced behavior is correct so I just give up with the aim to get a clean grammar with no s/r. I prefer to keep a simple grammar with s/r correctly solved, this will be better/simpler for future upgrade.

 >> The strings are only exported to the parser to
>> a table used for error message printouts, which has strings in it anyway.
 >
 >But you still need to write the rule with LP and RP.

No, use the strings in the rule. This makes the grammar much more readable:

 >direct_abstract_declarator
 >       : LP abstract_declarator RP

Instead of this, write:

%token LP "("
%token RP ")"
...
%%
...
  direct_abstract_declarator:
      "(" abstract_declarator ")"

Then, except for parser error messages, "(" and ")" are only used
internally in Bison. These strings do not have to be what the lexer scans
for, as it will return RP and LP.

I do not catch completly your point.

First I use '(' and ')' which is as clear as "(" and ")" but uses character instead of string. So no problem with that specific case.

Second, you say that "(" and ")" could be used directly in a rule? I thought that the rules where described using tokens or rules (internaly treated as non terminal token) and tokens were int (or unsigned int) with specific unique id assigned by Bison. So characters like '(' would used its ascii code as its id while extra tokens would start at 258 to avoid id collision as we can see in the y.tab.h. But a string literal is a pointer that can be converted into an int by the parser and then used in a rule. This is ok since most compiler will make the string unique in the same compilation unit (unique address), but this is also why the lexer cannot return the string literal which as a different address in its compilation unit and therefore it must return a token id. In the .y file (same compilation unit), the rule is still working, but you do the bet that the string address (pointer) converted to an int does not collide with the other token id. In practice this is probably true because of the address space, but it is still dangerous. Am I wrong?

In fact, it would be better if one could separate the grammar strings from
the parser error message strings, but you can't do that yet.

Regards,

ld.





reply via email to

[Prev in Thread] Current Thread [Next in Thread]