help-bison

Re: Forcing Bison's yacc.c to report more expected tokens on syntax error


From: Christian Schoenebeck
Subject: Re: Forcing Bison's yacc.c to report more expected tokens on syntax error
Date: Thu, 07 Feb 2019 14:28:43 +0100
User-agent: KMail

On Wednesday, 6 February 2019 11:53:42 CET Derek Clegg wrote:
> In C++, we could be fancier: have two functions, one for a “pure” syntax
> error:
> 
>     void my_yysyntax_error();
> 
> and one for syntax errors involving unexpected tokens:
> 
>     void my_yysyntax_error(std::string unexpected_token,
>                            std::vector<std::string> expected_tokens);
> 
> The call would be something like this:
> 
>     if (yycount == 0) {
>       my_yysyntax_error();
>     } else {
>       my_yysyntax_error(yyarg[0],
>                         std::vector<std::string>{yyarg + 1, yyarg + yycount});
>     }

That might be too limited for practical use. IMO a built-in API for retrieving
a list of expected tokens should be designed in a reentrant way, such that you
could a) pass a parser state to the function to get the list of expected
tokens for precisely that state, b) conveniently duplicate a parser state if
required, and c) advance the duplicated parser state if required without
touching the "official" parser state.
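
A minimal C++ sketch of what points a) to c) could look like. This is a toy
model and not Bison's API: ParserState, expected_tokens(), advance() and the
hand-written transition table are hypothetical names invented for this
illustration, and a real LR parser state would carry a whole state stack
rather than a single number.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy automaton (NOT Bison's tables) for the token grammar
//   stmt : "get" NAME | "set" NAME "=" VALUE
struct Transition { int from; std::string token; int to; };

const std::vector<Transition>& transitions() {
    static const std::vector<Transition> table = {
        {0, "get", 1}, {0, "set", 2},
        {1, "NAME", 5},
        {2, "NAME", 3}, {3, "=", 4}, {4, "VALUE", 5},
    };
    return table;
}

// Plain value type: copying it duplicates the "parser" (point b).
struct ParserState { int state = 0; };

// Point a): depends only on the state passed in, no globals are modified.
std::vector<std::string> expected_tokens(const ParserState& ps) {
    std::vector<std::string> out;
    for (const auto& t : transitions())
        if (t.from == ps.state) out.push_back(t.token);
    return out;
}

// Point c): advance the given state by one token.
// Returns false and leaves the state unchanged if the token is unexpected.
bool advance(ParserState& ps, const std::string& token) {
    for (const auto& t : transitions()) {
        if (t.from == ps.state && t.token == token) {
            ps.state = t.to;
            return true;
        }
    }
    return false;
}

// Usage: probe ahead on a copy; the "official" state is never touched.
void demo() {
    ParserState official;
    ParserState probe = official;
    bool ok = advance(probe, "set");
    assert(ok && official.state == 0);
    assert(expected_tokens(probe).size() == 1);  // only NAME may follow
}
```

Because the state is an ordinary value, probing ahead is just a copy followed
by advance() on the copy.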

For features like human-readable syntax errors and user-friendly
auto-completion, I am already doing that with additional code that accesses
Bison's internal tables directly. It works, but it involves a lot of extra
code injected into an auto-generated parser, and the solution depends heavily
on Bison's internal data structure layout and precise parser behaviour. So it
would be great if Bison one day offered a built-in alternative that makes this
unnecessary.

When I first started writing custom syntax error code and auto-completion
features years ago, I also thought it was sufficient to just resolve all
possible next tokens, but I quickly realized that this is often not
user-friendly enough.

For instance, in practice the parser is often at a point in the grammar where
the only possibility is a single specific sequence of tokens coming next. In
such a case a user would want to see not just the very next token, but all
subsequent tokens as well, up to the point where the grammar really has a
branch and requires an actual decision from the user.
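
A minimal C++ sketch of following such a forced token sequence up to the next
branch. It is an illustration under assumptions, not Bison code: the
transition table is a hand-written toy, and expected() and forced_sequence()
are hypothetical helpers. A real implementation would have to simulate shifts
and reductions on a copy of the parser stack.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy automaton (NOT Bison's tables):
//   state 0: "if" -> 4 | "end" -> 1   (a real branch)
//   state 1: "if" -> 2                (forced)
//   state 2: ";"  -> 3                (forced)
//   states 3 and 4 have no outgoing tokens in this toy
struct Transition { int from; std::string token; int to; };

const std::vector<Transition>& transitions() {
    static const std::vector<Transition> table = {
        {0, "if", 4}, {0, "end", 1},
        {1, "if", 2},
        {2, ";", 3},
    };
    return table;
}

// All transitions that are possible from the given state.
std::vector<Transition> expected(int state) {
    std::vector<Transition> out;
    for (const auto& t : transitions())
        if (t.from == state) out.push_back(t);
    return out;
}

// Collect tokens for as long as exactly one token is possible.
// Stops at a branch, at a dead end, or when a state repeats (cycle guard).
std::vector<std::string> forced_sequence(int state) {
    std::vector<std::string> seq;
    std::set<int> visited;
    while (visited.insert(state).second) {
        const std::vector<Transition> options = expected(state);
        if (options.size() != 1) break;
        seq.push_back(options[0].token);
        state = options[0].to;
    }
    return seq;
}

// Usage: after "end" was read (state 1), two tokens are forced.
void demo() {
    assert(forced_sequence(1).size() == 2);  // "if" followed by ";"
    assert(forced_sequence(0).empty());      // state 0 is a branch
}
```

With this, an error message in state 1 could say "expected 'if ;'" instead of
only "expected 'if'".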

Another example from my experience is parsers that do not use a separate
lexer at all, i.e. a parser that simply reads the next input byte and uses
grammar rules for what is typically handled by external lexer rules. In such a
scenario, if you were limited to resolving only the very next token, you could
only show the user a list of single characters expected next, like "expected
either 'a', 'g', 'c'", instead of showing the user e.g. "expected either
'all', 'get', 'clear'".
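
A minimal C++ sketch of the difference. It is a deliberate simplification:
instead of a byte-level grammar it uses a plain keyword list, and keywords(),
next_chars() and next_words() are hypothetical helpers invented for this
illustration.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Stand-in for the byte-level grammar rules (an assumption for this sketch).
const std::vector<std::string>& keywords() {
    static const std::vector<std::string> words = {"all", "get", "clear"};
    return words;
}

// True if `word` is strictly longer than `typed` and starts with it.
bool can_continue(const std::string& word, const std::string& typed) {
    return word.size() > typed.size() &&
           word.compare(0, typed.size(), typed) == 0;
}

// What a "very next token only" API can report: single characters.
std::set<char> next_chars(const std::string& typed) {
    std::set<char> out;
    for (const auto& w : keywords())
        if (can_continue(w, typed)) out.insert(w[typed.size()]);
    return out;
}

// What the user actually wants to see: the complete words.
std::set<std::string> next_words(const std::string& typed) {
    std::set<std::string> out;
    for (const auto& w : keywords())
        if (can_continue(w, typed)) out.insert(w);
    return out;
}

// Usage: nothing typed yet, so three characters or three words are possible.
void demo() {
    assert(next_chars("").size() == 3);  // 'a', 'c', 'g'
    assert(next_words("").size() == 3);  // "all", "clear", "get"
}
```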

Best regards,
Christian Schoenebeck



