Re: Forcing Bison's yacc.c to report more expected tokens on syntax error
From: Christian Schoenebeck
Subject: Re: Forcing Bison's yacc.c to report more expected tokens on syntax error
Date: Thu, 07 Feb 2019 14:28:43 +0100
User-agent: KMail
On Wednesday, 6 February 2019 11:53:42 CET Derek Clegg wrote:
> In C++, we could be fancier: have two functions, one for a “pure” syntax
> error:
>
> void my_yysyntax_error();
>
> and one for syntax errors involving unexpected tokens:
>
> void my_yysyntax_error(std::string unexpected_token,
> std::vector<std::string> expected_tokens);
>
> The call would be something like this:
>
> if (yycount == 0) {
>     my_yysyntax_error();
> } else {
>     my_yysyntax_error(yyarg[0],
>         std::vector<std::string>{yyarg + 1, yyarg + yycount});
> }

That might be too limited for practical use. IMO a built-in API for retrieving
a list of expected tokens should be designed in a reentrant way, so that you
can a) pass a parser state to the function and get the list of expected tokens
for precisely that parser state, b) conveniently duplicate a parser state when
required, and c) advance the duplicated parser state without touching the
"official" parser state.
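
As an illustration of a) to c), a minimal C++ sketch follows. It is not Bison
API: ParserState, Grammar, expected_tokens(), advance() and example_grammar()
are hypothetical names, and the transition table is a toy stand-in for the
generated tables, with reductions left out. The point it tries to show is that
once all mutable data lives in one value type, duplicating a state is a plain
copy and a lookahead can run on the copy.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy stand-in for generated parser tables: for each state, the tokens that
// can be shifted and the state each one leads to. Hypothetical layout, not
// Bison's; reductions are omitted to keep the sketch short.
using Transitions = std::map<std::string, int>;
using Grammar = std::vector<Transitions>;

// All mutable parser data lives in this value type, so b) duplicating a
// parser state is an ordinary copy.
struct ParserState {
    std::vector<int> stack{0};  // state stack, starting in state 0
};

// a) Expected tokens for exactly the state that is passed in.
std::vector<std::string> expected_tokens(const Grammar& g, const ParserState& s) {
    assert(!s.stack.empty());
    assert(s.stack.back() >= 0 && s.stack.back() < static_cast<int>(g.size()));
    std::vector<std::string> result;
    for (const auto& entry : g[s.stack.back()])
        result.push_back(entry.first);
    return result;
}

// c) Advance the given state by one token. Returns false on a syntax error
// and leaves the state unchanged in that case.
bool advance(const Grammar& g, ParserState& s, const std::string& token) {
    assert(!s.stack.empty());
    assert(s.stack.back() >= 0 && s.stack.back() < static_cast<int>(g.size()));
    const Transitions& t = g[s.stack.back()];
    auto it = t.find(token);
    if (it == t.end()) return false;
    s.stack.push_back(it->second);
    return true;
}

// Example table for the made-up input language: "SET" ("VOLUME" | "PAN") "="
Grammar example_grammar() {
    return {
        {{"SET", 1}},                 // state 0
        {{"VOLUME", 2}, {"PAN", 2}},  // state 1
        {{"=", 3}},                   // state 2
        {},                           // state 3: nothing modelled beyond here
    };
}

// Usage idea: copy the "official" state, advance only the copy, and query
// the copy. The official state keeps reporting its own expected tokens.
//   ParserState scratch = official;
//   advance(g, scratch, "PAN");
//   expected_tokens(g, scratch);
```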

For features like human-readable syntax errors and user-friendly auto
completion, I am already doing that with additional code that accesses
Bison's internal tables directly. It works, but it involves a lot of extra
code that is injected into an auto-generated parser, and the solution depends
heavily on the layout of Bison's internal data structures and on precise
parser behaviour. So it would be great if Bison offered a built-in alternative
one day, so that this extra code could go away.

When I first started to write custom syntax error code and auto completion
features years ago, I also thought it would be sufficient to just resolve all
possible next tokens, but I quickly realized that this is often not user
friendly enough.

For instance, in practice the parser is often at a point in the grammar where
the only possible continuation is a single specific sequence of tokens. In
such a case a user wants to see not just the very next token but all
subsequent tokens as well, up to the point where the grammar really branches
and requires an actual decision from the user.
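
The "follow the forced sequence up to the next branch" idea can be sketched
as below. This is only a toy under stated assumptions: Table,
forced_sequence() and example_table() are hypothetical names, the table maps
a state directly to its shiftable tokens, and reductions are ignored. A real
implementation would have to run on a duplicated parser state instead.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy table: state -> (token -> next state). Not Bison's layout.
using Table = std::vector<std::map<std::string, int>>;

// Starting in `state`, collect tokens for as long as exactly one token is
// possible. Stops at the first state that offers a real choice or has no
// continuation at all. `max_len` guards against cycles in the table.
std::vector<std::string> forced_sequence(const Table& table, int state,
                                         std::size_t max_len = 32) {
    std::vector<std::string> seq;
    while (seq.size() < max_len) {
        assert(state >= 0 && state < static_cast<int>(table.size()));
        const auto& choices = table[state];
        if (choices.size() != 1) break;  // branch point or dead end
        seq.push_back(choices.begin()->first);
        state = choices.begin()->second;
    }
    return seq;
}

// Made-up example: after "CREATE" the only valid continuation is
// "AUDIO" "CHANNEL"; only then does the user have to choose between
// "LEFT" and "RIGHT".
Table example_table() {
    return {
        {{"CREATE", 1}, {"DELETE", 4}},  // state 0: branch
        {{"AUDIO", 2}},                  // state 1: forced
        {{"CHANNEL", 3}},                // state 2: forced
        {{"LEFT", 4}, {"RIGHT", 4}},     // state 3: branch
        {},                              // state 4: end
    };
}

// forced_sequence(example_table(), 1) collects "AUDIO", "CHANNEL" and then
// stops at state 3, where the user actually has to decide something.
```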

Another example from my experience is parsers that do not use a separate
lexer at all, i.e. a parser that simply reads the next input byte and uses
grammar rules for what is typically handled by external lexer rules. In such
a scenario, if you could only resolve the very next token, you would only be
able to show the user a list of single characters expected next, like
"expected either 'a', 'g', 'c'", instead of showing the user e.g.
"expected either 'all', 'get', 'clear'".
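
The difference between the two messages can be shown with a small C++ sketch.
It assumes, purely for illustration, that the character-level rules at the
current position form a trie of keywords; Node, add_keyword(), next_chars()
and complete_words() are hypothetical names and have nothing to do with
Bison's tables.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Character-level "grammar" as a trie: each node maps the next input byte to
// a child node. Hypothetical stand-in for the states of a lexer-less parser.
struct Node {
    std::map<char, std::unique_ptr<Node>> next;
    bool accepts = false;  // true if a complete keyword ends here
};

void add_keyword(Node& root, const std::string& word) {
    assert(!word.empty());
    Node* n = &root;
    for (char c : word) {
        std::unique_ptr<Node>& child = n->next[c];
        if (!child) child.reset(new Node);
        n = child.get();
    }
    n->accepts = true;
}

// What a "very next token only" query reports: single characters.
std::vector<std::string> next_chars(const Node& n) {
    std::vector<std::string> out;
    for (const auto& kv : n.next) out.push_back(std::string(1, kv.first));
    return out;
}

// Walk on from `n` and report whole keywords instead of single characters.
void complete_words(const Node& n, const std::string& prefix,
                    std::vector<std::string>& out) {
    if (n.accepts) out.push_back(prefix);
    for (const auto& kv : n.next)
        complete_words(*kv.second, prefix + kv.first, out);
}

// With the keywords "all", "get" and "clear" added to a root node,
// next_chars(root) only knows 'a', 'c', 'g', while
// complete_words(root, "", out) collects "all", "clear", "get".
```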
Best regards,
Christian Schoenebeck