Re: improving error message
From: Hans Åberg
Subject: Re: improving error message
Date: Sat, 10 Nov 2018 14:37:09 +0100
> On 10 Nov 2018, at 12:50, Akim Demaille <address@hidden> wrote:
>
>> Le 10 nov. 2018 à 10:38, Hans Åberg <address@hidden> a écrit :
>>
>>> Also, see if using %param does not already
>>> give you what you need to pass information from the scanner to the
>>> parser’s yyerror.
>>
>> How would that get into the yyerror function?
>
> In C, arguments of %parse-param are passed to yyerror. That’s why I mentioned
> %param, not %lex-param. And in the C++ case, these are members.
Actually, I was thinking about the token errors. As for the yyerror function, I
use C++ and compute the string from data in the semantic value; the prototype
is:
void yyparser::error(const location_type& loc, const std::string& errstr)
I use it for both errors and warnings; we discussed the latter long ago.
For errors:
throw syntax_error(@x, str); // Suitably computed string
For warnings:
parser::error(@y, "warning: " + str); // Suitably computed string
Then the error function above has:
std::string s = "error: ";
if (errstr.substr(0, 7) == "warning")
s.clear();
This way, the "error: " prefix is not shown in the case of a warning.
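To make the prefix logic concrete, here is a minimal sketch pulled out into a free function so that it compiles on its own. The name format_diagnostic is made up for illustration; in the real parser the same lines would sit inside the error member function shown above.

```cpp
#include <string>

// Hypothetical helper (not part of Bison): returns the text that the
// error function would print for a given message.
// A message that already starts with "warning" is passed through as is;
// any other message gets the "error: " prefix.
std::string format_diagnostic(const std::string& errstr)
{
  std::string s = "error: ";
  if (errstr.substr(0, 7) == "warning")
    s.clear();
  return s + errstr;
}
```

With this, format_diagnostic("warning: " + str) keeps its own prefix, while a message thrown via syntax_error comes out with "error: " in front.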
>>>>> I believe that the right approach is rather the one we have in compilers
>>>>> and in bison: caret errors.
>>>>>
>>>>> $ cat /tmp/foo.y
>>>>> %token FOO 0xff 0xff
>>>>> %%
>>>>> exp:;
>>>>> $ LC_ALL=C bison /tmp/foo.y
>>>>> /tmp/foo.y:1.17-20: error: syntax error, unexpected integer
>>>>> %token FOO 0xff 0xff
>>>>>                 ^^^^
>>>>> I would have been bothered by « unexpected 255 ».
>>>>
>>>> Currently, that’s for those still using only ASCII.
>>>
>>> No, it’s not, it works with UTF-8. Bison’s count of characters is mostly
>>> correct. I’m talking about Bison’s own location, used to parse grammars,
>>> which is improved compared to what we ship in generated parsers.
>>
>> Ah. I thought of errors for the generated parser only. Then I only report
>> byte count, but using character count will probably not help much for caret
>> errors, as they vary in width. The problem is that caret errors use two
>> lines which are hard to synchronize in Unicode. So perhaps some kind of one
>> line markup instead might do the trick.
>
> Two things:
>
> One is that the semantics of Bison’s location’s column is not specified:
> it is up to the user to track characters or bytes. As a matter of fact, Bison
> is hardly concerned by this choice; rather it’s the scanner that has to
> deal with that.
>
> The other one is: once you have the location, you can decide how to display
> it. In the case of Bison, I think the caret errors are fine, but you
> could decide to do something different, say use colors or delimiters, to
> be robust to varying width.
Yes, actually I thought about the token errors. But it is interesting to see
what you say about it.
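As an illustration of the delimiter idea, here is a rough sketch of a one-line markup that does not depend on character width. It assumes the location holds 0-based byte offsets into the line; the name mark_span and the default delimiters are hypothetical, not anything Bison provides.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: wrap the bytes [begin, end) of `line` in delimiters.
// Offsets are 0-based byte offsets and are clamped to the line length.
// UTF-8 text stays intact as long as both offsets fall on character
// boundaries, which is the scanner's responsibility.
std::string mark_span(const std::string& line,
                      std::size_t begin, std::size_t end,
                      const std::string& open = ">>",
                      const std::string& close = "<<")
{
  end = std::min(end, line.size());
  begin = std::min(begin, end);
  return line.substr(0, begin) + open
       + line.substr(begin, end - begin) + close
       + line.substr(end);
}
```

For the grammar line in the caret example, mark_span("%token FOO 0xff 0xff", 16, 20) marks the second 0xff on the same line as the source text, so there is no second line to keep aligned.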
>>>> I am using Unicode characters and LC_CTYPE=UTF-8, so it will not display
>>>> properly. In fact, I am using special code to even write out Unicode
>>>> characters in the error strings, since Bison assumes all strings are
>>>> ASCII, the bytes with the high bit set being translated into escape
>>>> sequences.
>>>
>>> Yes, I’m aware of this issue, and we have to address it.
>>
>> From what I could see, the function that converts it to escapes is sometimes
>> applied once and sometimes twice, relying on it being idempotent.
>
> It’s a bit more tricky than this. I’m looking into it, and I’d like
> to address this in 3.3.
I realized one needs to know a lot about Bison's innards to fix this. One thing
that made me curious is why the function it uses zeroes out the high bit: it
looks like it has something to do with the POSIX C locale, but I could not find
anything requiring it to be set to zero in that locale.
Right now, I use a function that translates the escape sequences back to bytes.
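A minimal sketch of such a translation, assuming the escapes are a backslash followed by exactly three octal digits (as in "\342\210\247"). The name unescape_octal is made up for illustration, and this is not the actual function I use.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: turn each backslash followed by exactly three octal
// digits (value at most 0377) into the single byte it denotes.
// Everything else is copied unchanged.
std::string unescape_octal(const std::string& in)
{
  auto is_octal = [](char c) { return c >= '0' && c <= '7'; };
  std::string out;
  for (std::size_t i = 0; i < in.size(); ++i)
  {
    if (in[i] == '\\' && i + 3 < in.size()
        && in[i + 1] >= '0' && in[i + 1] <= '3'
        && is_octal(in[i + 2]) && is_octal(in[i + 3]))
    {
      int value = (in[i + 1] - '0') * 64
                + (in[i + 2] - '0') * 8
                + (in[i + 3] - '0');
      out.push_back(static_cast<char>(value));
      i += 3;  // skip the three digits just consumed
    }
    else
      out.push_back(in[i]);
  }
  return out;
}
```

Applied to the escaped form of "∧", this gives back the three UTF-8 bytes E2 88 A7, so the token name can be printed as written in the grammar.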
>>> We also have to provide support for internationalization of
>>> the token names.
>>
>> Personally, I don't have any need for that. I use strings, like
>> %token logical_not_key "¬"
>> %token logical_and_key "∧"
>> %token logical_or_key "∨"
>> and in the case there are names, they typically match what the lexer
>> identifies.
>
> Yes, not all the strings should be translated. I was thinking of
> something like
>
> %token NUM _("number")
> %token ID _("identifier")
> %token PLUS "+"
>
> This way, we can even point xgettext to looking at the grammar file
> rather than the generated parser.
It might be good if one wants error messages in another language.