help-bison

Re: improving yysyntax_error()


From: Christian Schoenebeck
Subject: Re: improving yysyntax_error()
Date: Thu, 21 Jun 2007 15:43:39 +0200
User-agent: KMail/1.9.5

On Thursday, 21 June 2007 13:12, Hans Aberg wrote:
> On the one hand, you try to use Bison for something it wasn't
> designed for, so unless you can come up with good motivations, getting
> the change is unlikely to happen.

That depends on which motivation you mean. A motivation for doing the lexer 
task on the bison side? Or just a motivation for the suggested new declaration 
keyword?

> Remember that the development is 
> done by volunteers that do what they want.

I know, I work on open source projects as well. Nevertheless I want to 
implement this. If it is accepted for a future release of bison, fine; if 
not, I'll keep it for my own use and for anyone else who might want to use 
it, because I'm quite sure I'm not the only one using bison without an 
external lexer. And in those cases you need something like the suggested new 
feature to get useful error messages.

> On the other hand, one way to extend Flex & Bison to Unicode is to
> use UTF-8, and let Flex return a sequence of characters. Then this
> can be used in Bison, too, if 'c_1...c_k' expands to 'c_1'...'c_k'.
> Then the problem is to generate the full UTF-8-character in error
> messages, not just the leading byte. I do not know how to implement
> this, though.

Right, this would be exactly the same application: moving the lexer task to 
the bison side and having yylex() only return the next byte from the input 
stream, simply byte by byte. To support UTF-8 characters, you could define 
"atomic" grammar rules, like:

UTF8_CAPITAL_PI : '\316' '\240' ;

UTF8_CAPITAL_OMEGA : '\316' '\251' ;

and use the suggested new declaration keyword "%atomic" like:

%atomic UTF8_CAPITAL_PI UTF8_CAPITAL_OMEGA

to tell bison that the right-hand side of those UTF-8 character rules (that 
is, their byte sequence) is too trivial / uninteresting to be shown in errors, 
so yysyntax_error() would e.g. return:

"syntax error, unexpected 'UTF8_CAPITAL_OMEGA', expecting 'UTF8_CAPITAL_PI'"

instead of:

"syntax error, unexpected '\251', expecting '\240'"

The latter would be completely useless and confusing for regular users.
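
For illustration, the byte-by-byte yylex() described above could look roughly 
like the following C sketch. The buffer variables and set_input() are made-up 
names for this example and not part of bison. The sketch relies only on the 
yylex() name and on the convention that a character literal token such as 
'\316' has the (unsigned) byte value as its token number, with 0 meaning end 
of input.

```c
#include <stddef.h>

/* Hypothetical input buffer; the caller sets it before running yyparse(). */
static const unsigned char *input_pos;
static const unsigned char *input_end;

/* Hypothetical helper: point the lexer at a buffer of len bytes. */
void set_input(const char *buf, size_t len)
{
    input_pos = (const unsigned char *)buf;
    input_end = input_pos + len;
}

/*
 * Byte-wise yylex(): each input byte is returned as its own token.
 * Reading through an unsigned char pointer avoids sign extension, so the
 * byte 0xCE arrives as 206 (octal 316) and matches the literal '\316'
 * in the grammar. Returning 0 tells the parser the input has ended.
 * Caveat of this sketch: a NUL byte in the input would also look like
 * end of input.
 */
int yylex(void)
{
    if (input_pos == input_end)
        return 0;
    return *input_pos++;
}
```

With this, the input bytes \316 \240 reach the parser as the two tokens 
'\316' and '\240', which the UTF8_CAPITAL_PI rule above then reduces.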

CU
Christian



