help-bison

Re: improving yysyntax_error()


From: Hans Aberg
Subject: Re: improving yysyntax_error()
Date: Thu, 21 Jun 2007 16:08:16 +0200

On 21 Jun 2007, at 15:43, Christian Schoenebeck wrote:

On Thursday, 21 June 2007, at 13:12, Hans Aberg wrote:
On the one hand, you are trying to use Bison for something it wasn't
designed for, so unless you can come up with good motivations, the
change is unlikely to happen.

Depends on which motivation you mean. A motivation for doing the lexer's job on the Bison side? Or just a motivation for the suggested new declaration keyword?

Whichever changes you want to see happen.

Remember that the development is
done by volunteers that do what they want.

I know, I work on open source projects as well. Nevertheless I want to
implement this.

Then it is probably more appropriate for the Bug-Bison and Bison-Patches lists.

If it is accepted for a future release of Bison, fine; if not, I'll keep it
for my own use and for anyone else who might want to use it, because I'm quite sure I'm not the only one using Bison without an external lexer. In those cases you need something like the
suggested new feature to get useful error messages.

On the other hand, one way to extend Flex & Bison to Unicode is to
use UTF-8, and let Flex return a sequence of characters. Then this
can be used in Bison, too, if 'c_1...c_k' expands to 'c_1'...'c_k'.
Then the problem is to generate the full UTF-8-character in error
messages, not just the leading byte. I do not know how to implement
this, though.
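
The expansion of 'c_1...c_k' into 'c_1'...'c_k' described above can be illustrated with a small C sketch. The helper name expand_to_byte_literals is hypothetical and is not part of Bison or Flex; it only shows what the byte-wise form of a multibyte character would look like, using octal escapes for non-printable bytes.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a Bison or Flex function): write the bytes
   of `utf8` as a space-separated sequence of character literals, e.g.
   the two bytes \316 \240 become the text '\316' '\240'.
   Returns the length of the result, or -1 if `out` is too small. */
static int expand_to_byte_literals(const char *utf8, char *out, size_t out_size)
{
    size_t used = 0;

    if (out_size == 0)
        return -1;
    out[0] = '\0';

    for (size_t i = 0; utf8[i] != '\0'; i++) {
        unsigned char byte = (unsigned char)utf8[i];
        const char *sep = (i > 0) ? " " : "";
        int n;

        /* Printable ASCII is shown as-is; everything else (including
           quote and backslash) is shown as a three-digit octal escape. */
        if (byte >= 0x20 && byte < 0x7f && byte != '\'' && byte != '\\')
            n = snprintf(out + used, out_size - used, "%s'%c'", sep, byte);
        else
            n = snprintf(out + used, out_size - used, "%s'\\%03o'", sep,
                         (unsigned)byte);

        if (n < 0 || (size_t)n >= out_size - used)
            return -1;              /* output would have been truncated */
        used += (size_t)n;
    }
    return (int)used;
}
```

A preprocessing step of this kind would only cover the grammar side; it does not by itself solve the error-message problem mentioned above.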

Right, this would be exactly the same application: moving the lexer's job to the Bison side, with yylex() only returning the next byte from the input stream,
one byte at a time.

No. In this model the lexer matches patterns as usual; only once a match has been made does it return the multibyte character, byte by byte.
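
A minimal C sketch of that model, under stated assumptions: the scanner has already matched a complete multibyte character (in Flex the text and length would come from yytext and yyleng), and the names queue_matched_char and next_pending_byte are hypothetical helpers, not Flex or Bison functions. The matched character is queued once and then handed out one byte per call.

```c
#include <stddef.h>
#include <string.h>

/* Bytes of the most recently matched multibyte character that have
   not yet been handed to the parser (UTF-8 needs at most 4 bytes). */
static unsigned char pending[4];
static size_t pending_len = 0;
static size_t pending_pos = 0;

/* Hypothetical hook, called from a scanner action after a complete
   character has been matched. Returns 0 on success, -1 if the
   length is not a valid UTF-8 sequence length. */
static int queue_matched_char(const char *text, size_t len)
{
    if (len == 0 || len > sizeof pending)
        return -1;
    memcpy(pending, text, len);
    pending_len = len;
    pending_pos = 0;
    return 0;
}

/* Returns the next queued byte as a token value, or -1 when the queue
   is empty and the scanner should go on to match the next pattern. */
static int next_pending_byte(void)
{
    if (pending_pos >= pending_len)
        return -1;
    return pending[pending_pos++];
}
```

A yylex() wrapper would first call next_pending_byte() and fall through to normal pattern matching only when it returns -1.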

For supporting UTF-8 characters, you could
define "atomic" grammar rules, like:

UTF8_CAPITAL_PI : '\316' '\240' ;

UTF8_CAPITAL_OMEGA : '\316' '\251' ;

and use the suggested new declaration keyword "%atomic" like:

%atomic UTF8_CAPITAL_PI UTF8_CAPITAL_OMEGA

to tell Bison that the right-hand side of those UTF-8 character rules (that is, their byte sequence) is too trivial / uninteresting to be shown in errors, so that
yysyntax_error() would, for example, return:

"syntax error, unexpected 'UTF8_CAPITAL_OMEGA', expecting 'UTF8_CAPITAL_PI'"

instead of:

"syntax error, unexpected '\251', expecting '\240'"

The latter would be completely useless and confusing for regular users.
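
The intended effect on the message text can be sketched in C. This is only an illustration of the idea, not how Bison's yysyntax_error() works and not an implementation of the proposed "%atomic" keyword: the table atomic_rules and the functions atomic_name_for and format_syntax_error are hypothetical, and the byte sequences assume the standard UTF-8 encodings CE A0 for Π (U+03A0) and CE A9 for Ω (U+03A9).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical table: one entry per rule that would be declared
   atomic, mapping its byte sequence to the rule name. */
struct atomic_rule {
    const char *name;
    const char *bytes;
};

static const struct atomic_rule atomic_rules[] = {
    { "UTF8_CAPITAL_PI",    "\316\240" },   /* U+03A0 */
    { "UTF8_CAPITAL_OMEGA", "\316\251" },   /* U+03A9 */
};

/* Look up the rule name for a byte sequence, or NULL if none matches. */
static const char *atomic_name_for(const char *bytes)
{
    size_t count = sizeof atomic_rules / sizeof atomic_rules[0];
    for (size_t i = 0; i < count; i++)
        if (strcmp(atomic_rules[i].bytes, bytes) == 0)
            return atomic_rules[i].name;
    return NULL;
}

/* Build the message, preferring the rule name over the raw bytes.
   Returns the message length, or -1 if `out` is too small. */
static int format_syntax_error(const char *unexpected, const char *expected,
                               char *out, size_t out_size)
{
    const char *u = atomic_name_for(unexpected);
    const char *e = atomic_name_for(expected);
    int n = snprintf(out, out_size,
                     "syntax error, unexpected '%s', expecting '%s'",
                     u ? u : unexpected, e ? e : expected);
    return (n < 0 || (size_t)n >= out_size) ? -1 : n;
}
```

The sketch assumes the complete byte sequence of the offending character is available when the message is built, which is exactly the part that a byte-by-byte parser does not provide for free.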

The stuff above isn't needed, except for the generation of error messages. So how do you intend to implement your %atomic construct?

  Hans Aberg




