Re: RFC: enum instead of #define for tokens

help-bison

From:	Akim Demaille
Subject:	Re: RFC: enum instead of #define for tokens
Date:	05 Apr 2002 10:53:54 +0200
User-agent:	Gnus/5.0808 (Gnus v5.8.8) XEmacs/21.4 (Common Lisp)

| > From: Akim Demaille <address@hidden>
| > Date: 04 Apr 2002 12:11:45 +0200
| > 
| > Also, under some conditions (which are pretty rare, I agree),
| > symbols and rules have to be renumbered.  This means that you need
| > to walk through all the arrays, renumbering from the old number, to
| > the new number.  Using pointers, I no longer have to do that: you
| > just change the member `number' of symbols/rules, and you're done.
| 
| Hmm, does this action occur at run-time?  I thought that these arrays
| were readonly.  For example, yyr1 is readonly.

Arg.  I chose my words to be unambiguous, and I now realize they were
ambiguous anyway.  When I said the `engine', I was referring to bison(1).
I should have said the generator, as engine could be understood as the
table-driven part of the parsers.


| If you're talking about something that occurs while Bison is running,
| that's a different matter.  But if you're talking about the generated
| parser, then it can improve performance quite a bit in some cases to
| use integers rather than pointers.

I'm starting with the generator precisely so that most `short' in the
output will then be easy to reconsider.


| > The motivations for this change are (i) making the maintenance of Bison
| > easier, thanks to a minimum of type checking service from the
| > compiler
| 
| I'm afraid that I am still a bit lost here.  When you talk about
| maintaining Bison, it sounds like you're talking about the Bison
| executable itself, and here it's clearly OK to shift to a cleaner
| representation even if it's slower.  But I thought we were talking
| about the generated parser.

Yeah, I'm sorry about that.  I was referring to the generator.


| > (ii) making extensions of Bison much easier: you don't
| > need to know all the arrays that exist in there to recover the
| > values associated to this or that guy: you have the guy, so you have
| > all you need about it.
| 
| I don't know what sort of extensions you're considering, so I'm at a
| bit of a loss here. 

I'd be happy to swallow btyacc into Bison, and I'd be happy if full
LR(1) or DRLR(1) could be implemented.  To make it easier for the
people who try, I try to have straightforward data types and
representations, instead of having logical entities split into parts
scattered across different arrays indexed by shorts.
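
To illustrate the difference between the two representations, here is a
minimal C sketch.  All names in it (`symbol_sketch', `rule_sketch',
`symbol_renumber', `rule_rhs_number') are hypothetical and are not
Bison's actual data structures.  The only point is that a rule holding
pointers to its symbols does not have to be rewritten when a symbol is
renumbered.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: everything known about one symbol lives here,
   instead of being spread over parallel arrays indexed by shorts.  */
struct symbol_sketch
{
  const char *tag;   /* printable name */
  int number;        /* current symbol number; may change */
  int prec;          /* precedence level, 0 if none */
};

/* Hypothetical rule: it points at its symbols rather than storing
   their numbers.  */
struct rule_sketch
{
  int number;
  struct symbol_sketch *lhs;
  struct symbol_sketch **rhs;   /* NULL-terminated */
};

/* Renumbering touches the symbol only; no rule needs an update.  */
static void
symbol_renumber (struct symbol_sketch *sym, int new_number)
{
  assert (sym != NULL);
  sym->number = new_number;
}

/* Number of the I-th right-hand side symbol of RULE, read through
   the pointer, so it always reflects the latest renumbering.  */
static int
rule_rhs_number (const struct rule_sketch *rule, size_t i)
{
  assert (rule != NULL && rule->rhs[i] != NULL);
  return rule->rhs[i]->number;
}
```

With parallel arrays, the same renumbering would require walking every
array that stores symbol numbers and patching each occurrence.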


| > We already know that we have to escape from short in most places
| 
| Only for the parts of grammars that are large.  And we can even shrink
| short to char in some cases, which will make things smaller.

Again, this ambiguity coming back :)  I was referring to the
generator.  Once the generator is cleaned of shorts, we can push the
changes into the output.  Changing shorts in the output alone is not
enough, as the output would then be bigger than what the generator can
handle internally.

Back to your sentence: are you considering that Bison ought to choose
the size of the output types?  Are you saying that for small grammars
we should stick to shorts where possible, and int otherwise, instead
of imposing ints on everyone?
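
If the generator were to choose the output types itself, the selection
could look roughly like the C sketch below.  This is only an
illustration under my own assumptions: `table_type',
`table_type_name' and `smallest_table_type' are hypothetical names, and
the limits come from <limits.h> rather than from hand-written
constants.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical list of candidate types for an output table.  */
enum table_type { TT_UCHAR, TT_SCHAR, TT_USHORT, TT_SHORT, TT_INT };

/* C spelling of each candidate, indexed by enum table_type.  */
static const char *const table_type_name[] =
{
  "unsigned char", "signed char", "unsigned short", "short", "int"
};

/* Return the narrowest candidate able to hold every value in
   [MIN, MAX].  Falls back to int, assuming the values fit in int.  */
static enum table_type
smallest_table_type (long min, long max)
{
  assert (min <= max);
  if (0 <= min && max <= UCHAR_MAX)
    return TT_UCHAR;
  if (SCHAR_MIN <= min && max <= SCHAR_MAX)
    return TT_SCHAR;
  if (0 <= min && max <= USHRT_MAX)
    return TT_USHORT;
  if (SHRT_MIN <= min && max <= SHRT_MAX)
    return TT_SHORT;
  return TT_INT;
}
```

The generator would compute the minimum and maximum of each table and
emit `table_type_name[smallest_table_type (min, max)]' in the
declaration, so small grammars keep small tables and large ones get int.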

Also, I never quite understood the following bit, could someone
explain it to me? (src/system.h)

| /*---------------------------------.
| | Machine-dependencies for Bison.  |
| `---------------------------------*/
| 
| #ifdef        eta10
| # define      MAXSHORT        2147483647
| # define      MINSHORT        -2147483648
| #else
| # define      MAXSHORT        32767
| # define      MINSHORT        -32768
| #endif
| 
| #if defined (MSDOS) && !defined (__GO32__)
| # define MAXTABLE     16383
| #else
| # define MAXTABLE     32767
| #endif

I suppose we should use uintmax and Co., and AC_CHECK_SIZEOF.  But I
don't understand eta10, nor do I understand the relationship between
being DOS or GO32 (???) and MAXTABLE (which is the size of the
*output* tables; it is not used as a maximum size of inner tables,
that's MAXSHORT's job).


(Also, part of my revamping reveals places where unsigned short would
do.)


| POSIX does say that the default value of the error token must be 256,
| and that if you use that default, the lexical analyzer should not
| return 256 and thus 256 cannot be used as a character.  

OK, thanks!

| I don't think
| this is a real problem in practice, but if it is a user can work
| around it by changing the value of the error token with a %token
| declaration.

Yep, I saw your post, but I have not tried.  Are you sure it works?  I
mean, I had to fight with bison to be able to rename EOF, and I'm
surprised that you can renumber error freely.

| > Paul> At best we can warn in the documentation that it doesn't work if
| > Paul> you change encodings between the Bison run and the cc+runtime
| > Paul> runs.
| > 
| > Or we should find a means not to output the characters as
| > shorts/integer, but as the characters themselves.
| 
| To do that without loss of efficiency (on modern hosts, anyway), I
| think we'd need to use designated initializers, and we might have to
| renumber things.  E.g., instead of this:
| 
|   static const char yytranslate[] =
|   {
|        0,     2,     2,     2,     2,     2,     2,     2,     2,     2,
|       44,     2,     2,     2,     2,     2,     2,     2,     2,     2,
|        2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
|        2,     2,     2,     2,     2,     2,     2,     2,    42,     2,
|       ...
|   }
| 
| we would need something like this:
| 
|   #ifndef YYHAVE_DESIGNATED_INITIALIZERS
|   # define YYHAVE_DESIGNATED_INITIALIZERS (199901 < __STDC_VERSION__)
|   #endif
| 
|   static const char yytranslate[297] =
|   {
|   #if YYHAVE_DESIGNATED_INITIALIZERS
|        /* This works even with EBCDIC.  */  
|        [ 0 ] = 2,
|        [ '\n' ] = 44,
|        [ '&' ] = 42,
|       ...

Wow!  Yes, I forgot there were tables using the tokens as entries.
Grrr, people should not rely on this `feature'.
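
For the record, here is a small self-contained C sketch of the
designated-initializer idea quoted above.  It is not the skeleton's
code: the names (`translate_sketch', `translate_char', the SYM_*
constants) are hypothetical, the table covers single characters only,
and it assumes a C99 compiler.  It also assumes my reading of the
renumbering remark: entries that are not listed default to 0, so 0 has
to stand for the undefined token, and EOF moves to 2.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical symbol numbers after renumbering: 0 means "undefined",
   because unlisted array elements are zero-initialized.  */
enum { SYM_UNDEF = 0, SYM_EOF = 2, SYM_AMP = 42, SYM_NEWLINE = 44 };

/* Indexed by character value, whatever the execution character set
   is (ASCII, EBCDIC, ...), since the indices are character constants
   evaluated by the compiler.  */
static const unsigned char translate_sketch[UCHAR_MAX + 1] =
{
  [0]    = SYM_EOF,
  ['\n'] = SYM_NEWLINE,
  ['&']  = SYM_AMP
};

/* Map a character code returned by the scanner to a symbol number.  */
static int
translate_char (int c)
{
  assert (0 <= c && c <= UCHAR_MAX);
  return translate_sketch[c];
}
```

Because the compiler evaluates '\n' and '&' itself, the table stays
right even if the encoding used when running Bison differs from the one
used by cc and the runtime.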


| > Make it a muscle, and use it in the skeleton.
| 
| OK, but I'll run it by bison-patches first.
| 
| Also, before I do that, here is a discrepancy between
| bison-1_29-branch and my private copy of the merged version of
| bison.simple.  Is this discrepancy intended?  I am concerned about the
| first two changed lines in this hunk; the others are OK I guess.
| 
| 
| --- bison-1_29-branch/src/bison.simple        2002-04-04 12:23:07.218001000 -0800
| +++ bison.tmp/data/bison.simple       2002-04-03 10:26:17.333000000 -0800
| ...
| @@ -694,18 +910,23 @@ yyreduce:
|        int yyi;
|  
|        YYFPRINTF (stderr, "Reducing via rule %d (line %d), ",
| -              yyn, yyrline[yyn]);
| +              yyn - 1, yyrline[yyn]);
|  
|        /* Print the symbols being reduced, and their result.  */
| -      for (yyi = yyprhs[yyn]; yyrhs[yyi] > 0; yyi++)
| +      for (yyi = yyprhs[yyn]; yyrhs[yyi] >= 0; yyi++)
|       YYFPRINTF (stderr, "%s ", yytname[yyrhs[yyi]]);
|        YYFPRINTF (stderr, " -> %s\n", yytname[yyr1[yyn]]);
|      }

This is normal because in Bison 1.5x I have implemented rule 0, the
initial rule: `$axiom: <start> <eof>'.  Unfortunately, I cannot use
`0' as a rule number, as sometimes rule numbers and token numbers are
put together in a single array of shorts.  To distinguish them, rule
numbers are negated.  So there would still be a clash between `0', the
token EOF, and `-0', the initial rule.

And >= 0 is normal too, as now one can have the EOF token appear
in the rules (typically, the rule 0).
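
The clash can be shown with a short C sketch.  The helper names
(`encode_token', `encode_rule', `is_rule', `rhs_length') are
hypothetical and this is not the generator's code.  It assumes, as the
hunk above suggests, that rules are numbered from 1 internally (hence
the `yyn - 1' when printing) and that a negative entry ends a
right-hand side.

```c
#include <assert.h>

/* Hypothetical encoding for an array mixing token and rule numbers:
   tokens are stored as is (>= 0), rules are stored negated (< 0).  */
static short
encode_token (int token)
{
  assert (0 <= token);
  return (short) token;
}

static short
encode_rule (int rule)
{
  /* Rule 0 would encode to -0 == 0, i.e. the EOF token, so the
     internal rule numbers have to start at 1.  */
  assert (1 <= rule);
  return (short) -rule;
}

/* True if ITEM is an encoded rule number rather than a token.  */
static int
is_rule (short item)
{
  return item < 0;
}

/* Count the symbols of the right-hand side stored in RHS from START
   on; a negative entry ends it.  Token 0 (EOF) is a valid symbol,
   hence `>= 0' and not `> 0'.  */
static int
rhs_length (const short *rhs, int start)
{
  int n = 0;
  while (rhs[start + n] >= 0)
    n++;
  return n;
}
```

With the old `> 0' test, a right-hand side such as `<start> <eof>'
would be cut short at the EOF token.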
