Re: [PATCH] token value 0 is ignored
From: Akim Demaille
Subject: Re: [PATCH] token value 0 is ignored
Date: 29 Oct 2001 17:01:46 +0100
User-agent: Gnus/5.0808 (Gnus v5.8.8) XEmacs/21.4 (Artificial Intelligence)
>>>>> "Dick" == Dick Streefland <address@hidden> writes:
Dick> I tried to build openmotif, using "bison -y" instead of yacc,
Dick> but one of the tools (uil) didn't work. The problem is that the
Dick> grammar file Uil.y contains the following token definition:
Dick> %token UILEOF 0
Dick> Bison ignores the 0, and assigns token value 257.
Well, I agree it should not renumber it, at least not silently, but
your patch is wrong. You seem to have forgotten the calling convention
of yylex, which the author of the %token line above did not, judging
by the name of the token:
Calling Convention for `yylex'
------------------------------
The value that `yylex' returns must be the numeric code for the
type of token it has just found, or 0 for end-of-input.
When a token is referred to in the grammar rules by a name, that name
in the parser file becomes a C macro whose definition is the proper
numeric code for that token type. So `yylex' can use the name to
indicate that type. *Note Symbols::.
When a token is referred to in the grammar rules by a character
literal, the numeric code for that character is also the code for the
token type. So `yylex' can simply return that character code. The
null character must not be used this way, because its code is zero and
that is what signifies end-of-input.
Here is an example showing these things:
     yylex ()
     {
       ...
       if (c == EOF)     /* Detect end of file. */
         return 0;
       ...
       if (c == '+' || c == '-')
         return c;      /* Assume token type for `+' is '+'. */
       ...
       return INT;      /* Return the type of the token. */
       ...
     }
This interface has been designed so that the output from the `lex'
utility can be used without change as the definition of `yylex'.
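To make the convention concrete, here is a minimal sketch of a
hand-written `yylex' that scans a string instead of a file. It is only
an illustration: `set_input', `input_cursor', and the value chosen for
`INT' are hypothetical, and `UILEOF' is defined by hand to mirror the
"%token UILEOF 0" declaration from Uil.y rather than taken from a
generated parser.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Mirrors "%token UILEOF 0": a name for the end-of-input code.  */
#define UILEOF 0

/* Hypothetical code for a named token; a real parser file would
   define this macro itself.  */
#define INT 258

/* Hypothetical scanner state: a NUL-terminated string read left to
   right, and the semantic value of the last INT token.  */
const char *input_cursor = "";
int yylval;

/* Point the scanner at a new input string.  */
void set_input (const char *text)
{
  assert (text != NULL);
  input_cursor = text;
}

/* Return the code of the next token, or 0 (UILEOF) at end of input.  */
int yylex (void)
{
  /* Skip blanks between tokens.  */
  while (*input_cursor == ' ' || *input_cursor == '\t')
    input_cursor++;

  if (*input_cursor == '\0')    /* End of input must be reported as 0.  */
    return UILEOF;

  if (isdigit ((unsigned char) *input_cursor))
    {
      yylval = 0;
      while (isdigit ((unsigned char) *input_cursor))
        yylval = yylval * 10 + (*input_cursor++ - '0');
      return INT;               /* Named token: return its macro.  */
    }

  /* Character literal: the character code is the token code.  */
  return (unsigned char) *input_cursor++;
}
```

Under these assumptions, scanning "12 + 3" yields INT, '+', INT, and
then UILEOF on every further call. If the parser generator renumbered
UILEOF to some other value, the parser would no longer recognize that
0 as the token named in the grammar.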
If the grammar uses literal string tokens, there are two ways that
`yylex' can determine the token type codes for them:
* If the grammar defines symbolic token names as aliases for the
literal string tokens, `yylex' can use these symbolic names like
all others. In this case, the use of the literal string tokens in
the grammar file has no effect on `yylex'.
* `yylex' can find the multi-character token in the `yytname' table.
The index of the token in the table is the token type's code.
The name of a multi-character token is recorded in `yytname' with a
double-quote, the token's characters, and another double-quote.
The token's characters are not escaped in any way; they appear
verbatim in the contents of the string in the table.
Here's code for looking up a token in `yytname', assuming that the
characters of the token are stored in `token_buffer'.
     for (i = 0; i < YYNTOKENS; i++)
       {
         if (yytname[i] != 0
             && yytname[i][0] == '"'
             /* strncmp returns 0 when the strings match.  */
             && strncmp (yytname[i] + 1, token_buffer,
                         strlen (token_buffer)) == 0
             && yytname[i][strlen (token_buffer) + 1] == '"'
             && yytname[i][strlen (token_buffer) + 2] == 0)
           break;
       }
The `yytname' table is generated only if you use the
`%token_table' declaration. *Note Decl Summary::.
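For experimenting with the lookup outside a generated parser, it can
be wrapped in a function. The following is a sketch under stated
assumptions: the `yytname' contents and the `YYNTOKENS' value are made
up by hand to stand in for what a parser file would provide, and
`find_literal_token' is a hypothetical helper name. Note that
`strncmp' returns 0 on a match, so the comparison tests for zero.

```c
#include <assert.h>
#include <string.h>

/* Hand-written stand-in for the generated table (an assumption, not
   real parser output).  Literal string tokens are stored with their
   surrounding double-quotes.  */
static const char *const yytname[] = {
  "$end", "error", "$undefined", "INT", "\"<=\"", "\"<\"", "\"==\"", 0
};
#define YYNTOKENS 7

/* Return the index of the literal string token whose characters are
   in TOKEN_BUFFER, or -1 if the table has no such token.  */
int find_literal_token (const char *token_buffer)
{
  size_t len;
  int i;

  assert (token_buffer != NULL);
  len = strlen (token_buffer);

  for (i = 0; i < YYNTOKENS; i++)
    {
      if (yytname[i] != 0
          && yytname[i][0] == '"'
          /* The token's characters follow the opening quote.  */
          && strncmp (yytname[i] + 1, token_buffer, len) == 0
          /* They must be followed by the closing quote and the
             end of the string, so "<" does not match "<=".  */
          && yytname[i][len + 1] == '"'
          && yytname[i][len + 2] == 0)
        return i;
    }
  return -1;
}
```

With this made-up table, looking up "<=" gives index 4 and "<" gives
index 5, while "INT" gives -1 because its entry is not a quoted
literal.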