bison-patches
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: $$ = $1


From: Akim Demaille
Subject: Re: $$ = $1
Date: 31 Jul 2002 10:44:07 +0200
User-agent: Gnus/5.0808 (Gnus v5.8.8) XEmacs/21.4 (Honest Recruiter)

Hi Paul,

| > Paul, the following comment and code do not agree:
| > 
| > 
| >    /*-----------------------------.
| >    | yyreduce -- Do a reduction.  |
| >    `-----------------------------*/
| >    yyreduce:
| >      /* yyn is the number of a rule to reduce with.  */
| >      yylen = yyr2[yyn];
| >    
| >      /* If YYLEN is nonzero, implement the default value of the action:
| >         `$$ = $1'.
| >    
| >         Otherwise, the following line sets YYVAL to the semantic value of
| >         the lookahead token.  This behavior is undocumented and Bison
| >         users should not rely upon it.  Assigning to YYVAL
| >         unconditionally makes the parser a bit smaller, and it avoids a
| >         GCC warning that YYVAL may be used uninitialized.  */
| >      yyval = yyvsp[1-yylen];
| > 
| > 
| > The comment refers to rules without any lhs, i.e., yylen = 0,
| 
| The first part of the comment ("If YYLEN is nonzero...") refers to
| rules with nonempty right-hand sides.
| 
| The second part of the comment ("Otherwise, ...") refers to rules with
| empty right-hand sides.  I think this is what you're talking about
| when you say "without any lhs".

Yes, definitely.


| > i.e., we point to yyvsp[1], in other words one past the end of the
| > stack: a no-bit-land.
| 
| It shouldn't be a no-bit-land, since yyvsp[1] should contain the
| semantic value of the lookahead token that was used to decide whether
| to make this reduction.

Sorry, but *why* do think we should store the lookahead on the
stack???  The lookahead is in yychar, yylval, and yylloc.  It is not
on the stack.


| If memory serves, Bison won't reduce an empty rule without a lookahead
| token.  

It sure does, but I'm not even sure I can see in what way this
sentence would explain anything if it did.

Look for instance this test case:

   %{
   #include <stdio.h>
   #include <stdlib.h>
   
   const char *input;
   #define YYINITDEPTH 4
   int
   yyerror (const char *msg)
   {
     fprintf (stderr, "%s\n", msg);
     exit (1);
   }
   
   %}
   
   // %glr-parser
   %debug
   %union {
     int ival;
   }
   %type <ival> exp one empty
   %%
   exp: one empty { $$ = $1; };
   one: '1'       { $$ = 12345;} ;
   empty:         {};
   %%
   int
   yylex (void)
   {
     static int count = 0;
     yylval.ival = count;
     return input[count++];
   }
   
   
   int
   main (int argc, const char *argv[])
   {
     yydebug = !!getenv ("YYDEBUG");
     input = "11";
     return yyparse ();
   }


| /tmp % bison foo.y && gcc foo.tab.c -Wall -ggdb && YYDEBUG=1 ./a.out
| foo.tab.c: In function `yyparse':                            
| foo.tab.c:732: warning: implicit declaration of function `yylex'
| Starting parse
| Entering state 0
| Reading a token: Next token is token '1' ()
| Shifting token 49 ('1'), Entering state 1
| Reducing via rule 2 (line 24), '1'  -> one
| state stack now 0
| Entering state 3
| Reducing via rule 3 (line 25),  -> empty

Rule 3 is reduced without even having a lookahead: it is read below,
even after the reduction of rule 1.

| state stack now 0 3
| Entering state 5
| Reducing via rule 1 (line 23), one empty  -> exp
| state stack now 0
| Entering state 2
| Reading a token: Next token is token '1' ()
| parse error



| I very vaguely recall checking for this back in 1999 when I
| proposed the above-quoted code (which was installed on 2000-10-02).
| Perhaps things have changed since then?  Or quite possibly I didn't
| check correctly.  FYI, before the patch, the last line of the above
| code was `if (yylen > 0) yyval = yyvsp[1-yylen];', which meant that
| yyval could be used "uninitialized".

I very well remember this.  The question is whether you prefer to
stick to the comment and implement

        if (yylen > 0) 
          yyval = yyvsp[1-yylen];
        else
          yyval = yylval;

in which case the comment cannot pretend that it is `_the lookahead',
since it is certainly the previous one, or keep the code as is, which
just works around GCC's complaint about unitialized variable by taking
advantage of its impossibility of complaining about uninitialized
*slots* of arrays.




| > I face the same problem for the default location, which when there is
| > a non empty lhs, amounts to
| > 
| >         @$ = range from @1 to @n
| > 
| > What does make sense in that case, is to
| > have a 0-width location starting where the last symbol ended (@0), and
| > ends at the same point.
| 
| More useful, I think, would be to have a nonpositive-width location,
| starting where the next symbol starts, and ending where the previous
| symbol ends.  

Good call!  That's definitely what we'd want!  But then again, we have
the problem that we don't read the next lookahead ``soon enough''.

BYacc does the same:

yydebug: state 0, reading 49 ('1')
yydebug: state 0, shifting to state 1
yydebug: state 1, reducing by rule 2 (one : '1')
yydebug: after reduction, shifting from state 0 to state 3
yydebug: state 3, reducing by rule 3 (empty :)             <====
yydebug: after reduction, shifting from state 3 to state 4
yydebug: state 4, reducing by rule 1 (exp : one empty)
yydebug: after reduction, shifting from state 0 to state 2
yydebug: state 2, reading 49 ('1')                         <====

But why do you want it to be negative?

| This would let the user identify exactly where the empty
| rule was reduced, and it should be a natural generalization of the
| existing code, which already says that @$ starts with @1's start and
| ends with @n's end (in this case, n is 0, so @$ ends where @0 ends).
| 
| If you don't like negative widths, I suppose you could arbitrarily
| set those widths to zero.

Actually, I'm not sure I really understood what you meant.  My reading
was that this emptiness is certainly ``full´´ of comments or
whitespaces, skipped by the scanner.  My reading was that you wanted
to get that space.  Or maybe we don't apply `width' to the same thing:
I was referring to the number of characters a location covers, the
difference between its end, and its start: this is nonnegative.

So I'd say something like

        @$ starts where @0 ends, and ends where yylloc starts

if yylloc was read :(





| > Anyway, back to our $$ = $1.  I fail to see what real problem the code
| > could introduce.  Sure, we read unitialized bytes, but tools such as
| > Valgrind are fine with this *except* if you *use* this value.  But
| > then, the user is wrong anyway, since she didn't specify a value for
| > `empty'.
| 
| Can't Bison check for this case when it is processing the rule?
| Then we don't need to worry about the user being wrong, since Bison
| will reject such grammars.

I'm a bit lost.  I was saying that in a properly typed grammar, a
nterm with an empty lhs without action cannot be typed.  Since it is
not typed, its value cannot be used, so tools like Valgrind that let
uninitialized bits be moved, but won't let them be `used' are fine
with the current code.

I'm just try to see if the code is really safe.  The two problems I
see are:

1. Cf. the paragraph above,
2. Looking past the top of the stack.

If both items are checked, then we just have to fix the comment, but
keep the code.

| > The other problem might be looking past the actually
| > allocated stack, but I couldn't make a test case that exhibits this
| > failure (I'll try again tomorrow, I must go now, but maybe the limits
| > we have on the stack always ensures we can use its top + 1.
| 
| Yes, I think the code is designed that way (at least it was in 1999).

I guess it was the goal, but I really don't think it ever worked.

| Even better, the location should always be initialized.

Definitely!


| > What fix would you suggest?
| 
| How about if we have Bison reject grammars where the implicit action
| "$$=$1" is applied to a rule with an empty right-hand-side?  Perhaps
| that's too Draconian; on the other hand, such a grammar is in
| trouble anyway.

Bison does when the item is typed: we really don't have problems with
the user, we have with our parser itself.

BTW, stupid me, I should have looked before: this is what BYacc does
too:

#if YYDEBUG
    if (yydebug)
        printf("%sdebug: state %d, reducing by rule %d (%s)\n",
                YYPREFIX, yystate, yyn, yyrule[yyn]);
#endif
    yym = yylen[yyn];
    yyval = yyvsp[1-yym];                <========
    switch (yyn)
    {
case 1:
#line 21 "foo.y"
{ yyval.ival = yyvsp[-1].ival; }
break;
case 2:
#line 22 "foo.y"
{ yyval.ival = 12345;}
break;
case 3:
#line 23 "foo.y"
{}
break;
#line 266 "y.tab.c"
    }
    yyssp -= yym;
    yystate = *yyssp;
    yyvsp -= yym;
    yym = yylhs[yyn];



reply via email to

[Prev in Thread] Current Thread [Next in Thread]