bison-patches
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] yysyntax_error: make it manage its own memory.


From: Joel E. Denny
Subject: Re: [PATCH] yysyntax_error: make it manage its own memory.
Date: Sun, 4 Oct 2009 17:27:30 -0400 (EDT)
User-agent: Alpine 1.00 (DEB 882 2007-12-20)

On Mon, 21 Sep 2009, Joel E. Denny wrote:

> I'll fold in the differences below and push those to master.

After that and perhaps some other minor tweaks I've forgotten, the patches 
below are the result.  I pushed them to master.

> I'd also like to cherry pick to branch-2.5 because I think it would be 
> nice to release lookahead correction with IELR.  However, yysyntax_error 
> on master has already evolved away from branch-2.5, so I first need to 
> cherry pick some of your commits from master:
> 
>   eeb29 Simplify the i18n of the error messages.
>   84eed Pass the token type to yysyntax_error.

I pushed all that too.

> I'm not sure I'll find time to port lookahead correction to other 
> skeletons before the release of 2.5, so I'm not currently planning to 
> search for and cherry pick your corresponding commits for other skeletons.

As it turns out, I have several more bug fixes that I need to push before 
pushing my lookahead correction implementation.  Even though I'm not 
worried about whether lookahead correction makes it into the other 
skeletons before 2.5, I think those fixes should.  I'd like to post each 
complete fix one at a time for comments, but that means I first need to 
port them to the other skeletons.  I'll work on all that when I can find 
more time.

>From 52cea04ad36abf3ab684b88cba45d6c26dda80c9 Mon Sep 17 00:00:00 2001
From: Joel E. Denny <address@hidden>
Date: Wed, 23 Sep 2009 17:37:58 -0400
Subject: [PATCH] yysyntax_error: test memory management more.

* tests/atlocal.in (NO_WERROR_CFLAGS): New cpp macro.
* tests/regression.at (parse.error=verbose and
YYSTACK_USE_ALLOCA): New test group.
(parse.error=verbose overflow): New test group that reveals an
obscure bug.  Expected fail for now.
---
 ChangeLog           |    9 ++
 tests/atlocal.in    |    4 +
 tests/regression.at |  209 +++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 222 insertions(+), 0 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index a813d69..92cfc13 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,12 @@
+2009-09-23  Joel E. Denny  <address@hidden>
+
+       yysyntax_error: test memory management more.
+       * tests/atlocal.in (NO_WERROR_CFLAGS): New cpp macro.
+       * tests/regression.at (parse.error=verbose and
+       YYSTACK_USE_ALLOCA): New test group.
+       (parse.error=verbose overflow): New test group that reveals an
+       obscure bug.  Expected fail for now.
+
 2009-10-04  Joel E. Denny  <address@hidden>
 
        benchmarks: use %debug consistently among grammars.
diff --git a/tests/atlocal.in b/tests/atlocal.in
index 2e46329..9264a40 100644
--- a/tests/atlocal.in
+++ b/tests/atlocal.in
@@ -10,6 +10,10 @@
 # We want no optimization.
 CFLAGS='@O0CFLAGS@ @WARN_CFLAGS@ @WERROR_CFLAGS@'
 
+# Sometimes a test group needs to ignore gcc warnings, so it locally
+# sets CFLAGS to this.
+NO_WERROR_CFLAGS='@O0CFLAGS@ @WARN_CFLAGS@'
+
 # We need `config.h'.
 CPPFLAGS="-I$abs_top_builddir/lib @CPPFLAGS@"
 
diff --git a/tests/regression.at b/tests/regression.at
index cfc071e..0420f4e 100644
--- a/tests/regression.at
+++ b/tests/regression.at
@@ -1260,3 +1260,212 @@ AT_BISON_CHECK([[-o input.c -Dlr.type=ielr input.y]])
 AT_CHECK([[diff -u lalr.c ielr.c]])
 
 AT_CLEANUP
+
+
+
+## -------------------------------------------- ##
+## parse.error=verbose and YYSTACK_USE_ALLOCA.  ##
+## -------------------------------------------- ##
+
+AT_SETUP([[parse.error=verbose and YYSTACK_USE_ALLOCA]])
+
+AT_DATA_GRAMMAR([input.y],
+[[%code {
+  #include <stdio.h>
+  void yyerror (char const *);
+  int yylex (void);
+  #define YYSTACK_USE_ALLOCA 1
+}
+
+%define parse.error verbose
+
+%%
+
+start: check syntax_error syntax_error ;
+
+check:
+{
+  if (128 < sizeof yymsgbuf)
+    {
+      fprintf (stderr,
+               "The initial size of yymsgbuf in yyparse has increased\n"
+               "since this test group was last updated.  As a result,\n"
+               "this test group may no longer manage to induce a\n"
+               "reallocation of the syntax error message buffer.\n"
+               "This test group must be adjusted to produce a longer\n"
+               "error message.\n");
+      YYABORT;
+    }
+}
+;
+
+// Induce a syntax error message whose total length is more than
+// sizeof yymsgbuf in yyparse.  Each token here is 64 bytes.
+syntax_error:
+  "123456789112345678921234567893123456789412345678951234567896123A"
+| "123456789112345678921234567893123456789412345678951234567896123B"
+| error 'a' 'b' 'c'
+;
+
+%%
+
+void
+yyerror (char const *msg)
+{
+  fprintf (stderr, "%s\n", msg);
+}
+
+int
+yylex (void)
+{
+  /* Induce two syntax error messages (which requires full error
+     recovery by shifting 3 tokens) in order to detect any loss of the
+     reallocated buffer.  */
+  static char const *input = "abc";
+  return *input++;
+}
+
+int
+main (void)
+{
+  return yyparse ();
+}
+]])
+
+AT_BISON_CHECK([[-o input.c input.y]])
+AT_COMPILE([[input]])
+AT_PARSER_CHECK([[./input]], [[1]], [],
+[[syntax error, unexpected 'a', expecting 
123456789112345678921234567893123456789412345678951234567896123A or 
123456789112345678921234567893123456789412345678951234567896123B
+syntax error, unexpected $end, expecting 
123456789112345678921234567893123456789412345678951234567896123A or 
123456789112345678921234567893123456789412345678951234567896123B
+]])
+
+AT_CLEANUP
+
+
+
+## ------------------------------ ##
+## parse.error=verbose overflow.  ##
+## ------------------------------ ##
+
+# Imagine the case where YYSTACK_ALLOC_MAXIMUM = YYSIZE_MAXIMUM and an
+# invocation of yysyntax_error has caused yymsg_alloc to grow to exactly
+# YYSTACK_ALLOC_MAXIMUM (perhaps because the normal doubling of size had
+# to be clipped to YYSTACK_ALLOC_MAXIMUM).  Now imagine a subsequent
+# invocation of yysyntax_error that overflows during its size
+# calculation and thus returns YYSIZE_MAXIMUM to yyparse.  Then, yyparse
+# will invoke yyerror using the old contents of yymsg.  This bug needs
+# to be fixed.
+
+AT_SETUP([[parse.error=verbose overflow]])
+
+AT_XFAIL_IF([[:]])
+
+AT_DATA_GRAMMAR([input.y],
+[[%code {
+  #include <stdio.h>
+  void yyerror (char const *);
+  int yylex (void);
+
+  /* This prevents this test case from having to induce error messages
+     large enough to overflow size_t.  */
+  #define YYSIZE_T unsigned char
+
+  /* Bring in malloc so yacc.c doesn't try to provide a malloc prototype
+     using our YYSIZE_T.  */
+  #include <stdlib.h>
+
+  /* Max depth is usually much smaller than YYSTACK_ALLOC_MAXIMUM, and
+     we don't want gcc to warn everywhere this constant would be too big
+     to make sense for our YYSIZE_T.  */
+  #define YYMAXDEPTH 100
+}
+
+%define parse.error verbose
+
+%%
+
+start: syntax_error1 check syntax_error2 ;
+
+// Induce a syntax error message whose total length causes yymsg in
+// yyparse to be reallocated to size YYSTACK_ALLOC_MAXIMUM, which
+// should be 255.  Each token here is 64 bytes.
+syntax_error1:
+  "123456789112345678921234567893123456789412345678951234567896123A"
+| "123456789112345678921234567893123456789412345678951234567896123B"
+| "123456789112345678921234567893123456789412345678951234567896123C"
+| error 'a' 'b' 'c'
+;
+
+check:
+{
+  if (yymsg_alloc != YYSTACK_ALLOC_MAXIMUM
+      || YYSTACK_ALLOC_MAXIMUM != YYSIZE_MAXIMUM
+      || YYSIZE_MAXIMUM != 255)
+    {
+      fprintf (stderr,
+               "The assumptions of this test group are no longer\n"
+               "valid, so it may no longer catch the error it was\n"
+               "designed to catch.  Specifically, the following\n"
+               "values should all be 255:\n\n");
+      fprintf (stderr, "  yymsg_alloc = %d\n", yymsg_alloc);
+      fprintf (stderr, "  YYSTACK_ALLOC_MAXIMUM = %d\n",
+               YYSTACK_ALLOC_MAXIMUM);
+      fprintf (stderr, "  YYSIZE_MAXIMUM = %d\n", YYSIZE_MAXIMUM);
+      YYABORT;
+    }
+}
+;
+
+// Now overflow.
+syntax_error2:
+  "123456789112345678921234567893123456789412345678951234567896123A"
+| "123456789112345678921234567893123456789412345678951234567896123B"
+| "123456789112345678921234567893123456789412345678951234567896123C"
+| "123456789112345678921234567893123456789412345678951234567896123D"
+| "123456789112345678921234567893123456789412345678951234567896123E"
+;
+
+%%
+
+void
+yyerror (char const *msg)
+{
+  fprintf (stderr, "%s\n", msg);
+}
+
+int
+yylex (void)
+{
+  /* Induce two syntax error messages (which requires full error
+     recovery by shifting 3 tokens).  */
+  static char const *input = "abc";
+  return *input++;
+}
+
+int
+main (void)
+{
+  /* Push parsers throw away the message buffer between tokens, so skip
+     this test under maintainer-push-check.  */
+  if (YYPUSH)
+    return 77;
+  return yyparse ();
+}
+]])
+
+AT_BISON_CHECK([[-o input.c input.y]])
+
+# gcc warns about tautologies and fallacies involving comparisons for
+# unsigned char.  However, it doesn't produce these same warnings for
+# size_t and many other types when the warnings would seem to make just
+# as much sense.  We ignore the warnings.
+[CFLAGS="$NO_WERROR_CFLAGS"]
+AT_COMPILE([[input]])
+
+AT_PARSER_CHECK([[./input]], [[2]], [],
+[[syntax error, unexpected 'a', expecting 
123456789112345678921234567893123456789412345678951234567896123A or 
123456789112345678921234567893123456789412345678951234567896123B or 
123456789112345678921234567893123456789412345678951234567896123C
+syntax error
+memory exhausted
+]])
+
+AT_CLEANUP
-- 
1.5.4.3


>From 45319f1365eb8d125424f31401d9d33cc02ff4ad Mon Sep 17 00:00:00 2001
From: Joel E. Denny <address@hidden>
Date: Wed, 23 Sep 2009 17:39:39 -0400
Subject: [PATCH] yysyntax_error: avoid duplicate lookahead collection.

Except when memory reallocation is required, this change
eliminates the need to invoke yysyntax_error twice and thus to
repeat the collection of lookaheads.  It also prepares for
future extensions that will make those repetitions more
expensive and that will require additional memory management in
yysyntax_error.  Finally, it fixes an obscure bug already
exercised in the test suite.
* data/yacc.c (yysyntax_error): Add arguments for message
buffer variables stored in the parser.  Instead of size, return
status similar to yyparse status but indicating success of
message creation.  Other than the actual reallocation of the
message buffer, import and clean up memory management code
from...
(yyparse, yypush_parse): ... here.
* tests/regression.at (parse.error=verbose overflow): No longer
an expected failure.
---
 ChangeLog           |   20 ++
 data/yacc.c         |  142 ++++++-----
 src/parse-gram.c    |  686 ++++++++++++++++++++++++++-------------------------
 src/parse-gram.h    |   12 +-
 tests/regression.at |   11 +-
 5 files changed, 452 insertions(+), 419 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 92cfc13..6395ddd 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,25 @@
 2009-09-23  Joel E. Denny  <address@hidden>
 
+       yysyntax_error: avoid duplicate lookahead collection.
+       Except when memory reallocation is required, this change
+       eliminates the need to invoke yysyntax_error twice and thus to
+       repeat the collection of lookaheads.  It also prepares for
+       future extensions that will make those repetitions more
+       expensive and that will require additional memory management in
+       yysyntax_error.  Finally, it fixes an obscure bug already
+       exercised in the test suite.
+       * data/yacc.c (yysyntax_error): Add arguments for message
+       buffer variables stored in the parser.  Instead of size, return
+       status similar to yyparse status but indicating success of
+       message creation.  Other than the actual reallocation of the
+       message buffer, import and clean up memory management code
+       from...
+       (yyparse, yypush_parse): ... here.
+       * tests/regression.at (parse.error=verbose overflow): No longer
+       an expected failure.
+
+2009-09-23  Joel E. Denny  <address@hidden>
+
        yysyntax_error: test memory management more.
        * tests/atlocal.in (NO_WERROR_CFLAGS): New cpp macro.
        * tests/regression.at (parse.error=verbose and
diff --git a/data/yacc.c b/data/yacc.c
index 26c5996..ea2ae14 100644
--- a/data/yacc.c
+++ b/data/yacc.c
@@ -889,26 +889,27 @@ yytnamerr (char *yyres, const char *yystr)
 }
 # endif
 
-/* Copy into YYRESULT an error message about the unexpected token
-   YYTOKEN while in state YYSTATE.  Return the number of bytes copied,
-   including the terminating null byte.  If YYRESULT is null, do not
-   copy anything; just return the number of bytes that would be
-   copied.  As a special case, return 0 if an ordinary "syntax error"
-   message will do.  Return YYSIZE_MAXIMUM if overflow occurs during
-   size calculation.  */
-static YYSIZE_T
-yysyntax_error (char *yyresult, int yystate, int yytoken)
+/* Copy into *YYMSG, which is of size *YYMSG_ALLOC, an error message
+   about the unexpected token YYTOKEN while in state YYSTATE.
+
+   Return 0 if *YYMSG was successfully written.  Return 1 if an ordinary
+   "syntax error" message will suffice instead.  Return 2 if *YYMSG is
+   not large enough to hold the message.  In the last case, also set
+   *YYMSG_ALLOC to either (a) the required number of bytes or (b) zero
+   if the required number of bytes is too large to store.  */
+static int
+yysyntax_error (YYSIZE_T *yymsg_alloc, char **yymsg,
+                int yystate, int yytoken)
 {
   int yyn = yypact[yystate];
 
   if (! (YYPACT_NINF < yyn && yyn <= YYLAST))
-    return 0;
+    return 1;
   else
     {
       YYSIZE_T yysize0 = yytnamerr (0, yytname[yytoken]);
       YYSIZE_T yysize = yysize0;
       YYSIZE_T yysize1;
-      int yysize_overflow = 0;
       enum { YYERROR_VERBOSE_ARGS_MAXIMUM = 5 };
       /* Internationalized format string. */
       const char *yyformat = 0;
@@ -942,11 +943,17 @@ yysyntax_error (char *yyresult, int yystate, int yytoken)
              }
            yyarg[yycount++] = yytname[yyx];
            yysize1 = yysize + yytnamerr (0, yytname[yyx]);
-           yysize_overflow |= (yysize1 < yysize);
+           if (! (yysize <= yysize1
+                  && yysize1 <= YYSTACK_ALLOC_MAXIMUM))
+             {
+               /* Overflow.  */
+               *yymsg_alloc = 0;
+               return 2;
+             }
            yysize = yysize1;
          }
 
-        switch (yycount)
+      switch (yycount)
         {
 #define YYCASE_(N, S)                           \
           case N:                               \
@@ -961,32 +968,42 @@ yysyntax_error (char *yyresult, int yystate, int yytoken)
         }
 
       yysize1 = yysize + yystrlen (yyformat);
-      yysize_overflow |= (yysize1 < yysize);
+      if (! (yysize <= yysize1 && yysize1 <= YYSTACK_ALLOC_MAXIMUM))
+        {
+          /* Overflow.  */
+          *yymsg_alloc = 0;
+          return 2;
+        }
       yysize = yysize1;
 
-      if (yysize_overflow)
-       return YYSIZE_MAXIMUM;
+      if (*yymsg_alloc < yysize)
+        {
+          *yymsg_alloc = 2 * yysize;
+          if (! (yysize <= *yymsg_alloc
+                 && *yymsg_alloc <= YYSTACK_ALLOC_MAXIMUM))
+            *yymsg_alloc = YYSTACK_ALLOC_MAXIMUM;
+          return 2;
+        }
 
-      if (yyresult)
-       {
-         /* Avoid sprintf, as that infringes on the user's name space.
-            Don't have undefined behavior even if the translation
-            produced a string with the wrong number of "%s"s.  */
-         char *yyp = yyresult;
-         int yyi = 0;
-         while ((*yyp = *yyformat) != '\0')
-            if (*yyp == '%' && yyformat[1] == 's' && yyi < yycount)
-              {
-                yyp += yytnamerr (yyp, yyarg[yyi++]);
-                yyformat += 2;
-              }
-            else
-              {
-                yyp++;
-                yyformat++;
-              }
-       }
-      return yysize;
+      /* Avoid sprintf, as that infringes on the user's name space.
+         Don't have undefined behavior even if the translation
+         produced a string with the wrong number of "%s"s.  */
+      {
+        char *yyp = *yymsg;
+        int yyi = 0;
+        while ((*yyp = *yyformat) != '\0')
+          if (*yyp == '%' && yyformat[1] == 's' && yyi < yycount)
+            {
+              yyp += yytnamerr (yyp, yyarg[yyi++]);
+              yyformat += 2;
+            }
+          else
+            {
+              yyp++;
+              yyformat++;
+            }
+      }
+      return 0;
     }
 }
 #endif /* YYERROR_VERBOSE */
@@ -1431,37 +1448,28 @@ yyerrlab:
 #if ! YYERROR_VERBOSE
       yyerror (]b4_yyerror_args[YY_("syntax error"));
 #else
-      {
-       YYSIZE_T yysize = yysyntax_error (0, yystate, yytoken);
-       if (yymsg_alloc < yysize && yymsg_alloc < YYSTACK_ALLOC_MAXIMUM)
-         {
-           YYSIZE_T yyalloc = 2 * yysize;
-           if (! (yysize <= yyalloc && yyalloc <= YYSTACK_ALLOC_MAXIMUM))
-             yyalloc = YYSTACK_ALLOC_MAXIMUM;
-           if (yymsg != yymsgbuf)
-             YYSTACK_FREE (yymsg);
-           yymsg = (char *) YYSTACK_ALLOC (yyalloc);
-           if (yymsg)
-             yymsg_alloc = yyalloc;
-           else
-             {
-               yymsg = yymsgbuf;
-               yymsg_alloc = sizeof yymsgbuf;
-             }
-         }
-
-       if (0 < yysize && yysize <= yymsg_alloc)
-         {
-           (void) yysyntax_error (yymsg, yystate, yytoken);
-           yyerror (]b4_yyerror_args[yymsg);
-         }
-       else
-         {
-           yyerror (]b4_yyerror_args[YY_("syntax error"));
-           if (yysize != 0)
-             goto yyexhaustedlab;
-         }
-      }
+      while (1)
+        {
+          int yysyntax_error_status =
+            yysyntax_error (&yymsg_alloc, &yymsg, yystate, yytoken);
+          if (yysyntax_error_status == 2 && 0 < yymsg_alloc)
+            {
+              if (yymsg != yymsgbuf)
+                YYSTACK_FREE (yymsg);
+              yymsg = (char *) YYSTACK_ALLOC (yymsg_alloc);
+              if (yymsg)
+                continue;
+              yymsg = yymsgbuf;
+              yymsg_alloc = sizeof yymsgbuf;
+            }
+          if (yysyntax_error_status == 0)
+            yyerror (]b4_yyerror_args[yymsg);
+          else
+            yyerror (]b4_yyerror_args[YY_("syntax error"));
+          if (yysyntax_error_status == 2)
+            goto yyexhaustedlab;
+          break;
+        }
 #endif
     }
 
diff --git a/tests/regression.at b/tests/regression.at
index 0420f4e..3857916 100644
--- a/tests/regression.at
+++ b/tests/regression.at
@@ -1350,16 +1350,13 @@ AT_CLEANUP
 # Imagine the case where YYSTACK_ALLOC_MAXIMUM = YYSIZE_MAXIMUM and an
 # invocation of yysyntax_error has caused yymsg_alloc to grow to exactly
 # YYSTACK_ALLOC_MAXIMUM (perhaps because the normal doubling of size had
-# to be clipped to YYSTACK_ALLOC_MAXIMUM).  Now imagine a subsequent
-# invocation of yysyntax_error that overflows during its size
-# calculation and thus returns YYSIZE_MAXIMUM to yyparse.  Then, yyparse
-# will invoke yyerror using the old contents of yymsg.  This bug needs
-# to be fixed.
+# to be clipped to YYSTACK_ALLOC_MAXIMUM).  In an old version of yacc.c,
+# a subsequent invocation of yysyntax_error that overflows during its
+# size calculation would return YYSIZE_MAXIMUM to yyparse.  Then,
+# yyparse would invoke yyerror using the old contents of yymsg.
 
 AT_SETUP([[parse.error=verbose overflow]])
 
-AT_XFAIL_IF([[:]])
-
 AT_DATA_GRAMMAR([input.y],
 [[%code {
   #include <stdio.h>
-- 
1.5.4.3





reply via email to

[Prev in Thread] Current Thread [Next in Thread]