bison-patches
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Warn about character literals not of length one.


From: Joel E. Denny
Subject: Warn about character literals not of length one.
Date: Thu, 23 Jul 2009 14:22:28 -0400 (EDT)
User-agent: Alpine 1.00 (DEB 882 2007-12-20)

On at least one occasion when converting a grammar from a formal language 
specification to a Bison specification, I forgot to convert the single 
quotes around multi-character literals to double quotes.  Initially, I was 
amazed at the massive number of parser table conflicts Bison reported, and 
I was amazed at how long IELR(1) took to generate the tables as a result.  
Eventually I realized the problem: Bison thinks literals like '+' and '++' 
are the same token.  I wonder how many researchers have been led astray by 
this problem when using Bison to collect LALR statistics.

I'm surprised to see a comment in scan-gram.l that describes this behavior 
almost as if it's desired:

  /* Characters.  We don't check there is only one.  */

Is there some good reason why we shouldn't check?  If not, then I'd like 
to push the following patch (written against branch-2.5) to master and 
branch-2.5.  Any objections?

I've thought about making it an error instead of a warning because it 
seems like a severe mistake to miss.  However, I worry that could cause 
trouble for distro maintainers trying to build existing projects.

In the new test group, I wish I could check the case where a character 
literal ends at eof without a newline.  However, the documentation for 
AT_DATA demands that the file contents end with an "end of line", so I'm 
not sure what to do.  I might write the Autoconf people later if no one 
around here already knows how to work around this limitation.

>From 34689e4754fcac0154ba69622e241edeab1e6d7c Mon Sep 17 00:00:00 2001
From: Joel E. Denny <address@hidden>
Date: Thu, 23 Jul 2009 13:29:22 -0400
Subject: [PATCH] Warn about character literals not of length one.

* NEWS (2.5): Document.
* src/scan-gram.l (INITIAL): Remove comment that we don't check
the length.
(SC_ESCAPED_CHARACTER): Warn if length is wrong.
* tests/input.at (Bad character literals): New test group.

diff --git a/NEWS b/NEWS
index 7d401ad..e723542 100644
--- a/NEWS
+++ b/NEWS
@@ -98,6 +98,18 @@ Bison News
   about a missing semicolon where it did not before.  Future releases of
   Bison will cease to append semicolons entirely.
 
+** Character literals not of length one.
+
+  Previously, Bison quietly converted all character literals to length
+  one.  For example, without warning, Bison interpreted the operators in
+  the following grammar to be the same token:
+
+    exp: exp '++'
+       | exp '+' exp
+       ;
+
+  Bison now warns when a character literal is not of length one.
+
 * Changes in version 2.4.2 (????-??-??):
 
 ** %code is now a permanent feature.
diff --git a/src/scan-gram.l b/src/scan-gram.l
index 60813bb..7017dbb 100644
--- a/src/scan-gram.l
+++ b/src/scan-gram.l
@@ -250,7 +250,7 @@ splice       (\\[ \f\t\v]*\n)*
     complain_at (*loc, _("invalid identifier: %s"), quote (yytext));
   }
 
-  /* Characters.  We don't check there is only one.  */
+  /* Characters.  */
   "'"        STRING_GROW; token_start = loc->start; BEGIN SC_ESCAPED_CHARACTER;
 
   /* Strings. */
@@ -465,24 +465,38 @@ splice     (\\[ \f\t\v]*\n)*
 <SC_ESCAPED_CHARACTER>
 {
   "'"|"\n" {
-    if (yytext[0] == '\n')
-      unexpected_newline (token_start, "'");
     STRING_GROW;
     STRING_FINISH;
     loc->start = token_start;
     val->character = last_string[1];
+    {
+      size_t length = strlen (last_string);
+      if (strlen (last_string) < 3)
+        warn_at (*loc, _("empty character literal"));
+      else if (strlen (last_string) > 3)
+        warn_at (*loc, _("extra characters in character literal"));
+    }
+    if (yytext[0] == '\n')
+      unexpected_newline (token_start, "'");
     STRING_FREE;
     BEGIN INITIAL;
     return CHAR;
   }
   <<EOF>> {
-    unexpected_eof (token_start, "'");
     STRING_FINISH;
     loc->start = token_start;
-    if (strlen (last_string) > 1)
-      val->character = last_string[1];
-    else
-      val->character = last_string[0];
+    {
+      size_t length = strlen (last_string);
+      if (length < 2)
+        warn_at (*loc, _("empty character literal"));
+      else if (length > 2)
+        warn_at (*loc, _("extra characters in character literal"));
+      if (length > 1)
+        val->character = last_string[1];
+      else
+        val->character = last_string[0];
+    }
+    unexpected_eof (token_start, "'");
     STRING_FREE;
     BEGIN INITIAL;
     return CHAR;
diff --git a/tests/input.at b/tests/input.at
index f62a68d..8ba2496 100644
--- a/tests/input.at
+++ b/tests/input.at
@@ -1162,3 +1162,50 @@ AT_CHECK_NAMESPACE_ERROR([[::]],
                          [[namespace reference has a trailing "::"]])
 
 AT_CLEANUP
+
+## ------------------------ ##
+## Bad character literals.  ##
+## ------------------------ ##
+
+# Bison used to accept character literals that were empty or contained
+# too many characters.
+
+AT_SETUP([[Bad character literals]])
+
+AT_DATA([empty.y],
+[[%%
+start: '';
+start: '
+]])
+
+AT_BISON_CHECK([empty.y], [1], [],
+[[empty.y:2.8-9: warning: empty character literal
+empty.y:3.8-4.0: warning: empty character literal
+empty.y:3.8-4.0: missing `'' at end of line
+]])
+
+AT_DATA([two.y],
+[[%%
+start: 'ab';
+start: 'ab
+]])
+
+AT_BISON_CHECK([two.y], [1], [],
+[[two.y:2.8-11: warning: extra characters in character literal
+two.y:3.8-4.0: warning: extra characters in character literal
+two.y:3.8-4.0: missing `'' at end of line
+]])
+
+AT_DATA([three.y],
+[[%%
+start: 'abc';
+start: 'abc
+]])
+
+AT_BISON_CHECK([three.y], [1], [],
+[[three.y:2.8-12: warning: extra characters in character literal
+three.y:3.8-4.0: warning: extra characters in character literal
+three.y:3.8-4.0: missing `'' at end of line
+]])
+
+AT_CLEANUP
-- 
1.5.4.3





reply via email to

[Prev in Thread] Current Thread [Next in Thread]