[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[PATCH] Re: FYI: The pleasures of the wonderful ;
From: |
Di-an JAN |
Subject: |
[PATCH] Re: FYI: The pleasures of the wonderful ; |
Date: |
Sat, 8 Nov 2008 11:51:33 -0800 (PST) |
On Fri, 7 Nov 2008, Joel E. Denny wrote:
On Fri, 7 Nov 2008, Akim Demaille wrote:
I have failures here on Java.
# -*- compilation -*-
197. java.at:385: testing ...
../../../tests/java.at:385: bison -o Calc.java Calc.y
../../../tests/java.at:385: test -n "$CONF_JAVA" || exit 77
test -n "$CONF_JAVAC" || exit 77
Not enabling shell tracing (command contains an embedded newline)
../../../tests/java.at:385: $SHELL ../../../javacomp.sh Calc.java
stderr:
Calc.java:391: unreachable statement
{ yyval = new Integer (0); return YYERROR; ;};
^
Calc.java:400: unreachable statement
{ yyval = new Integer (0); return YYERROR; ;};
^
2 errors
I pushed these 2 patches to branch-2.4.1 and will push to master after
I've tested there. However, it's fine by me if people want to get rid of
this feature altogether for 2.5.
I wrote a patch to solve this by adding the semicolon only if needed.
I updated the patch for the current tree but I won't be able to
finish retesting it today.
(Am I suppose to Cc: to bison-patches?)
The second patch implements the desired functionality as specified
in a FIXME comment, except that tha patch gives unconditional warnings
rather than conditional on -Wyacc (which is the documented name for
the --pedantic option in the FIXME, which currently does nothing;
I may work on it later). I think we should encourage people to fix
their code rather than rely on this Bison behavior. Also, my patch
handles C preprocessor directives: it's WRONG to add semicolons to the
end of those.
By the way, to support languages that doesn't use semicolons, just
make the added semicolon a M4 macro that defaults to `;' and other
skeletons can define it to empty.
While writing the testcase, I noticed that Bison doesn't allow unescaped
newlines in string literals in user code. I think this should be up to
the compiler that actually interpretes the code. That's the first patch.
Tested together on Cygwin with javac 1.6.0-07 and gij 3.4.4 with one
expected failure:
http://lists.gnu.org/archive/html/bug-bison/2008-07/msg00008.html
Di-an Jan
2008-11-08 Di-an Jan <address@hidden>
Allow unescaped newlines in character and string literals.
* src/scan-gram.l (flex rules section): Copy newlines even
SC_STRING and SC_CHARACTER. Don't generate error.
diff --git a/src/scan-gram.l b/src/scan-gram.l
index 697f52f..f951db1 100644
--- a/src/scan-gram.l
+++ b/src/scan-gram.l
@@ -456,14 +456,12 @@ splice (\\[ \f\t\v]*\n)*
<SC_CHARACTER>
{
"'" STRING_GROW; BEGIN context_state;
- \n unexpected_newline (token_start, "'"); BEGIN context_state;
<<EOF>> unexpected_eof (token_start, "'"); BEGIN context_state;
}
<SC_STRING>
{
"\"" STRING_GROW; BEGIN context_state;
- \n unexpected_newline (token_start, "\""); BEGIN context_state;
<<EOF>> unexpected_eof (token_start, "\""); BEGIN context_state;
}
@@ -586,7 +584,7 @@ splice (\\[ \f\t\v]*\n)*
`-----------------------------------------------------*/
<SC_COMMENT,SC_LINE_COMMENT,SC_BRACED_CODE,SC_PROLOGUE,SC_EPILOGUE,SC_STRING,SC_CHARACTER,SC_ESCAPED_STRING,SC_ESCAPED_CHARACTER>.
|
-<SC_COMMENT,SC_LINE_COMMENT,SC_BRACED_CODE,SC_PROLOGUE,SC_EPILOGUE>\n
STRING_GROW;
+<SC_COMMENT,SC_LINE_COMMENT,SC_BRACED_CODE,SC_PROLOGUE,SC_EPILOGUE,SC_STRING,SC_CHARACTER>\n
STRING_GROW;
%%
2008-11-08 Di-an Jan <address@hidden>
Implement the FIXME that ends an user action with a semicolon
if it seems necessary.
* src/scan-code.l (flex rules section): Flag cpp directive from
any `#' to the first unescaped end-of-line. Semicolon is not
needed after `;', `{', '}', and is needed after any other token
(whitespaces, comments, and cpp directives have no effect).
* tests/actions.at (Fix user actions without a trailing semicolon):
New tests with and without --yacc.
* tests/input.at (AT_CHECK_UNUSED_VALUES): Add semicolons to
to make user actions complete statements.
Adjust column numbers in error messages.
* tests/regression.at (Fix user actions without a trailing semicolon):
Remove. Covered by new test.
diff --git a/src/scan-code.l b/src/scan-code.l
index 71c9076..b4b0fd7 100644
--- a/src/scan-code.l
+++ b/src/scan-code.l
@@ -81,6 +81,18 @@ splice (\\[ \f\t\v]*\n)*
/* Nesting level of the current code in braces. */
int braces_level = 0;
+ /* Whether a semicolon is probably needed.
+ The heuristic is that a semicolon is not needed after `{', `}' or `;'
+ and that whitespaces, comments, and C preprocessor directives
+ does not affect this flag.
+ Note that `{' does not need a semicolon because of `{}'. */
+ bool need_semicolon = false;
+
+ /* Whether in a C preprocessor directive. Don't use a start condition
+ for this because, at the end of strings and comments, we still need
+ to know whether we're in a directive. */
+ bool in_cpp = false;
+
/* This scanner is special: it is invoked only once, henceforth
is expected to return only once. This initialization is
therefore done once per action to translate. */
@@ -135,10 +147,12 @@ splice (\\[ \f\t\v]*\n)*
"'" {
STRING_GROW;
BEGIN SC_CHARACTER;
+ if (! in_cpp) need_semicolon = true;
}
"\"" {
STRING_GROW;
BEGIN SC_STRING;
+ if (! in_cpp) need_semicolon = true;
}
"/"{splice}"*" {
STRING_GROW;
@@ -154,43 +168,67 @@ splice (\\[ \f\t\v]*\n)*
{
"$"("<"{tag}">")?(-?[0-9]+|"$") {
handle_action_dollar (self->rule, yytext, *loc);
+ if (! in_cpp) need_semicolon = true;
}
"@"(-?[0-9]+|"$") {
handle_action_at (self->rule, yytext, *loc);
+ if (! in_cpp) need_semicolon = true;
}
"$" {
warn_at (*loc, _("stray `$'"));
obstack_sgrow (&obstack_for_string, "$][");
+ if (! in_cpp) need_semicolon = true;
}
"@" {
warn_at (*loc, _("stray `@'"));
obstack_sgrow (&obstack_for_string, "@@");
+ if (! in_cpp) need_semicolon = true;
+ }
+ "[" {
+ obstack_sgrow (&obstack_for_string, "@{");
+ if (! in_cpp) need_semicolon = true;
+ }
+ "]" {
+ obstack_sgrow (&obstack_for_string, "@}");
+ if (! in_cpp) need_semicolon = true;
}
- "{" STRING_GROW; ++braces_level;
+ ";" STRING_GROW; if (! in_cpp) need_semicolon = false;
+ "{" STRING_GROW; ++braces_level; if (! in_cpp) need_semicolon = false;
"}" {
bool outer_brace = --braces_level == 0;
/* As an undocumented Bison extension, append `;' before the last
brace in braced code, so that the user code can omit trailing
`;'. But do not append `;' if emulating Yacc, since Yacc does
- not append one.
-
- FIXME: Bison should warn if a semicolon seems to be necessary
- here, and should omit the semicolon if it seems unnecessary
- (e.g., after ';', '{', or '}', each followed by comments or
- white space). Such a warning shouldn't depend on --yacc; it
- should depend on a new --pedantic option, which would cause
- Bison to warn if it detects an extension to POSIX. --pedantic
- should also diagnose other Bison extensions like %yacc.
- Perhaps there should also be a GCC-style --pedantic-errors
- option, so that such warnings are diagnosed as errors. */
- if (outer_brace && ! yacc_flag)
- obstack_1grow (&obstack_for_string, ';');
+ not append one. */
+ if (outer_brace && need_semicolon)
+ {
+ warn_at (*loc, _("a `;' might be needed at the end of action code"));
+ if (! yacc_flag)
+ {
+ if (in_cpp)
+ obstack_sgrow (&obstack_for_string, "\n;");
+ else
+ obstack_1grow (&obstack_for_string, ';');
+ }
+ }
STRING_GROW;
+ if (! in_cpp) need_semicolon = false;
}
+
+ /* Preprocessing directives should only be recognized at the beginning
+ of lines, allowing whitespace including comments, but in C/C++,
+ `#' can only be the start of preprocessor directives or within
+ `#define' directives anyway, so don't bother with begin of line. */
+ "#" STRING_GROW; in_cpp = true;
+
+ {splice} STRING_GROW;
+ [\n\r] STRING_GROW; in_cpp = false;
+ [ \t\f] STRING_GROW;
+ . STRING_GROW; if (! in_cpp) need_semicolon = true;
}
<SC_SYMBOL_ACTION>
diff --git a/tests/actions.at b/tests/actions.at
index 602eac8..8ae9a4f 100644
--- a/tests/actions.at
+++ b/tests/actions.at
@@ -1304,3 +1304,125 @@ AT_CLEANUP])
AT_CHECK_ACTION_LOCATIONS([[%initial-action]])
AT_CHECK_ACTION_LOCATIONS([[%destructor]])
AT_CHECK_ACTION_LOCATIONS([[%printer]])
+
+
+## ----------------------------------------------- ##
+## Fix user actions without a trailing semicolon. ##
+## ----------------------------------------------- ##
+
+AT_SETUP([[Fix user actions without a trailing semicolon]])
+
+# This feature is undocumented, but we accidentally broke it in 2.3a,
+# and there was a complaint at:
+# <http://lists.gnu.org/archive/html/bug-bison/2008-11/msg00001.html>.
+
+AT_DATA([input.y],
+[[%%
+start: test2 test1 test0 testc;
+
+test2
+: 'a' { semi; /* TEST:N:2 */ }
+| 'b' { if (0) {no_semi} /* TEST:N:2 */ }
+| 'c' { if (0) {semi;} /* TEST:N:2 */ }
+| 'd' { semi; no_semi /* TEST:Y:2 */ }
+| 'e' { semi(); no_semi() /* TEST:Y:2 */ }
+| 'f' { semi[]; no_semi[] /* TEST:Y:2 */ }
+| 'g' { semi++; no_semi++ /* TEST:Y:2 */ }
+| 'h' { {no_semi} no_semi /* TEST:Y:2 */ }
+| 'i' { {semi;} no_semi /* TEST:Y:2 */ }
+;
+test1
+ : 'a' { semi; // TEST:N:1 ;
+} | 'b' { if (0) {no_semi} // TEST:N:1 ;
+} | 'c' { if (0) {semi;} // TEST:N:1 ;
+} | 'd' { semi; no_semi // TEST:Y:1 ;
+} | 'e' { semi(); no_semi() // TEST:Y:1 ;
+} | 'f' { semi[]; no_semi[] // TEST:Y:1 ;
+} | 'g' { semi++; no_semi++ // TEST:Y:1 ;
+} | 'h' { {no_semi} no_semi // TEST:Y:1 ;
+} | 'i' { {semi;} no_semi // TEST:Y:1 ;
+} ;
+test0
+ : 'a' { semi; // TEST:N:1 {}
+} | 'b' { if (0) {no_semi} // TEST:N:1 {}
+} | 'c' { if (0) {semi;} // TEST:N:1 {}
+} | 'd' { semi; no_semi // TEST:Y:1 {}
+} | 'e' { semi(); no_semi() // TEST:Y:1 {}
+} | 'f' { semi[]; no_semi[] // TEST:Y:1 {}
+} | 'g' { semi++; no_semi++ // TEST:Y:1 {}
+} | 'h' { {no_semi} no_semi // TEST:Y:1 {}
+} | 'i' { {semi;} no_semi // TEST:Y:1 {}
+} ;
+
+testc
+: 'a' {
+#define TEST_MACRO_N \
+[]"broken\" $ @ $$ @$ [];
+string;"}
+| 'b' {
+no_semi
+#define TEST_MACRO_Y \
+[]"broken\" $ @ $$ @$ [];
+string;"}
+]])
+
+m4_define([AT_CHECK_ACTION_MISSING_SEMICOLON_WARNINGS],
+[[input.y:8.48: warning: a `;' might be needed at the end of action code
+input.y:9.48: warning: a `;' might be needed at the end of action code
+input.y:10.48: warning: a `;' might be needed at the end of action code
+input.y:11.48: warning: a `;' might be needed at the end of action code
+input.y:12.48: warning: a `;' might be needed at the end of action code
+input.y:13.48: warning: a `;' might be needed at the end of action code
+input.y:20.1: warning: a `;' might be needed at the end of action code
+input.y:21.1: warning: a `;' might be needed at the end of action code
+input.y:22.1: warning: a `;' might be needed at the end of action code
+input.y:23.1: warning: a `;' might be needed at the end of action code
+input.y:24.1: warning: a `;' might be needed at the end of action code
+input.y:25.1: warning: a `;' might be needed at the end of action code
+input.y:31.1: warning: a `;' might be needed at the end of action code
+input.y:32.1: warning: a `;' might be needed at the end of action code
+input.y:33.1: warning: a `;' might be needed at the end of action code
+input.y:34.1: warning: a `;' might be needed at the end of action code
+input.y:35.1: warning: a `;' might be needed at the end of action code
+input.y:36.1: warning: a `;' might be needed at the end of action code
+input.y:47.9: warning: a `;' might be needed at the end of action code
+]])
+
+AT_BISON_CHECK([[-o input.c input.y]], [0], [],
+ AT_CHECK_ACTION_MISSING_SEMICOLON_WARNINGS)
+AT_CHECK([[grep -c '/\* TEST:N:2 \*/ }$' input.c]], [0], [[3
+]])
+AT_CHECK([[grep -c '/\* TEST:Y:2 \*/ ;}$' input.c]], [0], [[6
+]])
+AT_CHECK([[sed -n '/TEST:N:1/{N
+s/\n/<NL>/gp}' input.c | grep -c '// TEST:N:1 [;{}]*<NL>}$']], [0], [[6
+]])
+AT_CHECK([[sed -n '/TEST:Y:1/{N
+s/\n/<NL>/gp}' input.c | grep -c '// TEST:Y:1 [;{}]*<NL>;}$']], [0], [[12
+]])
+AT_CHECK([[sed -n '/^#define TEST_MACRO_N/{N
+N
+s/\n/<NL>/gp}' input.c | grep -F -xc '#define TEST_MACRO_N \<NL>[]"broken\" $ @ $$ @$
[];<NL>string;"}']],
+ [0], [[1
+]])
+AT_CHECK([[sed -n '/^#define TEST_MACRO_Y/{N
+N
+N
+s/\n/<NL>/gp}' input.c | grep -F -xc '#define TEST_MACRO_Y \<NL>[]"broken\" $ @ $$ @$
[];<NL>string;"<NL>;}']],
+ [0], [[1
+]])
+
+AT_BISON_CHECK([[--yacc -o input.c input.y]], [0], [],
+ AT_CHECK_ACTION_MISSING_SEMICOLON_WARNINGS)
+AT_CHECK([[grep -c '/\* TEST:[NY]:2 \*/ }$' input.c]], [[0]], [[9
+]])
+AT_CHECK([[sed -n '/TEST:[NY]:1/{N
+s/\n/<NL>/gp}' input.c | grep -c '// TEST:[NY]:1 [;{}]*<NL>}$']], [[0]], [[18
+]])
+AT_CHECK([[sed -n '/^#define TEST_MACRO_[NY]/{N
+N
+s/\n/<NL>/gp}' input.c | grep -xc '#define TEST_MACRO_[NY] \\<NL>\[\]"broken\\" \$ @ \$\$
@\$ \[\];<NL>string;"}']],
+ [0], [[2
+]])
+
+AT_CLEANUP
diff --git a/tests/input.at b/tests/input.at
index 8bf61fa..7da3581 100644
--- a/tests/input.at
+++ b/tests/input.at
@@ -97,16 +97,16 @@ m4_ifval($1, [
], [_AT_UNUSED_VALUES_DECLARATIONS
])[[%%
start:
- 'a' a { $]2[ } | 'b' b { $]2[ } | 'c' c { $]2[ } | 'd' d { $]2[ } | 'e' e {
$]2[ }
-| 'f' f { $]2[ } | 'g' g { $]2[ } | 'h' h { $]2[ } | 'i' i { $]2[ } | 'j' j {
$]2[ }
-| 'k' k { $]2[ } | 'l' l { $]2[ }
+ 'a' a { $]2[; } | 'b' b { $]2[; } | 'c' c { $]2[; } | 'd' d { $]2[; }
+| 'e' e { $]2[; } | 'f' f { $]2[; } | 'g' g { $]2[; } | 'h' h { $]2[; }
+| 'i' i { $]2[; } | 'j' j { $]2[; } | 'k' k { $]2[; } | 'l' l { $]2[; }
;
a: INT | INT { } INT { } INT { };
b: INT | /* empty */;
-c: INT | INT { $]1[ } INT { $<integer>2 } INT { $<integer>4 };
-d: INT | INT { } INT { $]1[ } INT { $<integer>2 };
-e: INT | INT { } INT { } INT { $]1[ };
+c: INT | INT { $]1[; } INT { $<integer>2; } INT { $<integer>4; };
+d: INT | INT { } INT { $]1[; } INT { $<integer>2; };
+e: INT | INT { } INT { } INT { $]1[; };
f: INT | INT { } INT { } INT { $]$[ = $]1[ + $]3[ + $]5[; };
g: INT | INT { $<integer>$; } INT { $<integer>$; } INT { };
h: INT | INT { $<integer>$; } INT { $<integer>$ = $<integer>2; } INT { };
@@ -123,18 +123,18 @@ input.y:11.10-32: warning: unused value: $]1[
input.y:11.10-32: warning: unused value: $]3[
input.y:11.10-32: warning: unused value: $]5[
input.y:12.9: warning: empty rule for typed nonterminal, and no action
-]]m4_ifval($2, [[[input.y:13.14-19: warning: unset value: $$
-input.y:13.25-39: warning: unset value: $$
-]]])[[input.y:13.10-59: warning: unset value: $]$[
-input.y:13.10-59: warning: unused value: $]3[
-input.y:13.10-59: warning: unused value: $]5[
+]]m4_ifval($2, [[[input.y:13.14-20: warning: unset value: $$
+input.y:13.26-41: warning: unset value: $$
+]]])[[input.y:13.10-62: warning: unset value: $]$[
+input.y:13.10-62: warning: unused value: $]3[
+input.y:13.10-62: warning: unused value: $]5[
]]m4_ifval($2, [[[input.y:14.14-16: warning: unset value: $$
-]]])[[input.y:14.10-47: warning: unset value: $]$[
-input.y:14.10-47: warning: unused value: $]3[
-input.y:14.10-47: warning: unused value: $]5[
-input.y:15.10-36: warning: unset value: $]$[
-input.y:15.10-36: warning: unused value: $]3[
-input.y:15.10-36: warning: unused value: $]5[
+]]])[[input.y:14.10-49: warning: unset value: $]$[
+input.y:14.10-49: warning: unused value: $]3[
+input.y:14.10-49: warning: unused value: $]5[
+input.y:15.10-37: warning: unset value: $]$[
+input.y:15.10-37: warning: unused value: $]3[
+input.y:15.10-37: warning: unused value: $]5[
input.y:17.10-58: warning: unset value: $]$[
input.y:17.10-58: warning: unused value: $]1[
]]m4_ifval($2, [[[input.y:17.10-58: warning: unused value: $]2[
diff --git a/tests/regression.at b/tests/regression.at
index 6bfc8d0..9a27749 100644
--- a/tests/regression.at
+++ b/tests/regression.at
@@ -1255,27 +1255,3 @@ AT_COMPILE([[input]])
AT_PARSER_CHECK([[./input]])
AT_CLEANUP
-
-
-
-## ----------------------------------------------- ##
-## Fix user actions without a trailing semicolon. ##
-## ----------------------------------------------- ##
-
-AT_SETUP([[Fix user actions without a trailing semicolon]])
-
-# This feature is undocumented, but we accidentally broke it in 2.3a, and there
-# was a complaint at:
-# <http://lists.gnu.org/archive/html/bug-bison/2008-11/msg00001.html>.
-
-AT_DATA([input.y],
-[[%%
-start: {asdffdsa} ;
-]])
-
-AT_BISON_CHECK([[-o input.c input.y]])
-AT_CHECK([[sed -n '/asdffdsa/s/^ *//p' input.c]], [[0]],
-[[{asdffdsa;}
-]])
-
-AT_CLEANUP