Bison scanner patch to fix POSIX incompatibilities, etc.
From: Paul Eggert
Subject: Bison scanner patch to fix POSIX incompatibilities, etc.
Date: Sun, 3 Nov 2002 00:49:18 -0800 (PST)
I installed the following (unfortunately lengthy) patch to fix several
minor bugs in the Bison scanner. For example, the scanner
mishandled backslash-newline in C actions, and it miscounted columns
and lines in several circumstances. While I was at it, I documented
the scanner a bit better (e.g., I documented that it doesn't do
trigraphs).
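
To make the column-counting fix concrete, here is a minimal standalone
sketch of the tab-stop arithmetic. It assumes 1-based columns and tab
stops every 8 columns, as extend_location in the patch does. Unlike the
patch, it counts every other byte as one column wide rather than calling
mbsnwidth, so it is only right for plain ASCII text. The names
struct position and advance_position are hypothetical and do not appear
in the patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for the last_line/last_column pair of a
   Bison location.  */
struct position
{
  int line;
  int column;
};

/* Return POS advanced over the SIZE bytes starting at TEXT.
   Simplifying assumption: each byte other than newline and tab
   occupies exactly one column (the patch uses mbsnwidth instead).  */
static struct position
advance_position (struct position pos, char const *text, size_t size)
{
  size_t i;

  for (i = 0; i < size; i++)
    switch (text[i])
      {
      case '\n':
        pos.line++;
        pos.column = 1;
        break;

      case '\t':
        /* Move to the next tab stop: columns 9, 17, 25, ...  */
        pos.column += 8 - ((pos.column - 1) & 7);
        break;

      default:
        pos.column++;
        break;
      }

  return pos;
}
```

For instance, starting at line 1, column 1, the three bytes "a", tab,
"b" leave the position at column 10: the tab moves from column 2 to the
tab stop at column 9, and "b" occupies column 9.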
2002-11-03 Paul Eggert <address@hidden>
* src/scan-gram.l: Revamp to fix POSIX incompatibilities,
to count columns correctly, and to check for invalid inputs.
Use mbsnwidth to count columns correctly. Account for tabs, too.
Include mbswidth.h.
(YY_USER_ACTION): Invoke extend_location rather than LOCATION_COLUMNS.
(extend_location): New function.
(YY_LINES): Remove.
Handle CRLF in C code rather than in Lex code.
(YY_INPUT): New macro.
(no_cr_read): New function.
Scan UCNs, even though we don't fully handle them yet.
(convert_ucn_to_byte): New function.
Handle backslash-newline correctly in C code.
(SC_LINE_COMMENT, SC_YACC_COMMENT): New states.
(eols, blanks): Remove. YY_USER_ACTION now counts newlines etc.;
all uses changed.
(tag, splice): New EREs. Do not allow NUL or newline in tags.
Use {splice} wherever C allows backslash-newline.
YY_STEP after space, newline, vertical-tab.
("/*"): BEGIN SC_YACC_COMMENT, not yy_push_state (SC_COMMENT).
(letter, id): Don't assume ASCII; e.g., spell out a-z.
({int}, handle_action_dollar, handle_action_at): Check for integer
overflow.
(YY_STEP): Omit trailing semicolon, so that it's more like C.
(<SC_ESCAPED_STRING,SC_ESCAPED_CHARACTER>): Allow \0 and \00
as well as \000. Check for UCHAR_MAX, not 255.
Allow \x with an arbitrary positive number of digits, as in C.
Check for overflow here.
Allow \? and UCNs, for compatibility with C.
(handle_symbol_code_dollar): Use quote_n slot 1 to avoid collision
with quote slot used by complain_at.
* tests/input.at: Add tests for backslash-newline, m4 quotes
in symbols, long literals, and funny escapes in strings.
* configure.ac (jm_PREREQ_MBSWIDTH): Add.
* lib/Makefile.am (libbison_a_SOURCES): Add mbswidth.h, mbswidth.c.
* lib/mbswidth.h, lib/mbswidth.c: New files, from GNU gettext.
* m4/Makefile.am (EXTRA_DIST): Add mbswidth.m4.
* m4/mbswidth.m4: New file, from GNU coreutils.
* doc/bison.texinfo (Grammar Outline): Document // comments.
(Symbols): Document that trigraphs have no special meaning in Bison,
nor is backslash-newline allowed.
(Actions): Document that trigraphs have no special meaning.
* src/location.h (LOCATION_COLUMNS, LOCATION_LINES): Remove;
no longer used.
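
The overflow checks mentioned above for {int} and for \x escapes all
follow one pattern: clear errno, call strtoul, then reject the result if
it exceeds the limit or if errno was set. The following is a rough
sketch of that pattern as a standalone function. The name parse_bounded
and its interface are hypothetical; the patch writes the check inline in
each scanner rule. It also assumes the caller passes only digits valid
for BASE, as the scanner's regular expressions guarantee.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse DIGITS as an unsigned number in BASE.
   On success store the value in *RESULT and return 0.  Return -1 if
   the value exceeds MAX or does not fit in an unsigned long.  */
static int
parse_bounded (char const *digits, int base, unsigned long max,
               unsigned long *result)
{
  unsigned long num;

  /* strtoul reports overflow only through errno, so clear it first.  */
  errno = 0;
  num = strtoul (digits, 0, base);
  if (max < num || errno)
    return -1;

  *result = num;
  return 0;
}
```

With MAX set to UCHAR_MAX and BASE 16, this accepts an escape such as
\x000000000000000000000000000002 (any number of leading zeros) while
rejecting \x100 on hosts with 8-bit bytes.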
Index: configure.ac
===================================================================
RCS file: /cvsroot/bison/bison/configure.ac,v
retrieving revision 1.19
diff -p -u -r1.19 configure.ac
--- configure.ac 25 Oct 2002 06:56:26 -0000 1.19
+++ configure.ac 3 Nov 2002 08:33:21 -0000
@@ -89,6 +89,7 @@ AC_REPLACE_FUNCS(memchr memrchr \
strchr stpcpy strrchr strspn strtol)
AC_FUNC_MALLOC
AC_FUNC_REALLOC
+jm_PREREQ_MBSWIDTH
jm_PREREQ_QUOTEARG
jm_FUNC_ARGMATCH
jm_PREREQ_ERROR
Index: doc/bison.texinfo
===================================================================
RCS file: /cvsroot/bison/bison/doc/bison.texinfo,v
retrieving revision 1.73
diff -p -u -r1.73 bison.texinfo
--- doc/bison.texinfo 23 Oct 2002 05:26:32 -0000 1.73
+++ doc/bison.texinfo 3 Nov 2002 08:33:23 -0000
@@ -2212,6 +2212,8 @@ appropriate delimiters:
@end example
Comments enclosed in @samp{/* @dots{} */} may appear in any of the sections.
+As a @acronym{GNU} extension, @samp{//} introduces a comment that
+continues until end of line.
@menu
* Prologue:: Syntax and usage of the prologue.
@@ -2360,7 +2362,9 @@ All the usual escape sequences used in c
used in Bison as well, but you must not use the null character as a
character literal because its numeric code, zero, signifies
end-of-input (@pxref{Calling Convention, ,Calling Convention
-for @code{yylex}}).
+for @code{yylex}}). Also, unlike standard C, trigraphs have no
+special meaning in Bison character literals, nor is backslash-newline
+allowed.
@item
@cindex string token
@@ -2387,9 +2391,10 @@ does not enforce this convention, but if
read your program will be confused.
All the escape sequences used in string literals in C can be used in
-Bison as well. A literal string token must contain two or more
-characters; for a token containing just one character, use a character
-token (see above).
+Bison as well. However, unlike Standard C, trigraphs have no special
+meaning in Bison string literals, nor is backslash-newline allowed. A
+literal string token must contain two or more characters; for a token
+containing just one character, use a character token (see above).
@end itemize
How you choose to write a terminal symbol has no effect on its
@@ -2691,7 +2696,13 @@ is to compute a semantic value for the g
semantic values associated with tokens or smaller groupings.
An action consists of C statements surrounded by braces, much like a
-compound statement in address@hidden It can be placed at any position in the rule;
+compound statement in address@hidden An action can contain any sequence of C
+statements. Bison does not look for trigraphs, though, so if your C
+code uses trigraphs you should ensure that they do not affect the
+nesting of braces or the boundaries of comments, strings, or character
+literals.
+
+An action can be placed at any position in the rule;
it is executed at that position. Most rules have just one action at the
end of the rule, following all the components. Actions in the middle of
a rule are tricky and used only for special purposes (@pxref{Mid-Rule
Index: lib/Makefile.am
===================================================================
RCS file: /cvsroot/bison/bison/lib/Makefile.am,v
retrieving revision 1.33
diff -p -u -r1.33 Makefile.am
--- lib/Makefile.am 20 Oct 2002 06:29:41 -0000 1.33
+++ lib/Makefile.am 3 Nov 2002 08:33:23 -0000
@@ -34,6 +34,7 @@ libbison_a_SOURCES = \
basename.c dirname.h dirname.c \
getopt.h getopt.c getopt1.c \
hash.h hash.c \
+ mbswidth.h mbswidth.c \
quote.h quote.c quotearg.h quotearg.c \
subpipe.h subpipe.c unlocked-io.h \
xalloc.h xmalloc.c xstrdup.c xstrndup.c \
Index: m4/Makefile.am
===================================================================
RCS file: /cvsroot/bison/bison/m4/Makefile.am,v
retrieving revision 1.22
diff -p -u -r1.22 Makefile.am
--- m4/Makefile.am 22 Oct 2002 04:38:11 -0000 1.22
+++ m4/Makefile.am 3 Nov 2002 08:33:23 -0000
@@ -1,6 +1,6 @@
## Process this file with automake to produce Makefile.in -*-Makefile-*-
EXTRA_DIST = \
dmalloc.m4 error.m4 \
- m4.m4 mbrtowc.m4 memcmp.m4 \
+ m4.m4 mbrtowc.m4 mbswidth.m4 memcmp.m4 \
prereq.m4 stdbool.m4 subpipe.m4 timevar.m4 warning.m4 \
gettext.m4 iconv.m4 lib-ld.m4 lib-link.m4 lib-prefix.m4 progtest.m4
Index: src/location.h
===================================================================
RCS file: /cvsroot/bison/bison/src/location.h,v
retrieving revision 1.3
diff -p -u -r1.3 location.h
--- src/location.h 9 Jul 2002 16:24:57 -0000 1.3
+++ src/location.h 3 Nov 2002 08:33:23 -0000
@@ -40,20 +40,6 @@ do { \
(Loc).last_column = (Loc).last_line = 1; \
} while (0)
-/* Advance of NUM columns. */
-# define LOCATION_COLUMNS(Loc, Num) \
-do { \
- (Loc).last_column += Num; \
-} while (0)
-
-
-/* Advance of NUM lines. */
-# define LOCATION_LINES(Loc, Num) \
-do { \
- (Loc).last_column = 1; \
- (Loc).last_line += Num; \
-} while (0)
-
/* Restart: move the first cursor to the last position. */
# define LOCATION_STEP(Loc) \
Index: src/scan-gram.l
===================================================================
RCS file: /cvsroot/bison/bison/src/scan-gram.l,v
retrieving revision 1.29
diff -p -u -r1.29 scan-gram.l
--- src/scan-gram.l 21 Oct 2002 05:30:50 -0000 1.29
+++ src/scan-gram.l 3 Nov 2002 08:33:24 -0000
@@ -24,6 +24,7 @@
%{
#include "system.h"
+#include "mbswidth.h"
#include "complain.h"
#include "quote.h"
#include "getargs.h"
@@ -39,9 +40,95 @@ do { \
if (yycontrol) {;}; \
} while (0)
-#define YY_USER_ACTION LOCATION_COLUMNS (*yylloc, yyleng);
-#define YY_LINES LOCATION_LINES (*yylloc, yyleng);
-#define YY_STEP LOCATION_STEP (*yylloc);
+#define YY_USER_ACTION extend_location (yylloc, yytext, yyleng);
+#define YY_STEP LOCATION_STEP (*yylloc)
+
+#define YY_INPUT(buf, result, size) ((result) = no_cr_read (yyin, buf, size))
+
+
+/* Read bytes from FP into buffer BUF of size SIZE. Return the
+ number of bytes read. Remove '\r' from input, treating \r\n
+ and isolated \r as \n. */
+
+static size_t
+no_cr_read (FILE *fp, char *buf, size_t size)
+{
+ size_t s = fread (buf, 1, size, fp);
+ if (s)
+ {
+ char *w = memchr (buf, '\r', s);
+ if (w)
+ {
+ char const *r = ++w;
+ char const *lim = buf + s;
+
+ for (;;)
+ {
+ /* Found an '\r'. Treat it like '\n', but ignore any
+ '\n' that immediately follows. */
+ w[-1] = '\n';
+ if (r == lim)
+ {
+ int ch = getc (fp);
+ if (ch != '\n' && ungetc (ch, fp) != ch)
+ break;
+ }
+ else if (*r == '\n')
+ r++;
+
+ /* Copy until the next '\r'. */
+ do
+ {
+ if (r == lim)
+ return w - buf;
+ }
+ while ((*w++ = *r++) != '\r');
+ }
+
+ return w - buf;
+ }
+ }
+
+ return s;
+}
+
+
+/* Extend *LOC to account for token TOKEN of size SIZE. */
+
+static void
+extend_location (location_t *loc, char const *token, int size)
+{
+ int line = loc->last_line;
+ int column = loc->last_column;
+ char const *p0 = token;
+ char const *p = token;
+ char const *lim = token + size;
+
+ for (p = token; p < lim; p++)
+ switch (*p)
+ {
+ case '\r':
+ /* \r shouldn't survive no_cr_read. */
+ abort ();
+
+ case '\n':
+ line++;
+ column = 1;
+ p0 = p + 1;
+ break;
+
+ case '\t':
+ column += mbsnwidth (p0, p - p0, 0);
+ column += 8 - ((column - 1) & 7);
+ p0 = p + 1;
+ break;
+ }
+
+ loc->last_line = line;
+ loc->last_column = column + mbsnwidth (p0, p - p0, 0);
+}
+
+
/* STRING_OBSTACK -- Used to store all the characters that we need to
keep (to construct ID, STRINGS etc.). Use the following macros to
@@ -91,17 +178,26 @@ static void handle_dollar (braced_code_t
char *cp, location_t location);
static void handle_at (braced_code_t code_kind,
char *cp, location_t location);
+static int convert_ucn_to_byte (char const *hex_text);
%}
-%x SC_COMMENT
+%x SC_COMMENT SC_LINE_COMMENT SC_YACC_COMMENT
%x SC_STRING SC_CHARACTER
%x SC_ESCAPED_STRING SC_ESCAPED_CHARACTER
%x SC_BRACED_CODE SC_PROLOGUE SC_EPILOGUE
-id [.a-zA-Z_][.a-zA-Z_0-9]*
+letter [.abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_]
+id {letter}({letter}|[0-9])*
int [0-9]+
-eols (\n|\r|\n\r|\r\n)+
-blanks [ \t\f]+
+
+/* POSIX says that a tag must be both an id and a C union member, but
+ historically almost any character is allowed in a tag. We disallow
+ NUL and newline, as this simplifies our implementation. */
+tag [^\0\n>]+
+
+/* Zero or more instances of backslash-newline. Following GCC, allow
+ white space between the backslash and the newline. */
+splice (\\[ \f\t\v]*\n)*
%%
%{
@@ -136,7 +232,7 @@ blanks [ \t\f]+
"%nterm" return PERCENT_NTERM;
"%output" return PERCENT_OUTPUT;
"%parse-param" return PERCENT_PARSE_PARAM;
- "%prec" { rule_length--; return PERCENT_PREC; }
+ "%prec" rule_length--; return PERCENT_PREC;
"%printer" return PERCENT_PRINTER;
"%pure"[-_]"parser" return PERCENT_PURE_PARSER;
"%right" return PERCENT_RIGHT;
@@ -152,20 +248,31 @@ blanks [ \t\f]+
"%yacc" return PERCENT_YACC;
"=" return EQUAL;
- ":" { rule_length = 0; return COLON; }
- "|" { rule_length = 0; return PIPE; }
+ ":" rule_length = 0; return COLON;
+ "|" rule_length = 0; return PIPE;
"," return COMMA;
";" return SEMICOLON;
- {eols} YY_LINES; YY_STEP;
- {blanks} YY_STEP;
+ [ \f\n\t\v]+ YY_STEP;
+
{id} {
yylval->symbol = symbol_get (yytext, *yylloc);
rule_length++;
return ID;
}
- {int} yylval->integer = strtol (yytext, 0, 10); return INT;
+ {int} {
+ unsigned long num;
+ errno = 0;
+ num = strtoul (yytext, 0, 10);
+ if (INT_MAX < num || errno)
+ {
+ complain_at (*yylloc, _("%s is invalid"), yytext);
+ num = INT_MAX;
+ }
+ yylval->integer = num;
+ return INT;
+ }
/* Characters. We don't check there is only one. */
"'" YY_OBS_GROW; yy_push_state (SC_ESCAPED_CHARACTER);
@@ -174,7 +281,7 @@ blanks [ \t\f]+
"\"" YY_OBS_GROW; yy_push_state (SC_ESCAPED_STRING);
/* Comments. */
- "/*" yy_push_state (SC_COMMENT);
+ "/*" BEGIN SC_YACC_COMMENT;
"//".* YY_STEP;
/* Prologue. */
@@ -184,7 +291,7 @@ blanks [ \t\f]+
"{" YY_OBS_GROW; ++braces_level; yy_push_state (SC_BRACED_CODE);
/* A type. */
- "<"[^>]+">" {
+ "<"{tag}">" {
obstack_grow (&string_obstack, yytext + 1, yyleng - 2);
YY_OBS_FINISH;
yylval->string = last_string;
@@ -206,41 +313,48 @@ blanks [ \t\f]+
}
- /*------------------------------------------------------------.
- | Whatever the start condition (but those which correspond to |
- | entity `swallowed' by Bison: SC_ESCAPED_STRING and |
- | SC_ESCAPED_CHARACTER), no M4 character must escape as is. |
- `------------------------------------------------------------*/
+ /*-------------------------------------------------------------------.
+ | Whatever the start condition (but those which correspond to |
+ | entities `swallowed' by Bison: SC_YACC_COMMENT, SC_ESCAPED_STRING, |
+ | and SC_ESCAPED_CHARACTER), no M4 character must escape as is. |
+ `-------------------------------------------------------------------*/
-<SC_COMMENT,SC_STRING,SC_CHARACTER,SC_BRACED_CODE,SC_PROLOGUE,SC_EPILOGUE>
+<SC_COMMENT,SC_LINE_COMMENT,SC_STRING,SC_CHARACTER,SC_BRACED_CODE,SC_PROLOGUE,SC_EPILOGUE>
{
- \[ if (YY_START != SC_COMMENT) obstack_sgrow (&string_obstack, "@<:@");
- \] if (YY_START != SC_COMMENT) obstack_sgrow (&string_obstack, "@:>@");
+ \[ obstack_sgrow (&string_obstack, "@<:@");
+ \] obstack_sgrow (&string_obstack, "@:>@");
}
+ /*---------------------------------------------------------------.
+ | Scanning a Yacc comment. The initial `/ *' is already eaten. |
+ `---------------------------------------------------------------*/
- /*-----------------------------------------------------------.
- | Scanning a C comment. The initial `/ *' is already eaten. |
- `-----------------------------------------------------------*/
-
-<SC_COMMENT>
+<SC_YACC_COMMENT>
{
- "*/" { /* End of the comment. */
- if (yy_top_state () == INITIAL)
- {
- YY_STEP;
- }
- else
- {
- YY_OBS_GROW;
- }
- yy_pop_state ();
+ "*/" {
+ YY_STEP;
+ BEGIN INITIAL;
}
- [^\[\]*\n\r]+ if (yy_top_state () != INITIAL) YY_OBS_GROW;
- {eols} if (yy_top_state () != INITIAL) YY_OBS_GROW; YY_LINES;
- . /* Stray `*'. */if (yy_top_state () != INITIAL) YY_OBS_GROW;
+ [^*]+|"*" ;
+
+ <<EOF>> {
+ LOCATION_PRINT (stderr, *yylloc);
+ fprintf (stderr, _(": unexpected end of file in a comment\n"));
+ BEGIN INITIAL;
+ }
+}
+
+
+ /*------------------------------------------------------------.
+ | Scanning a C comment. The initial `/ *' is already eaten. |
+ `------------------------------------------------------------*/
+
+<SC_COMMENT>
+{
+ "*"{splice}"/" YY_OBS_GROW; yy_pop_state ();
+ [^*\[\]]+|"*" YY_OBS_GROW;
<<EOF>> {
LOCATION_PRINT (stderr, *yylloc);
@@ -250,6 +364,18 @@ blanks [ \t\f]+
}
+ /*--------------------------------------------------------------.
+ | Scanning a line comment. The initial `//' is already eaten. |
+ `--------------------------------------------------------------*/
+
+<SC_LINE_COMMENT>
+{
+ "\n" YY_OBS_GROW; yy_pop_state ();
+ ([^\n\[\]]|{splice})+ YY_OBS_GROW;
+ <<EOF>> yy_pop_state ();
+}
+
+
/*----------------------------------------------------------------.
| Scanning a C string, including its escapes. The initial `"' is |
| already eaten. |
@@ -267,9 +393,7 @@ blanks [ \t\f]+
return STRING;
}
- [^\"\n\r\\]+ YY_OBS_GROW;
-
- {eols} obstack_1grow (&string_obstack, '\n'); YY_LINES;
+ [^\"\\]+ YY_OBS_GROW;
<<EOF>> {
LOCATION_PRINT (stderr, *yylloc);
@@ -305,9 +429,7 @@ blanks [ \t\f]+
}
}
- [^\n\r\\] YY_OBS_GROW;
-
- {eols} obstack_1grow (&string_obstack, '\n'); YY_LINES;
+ [^'\\]+ YY_OBS_GROW;
<<EOF>> {
LOCATION_PRINT (stderr, *yylloc);
@@ -327,9 +449,9 @@ blanks [ \t\f]+
<SC_ESCAPED_STRING,SC_ESCAPED_CHARACTER>
{
- \\[0-7]{3} {
- long c = strtol (yytext + 1, 0, 8);
- if (c > 255)
+ \\[0-7]{1,3} {
+ unsigned long c = strtoul (yytext + 1, 0, 8);
+ if (UCHAR_MAX < c)
{
LOCATION_PRINT (stderr, *yylloc);
fprintf (stderr, _(": invalid escape: %s\n"), quote (yytext));
@@ -339,8 +461,18 @@ blanks [ \t\f]+
obstack_1grow (&string_obstack, c);
}
- \\x[0-9a-fA-F]{2} {
- obstack_1grow (&string_obstack, strtol (yytext + 2, 0, 16));
+ \\x[0-9a-fA-F]+ {
+ unsigned long c;
+ errno = 0;
+ c = strtoul (yytext + 2, 0, 16);
+ if (UCHAR_MAX < c || errno)
+ {
+ LOCATION_PRINT (stderr, *yylloc);
+ fprintf (stderr, _(": invalid escape: %s\n"), quote (yytext));
+ YY_STEP;
+ }
+ else
+ obstack_1grow (&string_obstack, c);
}
\\a obstack_1grow (&string_obstack, '\a');
@@ -350,7 +482,18 @@ blanks [ \t\f]+
\\r obstack_1grow (&string_obstack, '\r');
\\t obstack_1grow (&string_obstack, '\t');
\\v obstack_1grow (&string_obstack, '\v');
- \\[\\""''] obstack_1grow (&string_obstack, yytext[1]);
+ \\[\"'?\\] obstack_1grow (&string_obstack, yytext[1]);
+ \\(u|U[0-9a-fA-F]{4})[0-9a-fA-F]{4} {
+ int c = convert_ucn_to_byte (yytext);
+ if (c < 0)
+ {
+ LOCATION_PRINT (stderr, *yylloc);
+ fprintf (stderr, _(": invalid escape: %s\n"), quote (yytext));
+ YY_STEP;
+ }
+ else
+ obstack_1grow (&string_obstack, c);
+ }
\\(.|\n) {
LOCATION_PRINT (stderr, *yylloc);
fprintf (stderr, _(": unrecognized escape: %s\n"), quote (yytext));
@@ -374,13 +517,12 @@ blanks [ \t\f]+
yy_pop_state ();
}
- [^\[\]\'\n\r\\]+ YY_OBS_GROW;
- \\(.|\n) YY_OBS_GROW;
- /* FLex wants this rule, in case of a `\<<EOF>>'. */
+ [^'\[\]\\]+ YY_OBS_GROW;
+ \\{splice}[^\[\]] YY_OBS_GROW;
+ {splice} YY_OBS_GROW;
+ /* Needed for `\<<EOF>>', `\\<<newline>>[', and `\\<<newline>>]'. */
\\ YY_OBS_GROW;
- {eols} YY_OBS_GROW; YY_LINES;
-
<<EOF>> {
LOCATION_PRINT (stderr, *yylloc);
fprintf (stderr, _(": unexpected end of file in a character\n"));
@@ -403,13 +545,12 @@ blanks [ \t\f]+
yy_pop_state ();
}
- [^\[\]\"\n\r\\]+ YY_OBS_GROW;
- \\(.|\n) YY_OBS_GROW;
- /* FLex wants this rule, in case of a `\<<EOF>>'. */
+ [^\"\[\]\\]+ YY_OBS_GROW;
+ \\{splice}[^\[\]] YY_OBS_GROW;
+ {splice} YY_OBS_GROW;
+ /* Needed for `\<<EOF>>', `\\<<newline>>[', and `\\<<newline>>]'. */
\\ YY_OBS_GROW;
- {eols} YY_OBS_GROW; YY_LINES;
-
<<EOF>> {
LOCATION_PRINT (stderr, *yylloc);
fprintf (stderr, _(": unexpected end of file in a string\n"));
@@ -432,8 +573,8 @@ blanks [ \t\f]+
"\"" YY_OBS_GROW; yy_push_state (SC_STRING);
/* Comments. */
- "/*" YY_OBS_GROW; yy_push_state (SC_COMMENT);
- "//".* YY_OBS_GROW;
+ "/"{splice}"*" YY_OBS_GROW; yy_push_state (SC_COMMENT);
+ "/"{splice}"/" YY_OBS_GROW; yy_push_state (SC_LINE_COMMENT);
/* Not comments. */
"/" YY_OBS_GROW;
@@ -461,15 +602,14 @@ blanks [ \t\f]+
"{" YY_OBS_GROW; braces_level++;
- "$"("<"[^>]+">")?(-?[0-9]+|"$") { handle_dollar (current_braced_code,
+ "$"("<"{tag}">")?(-?[0-9]+|"$") { handle_dollar (current_braced_code,
yytext, *yylloc); }
"@"(-?[0-9]+|"$") { handle_at (current_braced_code,
yytext, *yylloc); }
- address@hidden/\'\"\{\}\n\r]+ YY_OBS_GROW;
- {eols} YY_OBS_GROW; YY_LINES;
+ address@hidden/'\"\{\}]+ YY_OBS_GROW;
- /* A lose $, or /, or etc. */
+ /* A stray $, or /, or etc. */
. YY_OBS_GROW;
<<EOF>> {
@@ -497,9 +637,8 @@ blanks [ \t\f]+
return PROLOGUE;
}
- [^%\[\]/\'\"\n\r]+ YY_OBS_GROW;
+ [^%\[\]/'\"]+ YY_OBS_GROW;
"%" YY_OBS_GROW;
- {eols} YY_OBS_GROW; YY_LINES;
<<EOF>> {
LOCATION_PRINT (stderr, *yylloc);
@@ -514,12 +653,12 @@ blanks [ \t\f]+
/*---------------------------------------------------------------.
| Scanning the epilogue (everything after the second "%%", which |
- | has already been eaten. |
+ | has already been eaten). |
`---------------------------------------------------------------*/
<SC_EPILOGUE>
{
- ([^\[\]]|{eols})+ YY_OBS_GROW;
+ [^\[\]]+ YY_OBS_GROW;
<<EOF>> {
yy_pop_state ();
@@ -568,14 +707,15 @@ handle_action_dollar (char *text, locati
obstack_fgrow1 (&string_obstack,
"]b4_lhs_value([%s])[", type_name);
}
- else if (('0' <= *cp && *cp <= '9') || *cp == '-')
+ else
{
- int n = strtol (cp, &cp, 10);
+ long num;
+ errno = 0;
+ num = strtol (cp, 0, 10);
- if (n > rule_length)
- complain_at (location, _("invalid value: %s%d"), "$", n);
- else
+ if (INT_MIN <= num && num <= rule_length && ! errno)
{
+ int n = num;
if (!type_name && n > 0)
type_name = symbol_list_n_type_name_get (current_rule, location,
n);
@@ -588,16 +728,14 @@ handle_action_dollar (char *text, locati
"]b4_rhs_value([%d], [%d], [%s])[",
rule_length, n, type_name);
}
- }
- else
- {
- complain_at (location, _("%s is invalid"), quote (text));
+ else
+ complain_at (location, _("invalid value: %s"), text);
}
}
/*---------------------------------------------------------------.
-| TEXT is expexted tp be $$ in some code associated to a symbol: |
+| TEXT is expected to be $$ in some code associated to a symbol: |
| destructor or printer. |
`---------------------------------------------------------------*/
@@ -608,7 +746,7 @@ handle_symbol_code_dollar (char *text, l
if (*cp == '$')
obstack_sgrow (&string_obstack, "]b4_dollar_dollar[");
else
- complain_at (location, _("%s is invalid"), quote (text));
+ complain_at (location, _("%s is invalid"), quote_n (1, text));
}
@@ -650,25 +788,26 @@ handle_action_at (char *text, location_t
{
obstack_sgrow (&string_obstack, "]b4_lhs_location[");
}
- else if (('0' <= *cp && *cp <= '9') || *cp == '-')
+ else
{
- int n = strtol (cp, &cp, 10);
+ long num;
+ errno = 0;
+ num = strtol (cp, 0, 10);
- if (n > rule_length)
- complain_at (location, _("invalid value: %s%d"), "@", n);
+ if (INT_MIN <= num && num <= rule_length && ! errno)
+ {
+ int n = num;
+ obstack_fgrow2 (&string_obstack, "]b4_rhs_location([%d], [%d])[",
+ rule_length, n);
+ }
else
- obstack_fgrow2 (&string_obstack, "]b4_rhs_location([%d], [%d])[",
- rule_length, n);
- }
- else
- {
- complain_at (location, _("%s is invalid"), quote (text));
+ complain_at (location, _("invalid value: %s"), text);
}
}
/*---------------------------------------------------------------.
-| TEXT is expexted tp be @$ in some code associated to a symbol: |
+| TEXT is expected to be @$ in some code associated to a symbol: |
| destructor or printer. |
`---------------------------------------------------------------*/
@@ -679,7 +818,7 @@ handle_symbol_code_at (char *text, locat
if (*cp == '$')
obstack_sgrow (&string_obstack, "]b4_at_dollar[");
else
- complain_at (location, _("%s is invalid"), quote (text));
+ complain_at (location, _("%s is invalid"), quote_n (1, text));
}
@@ -703,6 +842,62 @@ handle_at (braced_code_t braced_code_kin
handle_symbol_code_at (text, location);
break;
}
+}
+
+
+/*------------------------------------------------------------------.
+| Convert universal character name UCN to a single-byte character, |
+| and return that character. Return -1 if UCN does not correspond |
+| to a single-byte character. |
+`------------------------------------------------------------------*/
+
+static int
+convert_ucn_to_byte (char const *ucn)
+{
+ unsigned long code = strtoul (ucn + 2, 0, 16);
+
+ /* FIXME: Currently we assume Unicode-compatible unibyte characters
+ on ASCII hosts (i.e., Latin-1 on hosts with 8-bit bytes). On
+ non-ASCII hosts we support only the portable C character set.
+ These limitations should be removed once we add support for
+ multibyte characters. */
+
+ if (UCHAR_MAX < code)
+ return -1;
+
+#if ! ('$' == 0x24 && '@' == 0x40 && '`' == 0x60 && '~' == 0x7e)
+ {
+ /* A non-ASCII host. Use CODE to index into a table of the C
+ basic execution character set, which is guaranteed to exist on
+ all Standard C platforms. This table also includes '$', '@',
+ and '`', which are not in the basic execution character set but
+ which are unibyte characters on all the platforms that we know
+ about. */
+ static signed char const table[] =
+ {
+ '\0', -1, -1, -1, -1, -1, -1, '\a',
+ '\b', '\t', '\n', '\v', '\f', '\r', -1, -1,
+ -1, -1, -1, -1, -1, -1, -1, -1,
+ -1, -1, -1, -1, -1, -1, -1, -1,
+ ' ', '!', '"', '#', '$', '%', '&', '\'',
+ '(', ')', '*', '+', ',', '-', '.', '/',
+ '0', '1', '2', '3', '4', '5', '6', '7',
+ '8', '9', ':', ';', '<', '=', '>', '?',
+ '@', 'A', 'B', 'C', 'D', 'E', 'F', 'G',
+ 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O',
+ 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W',
+ 'X', 'Y', 'Z', '[', '\\', ']', '^', '_',
+ '`', 'a', 'b', 'c', 'd', 'e', 'f', 'g',
+ 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o',
+ 'p', 'q', 'r', 's', 't', 'u', 'v', 'w',
+ 'x', 'y', 'z', '{', '|', '}', '~'
+ };
+
+ code = code < sizeof table ? table[code] : -1;
+ }
+#endif
+
+ return code;
}
Index: tests/input.at
===================================================================
RCS file: /cvsroot/bison/bison/tests/input.at,v
retrieving revision 1.12
diff -p -u -r1.12 input.at
--- tests/input.at 14 Oct 2002 08:43:36 -0000 1.12
+++ tests/input.at 3 Nov 2002 08:33:24 -0000
@@ -97,6 +97,22 @@ AT_DATA([input.y],
/* This is seen in GCC: a %{ and %} in middle of a comment. */
const char *foo = "So %{ and %} can be here too.";
+#ifdef __STDC__
+/\
+* A comment with backslash-newlines in it. %{ %} *\
+\
+/
+
+char str[] = "\\
+" A string with backslash-newlines in it %{ %} \\
+"";
+
+char apostrophe = '\\
+\
+'\
+';
+#endif
+
#include <stdio.h>
%}
/* %{ and %} can be here too. */
@@ -128,14 +144,14 @@ static void yyerror (const char *s);
static int yylex (void);
%}
-%type <ival> '1'
+%type <ival> '@<:@'
/* Exercise quotes in strings. */
-%token FAKE "fake @<:@@:>@,"
+%token FAKE "fake @<:@@:>@ \a\b\f\n\r\t\v\"\'\?\\\u005B\U0000005c ??!??'??(??)??-??/??<??=??> \x0\0"
%%
-/* Exercise M4 quoting: '@:>@@:>@', 1. */
-exp: '1'
+/* Exercise M4 quoting: '@:>@@:>@', @<:@, 1. */
+exp: '@<:@' '\1' '\x000000000000000000000000000000000000000000000000002'
{
/* Exercise quotes in braces. */
char tmp[] = "@<:@%c@:>@,\n";
@@ -143,7 +159,7 @@ exp: '1'
}
;
%%
-/* Exercise M4 quoting: '@:>@@:>@', 2. */
+/* Exercise M4 quoting: '@:>@@:>@', @<:@, 2. */
static YYSTYPE
value_t_as_yystype (value_t val)
@@ -156,7 +172,7 @@ value_t_as_yystype (value_t val)
static int
yylex (void)
{
- static const char *input = "1";
+ static const char *input = "@<:@\1\2";
yylval = value_t_as_yystype (*input);
return *input++;
}
@@ -184,7 +200,7 @@ main (void)
AT_CHECK([bison -d -v -o input.c input.y])
AT_COMPILE([input], [input.c main.c])
AT_PARSER_CHECK([./input], 0,
-[[[1],
+[[[@<:@],
]])
AT_CLEANUP
--- /dev/null 2002-11-03 08:20:24.000000000 +0000
+++ lib/mbswidth.c 2001-09-22 14:43:52.000000000 +0000
@@ -0,0 +1,218 @@
+/* Determine the number of screen columns needed for a string.
+ Copyright (C) 2000-2001 Free Software Foundation, Inc.
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 2, or (at your option)
+ any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, write to the Free Software Foundation,
+ Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */
+
+/* Written by Bruno Haible <address@hidden>. */
+
+#ifdef HAVE_CONFIG_H
+# include <config.h>
+#endif
+
+/* Specification. */
+#include "mbswidth.h"
+
+/* Get MB_CUR_MAX. */
+#include <stdlib.h>
+
+#include <string.h>
+
+/* Get isprint(). */
+#include <ctype.h>
+
+/* Get mbstate_t, mbrtowc(), mbsinit(), wcwidth(). */
+#if HAVE_WCHAR_H
+# include <wchar.h>
+#endif
+
+/* Get iswprint(), iswcntrl(). */
+#if HAVE_WCTYPE_H
+# include <wctype.h>
+#endif
+#if !defined iswprint && !HAVE_ISWPRINT
+# define iswprint(wc) 1
+#endif
+#if !defined iswcntrl && !HAVE_ISWCNTRL
+# define iswcntrl(wc) 0
+#endif
+
+#ifndef mbsinit
+# if !HAVE_MBSINIT
+# define mbsinit(ps) 1
+# endif
+#endif
+
+#ifndef HAVE_DECL_WCWIDTH
+"this configure-time declaration test was not run"
+#endif
+#if !HAVE_DECL_WCWIDTH
+int wcwidth ();
+#endif
+
+#ifndef wcwidth
+# if !HAVE_WCWIDTH
+/* wcwidth doesn't exist, so assume all printable characters have
+ width 1. */
+# define wcwidth(wc) ((wc) == 0 ? 0 : iswprint (wc) ? 1 : -1)
+# endif
+#endif
+
+/* Get ISPRINT. */
+#if defined (STDC_HEADERS) || (!defined (isascii) && !defined (HAVE_ISASCII))
+# define IN_CTYPE_DOMAIN(c) 1
+#else
+# define IN_CTYPE_DOMAIN(c) isascii(c)
+#endif
+/* Undefine to protect against the definition in wctype.h of solaris2.6. */
+#undef ISPRINT
+#define ISPRINT(c) (IN_CTYPE_DOMAIN (c) && isprint (c))
+#undef ISCNTRL
+#define ISCNTRL(c) (IN_CTYPE_DOMAIN (c) && iscntrl (c))
+
+/* Returns the number of columns needed to represent the multibyte
+ character string pointed to by STRING. If a non-printable character
+ occurs, and MBSW_REJECT_UNPRINTABLE is specified, -1 is returned.
+ With flags = MBSW_REJECT_INVALID | MBSW_REJECT_UNPRINTABLE, this is
+ the multibyte analogon of the wcswidth function. */
+int
+mbswidth (string, flags)
+ const char *string;
+ int flags;
+{
+ return mbsnwidth (string, strlen (string), flags);
+}
+
+/* Returns the number of columns needed to represent the multibyte
+ character string pointed to by STRING of length NBYTES. If a
+ non-printable character occurs, and MBSW_REJECT_UNPRINTABLE is
+ specified, -1 is returned. */
+int
+mbsnwidth (string, nbytes, flags)
+ const char *string;
+ size_t nbytes;
+ int flags;
+{
+ const char *p = string;
+ const char *plimit = p + nbytes;
+ int width;
+
+ width = 0;
+#if HAVE_MBRTOWC
+ if (MB_CUR_MAX > 1)
+ {
+ while (p < plimit)
+ switch (*p)
+ {
+ case ' ': case '!': case '"': case '#': case '%':
+ case '&': case '\'': case '(': case ')': case '*':
+ case '+': case ',': case '-': case '.': case '/':
+ case '0': case '1': case '2': case '3': case '4':
+ case '5': case '6': case '7': case '8': case '9':
+ case ':': case ';': case '<': case '=': case '>':
+ case '?':
+ case 'A': case 'B': case 'C': case 'D': case 'E':
+ case 'F': case 'G': case 'H': case 'I': case 'J':
+ case 'K': case 'L': case 'M': case 'N': case 'O':
+ case 'P': case 'Q': case 'R': case 'S': case 'T':
+ case 'U': case 'V': case 'W': case 'X': case 'Y':
+ case 'Z':
+ case '[': case '\\': case ']': case '^': case '_':
+ case 'a': case 'b': case 'c': case 'd': case 'e':
+ case 'f': case 'g': case 'h': case 'i': case 'j':
+ case 'k': case 'l': case 'm': case 'n': case 'o':
+ case 'p': case 'q': case 'r': case 's': case 't':
+ case 'u': case 'v': case 'w': case 'x': case 'y':
+ case 'z': case '{': case '|': case '}': case '~':
+ /* These characters are printable ASCII characters. */
+ p++;
+ width++;
+ break;
+ default:
+ /* If we have a multibyte sequence, scan it up to its end. */
+ {
+ mbstate_t mbstate;
+ memset (&mbstate, 0, sizeof mbstate);
+ do
+ {
+ wchar_t wc;
+ size_t bytes;
+ int w;
+
+ bytes = mbrtowc (&wc, p, plimit - p, &mbstate);
+
+ if (bytes == (size_t) -1)
+ /* An invalid multibyte sequence was encountered. */
+ {
+ if (!(flags & MBSW_REJECT_INVALID))
+ {
+ p++;
+ width++;
+ break;
+ }
+ else
+ return -1;
+ }
+
+ if (bytes == (size_t) -2)
+ /* An incomplete multibyte character at the end. */
+ {
+ if (!(flags & MBSW_REJECT_INVALID))
+ {
+ p = plimit;
+ width++;
+ break;
+ }
+ else
+ return -1;
+ }
+
+ if (bytes == 0)
+ /* A null wide character was encountered. */
+ bytes = 1;
+
+ w = wcwidth (wc);
+ if (w >= 0)
+ /* A printable multibyte character. */
+ width += w;
+ else
+ /* An unprintable multibyte character. */
+ if (!(flags & MBSW_REJECT_UNPRINTABLE))
+ width += (iswcntrl (wc) ? 0 : 1);
+ else
+ return -1;
+
+ p += bytes;
+ }
+ while (! mbsinit (&mbstate));
+ }
+ break;
+ }
+ return width;
+ }
+#endif
+
+ while (p < plimit)
+ {
+ unsigned char c = (unsigned char) *p++;
+
+ if (ISPRINT (c))
+ width++;
+ else if (!(flags & MBSW_REJECT_UNPRINTABLE))
+ width += (ISCNTRL (c) ? 0 : 1);
+ else
+ return -1;
+ }
+ return width;
+}
--- /dev/null 2002-11-03 08:20:24.000000000 +0000
+++ lib/mbswidth.h 2001-11-10 00:13:19.000000000 +0000
@@ -0,0 +1,45 @@
+/* Determine the number of screen columns needed for a string.
+ Copyright (C) 2000-2001 Free Software Foundation, Inc.
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 2, or (at your option)
+ any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, write to the Free Software Foundation,
+ Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */
+
+#include <stddef.h>
+
+#ifndef PARAMS
+# if __STDC__ || defined __GNUC__ || defined __SUNPRO_C || defined __cplusplus || __PROTOTYPES
+# define PARAMS(Args) Args
+# else
+# define PARAMS(Args) ()
+# endif
+#endif
+
+/* Optional flags to influence mbswidth/mbsnwidth behavior. */
+
+/* If this bit is set, return -1 upon finding an invalid or incomplete
+ character. Otherwise, assume invalid characters have width 1. */
+#define MBSW_REJECT_INVALID 1
+
+/* If this bit is set, return -1 upon finding a non-printable character.
+ Otherwise, assume unprintable characters have width 0 if they are
+ control characters and 1 otherwise. */
+#define MBSW_REJECT_UNPRINTABLE 2
+
+/* Returns the number of screen columns needed for STRING. */
+#define mbswidth gnu_mbswidth /* avoid clash with UnixWare 7.1.1 function */
+extern int mbswidth PARAMS ((const char *string, int flags));
+
+/* Returns the number of screen columns needed for the NBYTES bytes
+ starting at BUF. */
+extern int mbsnwidth PARAMS ((const char *buf, size_t nbytes, int flags));
--- /dev/null 2002-11-03 08:20:24.000000000 +0000
+++ m4/mbswidth.m4 2002-06-21 17:41:02.000000000 +0000
@@ -0,0 +1,36 @@
+#serial 7
+
+dnl autoconf tests required for use of mbswidth.c
+dnl From Bruno Haible.
+
+AC_DEFUN([jm_PREREQ_MBSWIDTH],
+[
+ AC_REQUIRE([AC_HEADER_STDC])
+ AC_CHECK_HEADERS(limits.h stdlib.h string.h wchar.h wctype.h)
+ AC_CHECK_FUNCS(isascii iswcntrl iswprint mbsinit wcwidth)
+ jm_FUNC_MBRTOWC
+
+ AC_CACHE_CHECK([whether wcwidth is declared], ac_cv_have_decl_wcwidth,
+ [AC_TRY_COMPILE([
+/* AIX 3.2.5 declares wcwidth in <string.h>. */
+#if HAVE_STRING_H
+# include <string.h>
+#endif
+#if HAVE_WCHAR_H
+# include <wchar.h>
+#endif
+], [
+#ifndef wcwidth
+ char *p = (char *) wcwidth;
+#endif
+], ac_cv_have_decl_wcwidth=yes, ac_cv_have_decl_wcwidth=no)])
+ if test $ac_cv_have_decl_wcwidth = yes; then
+ ac_val=1
+ else
+ ac_val=0
+ fi
+ AC_DEFINE_UNQUOTED(HAVE_DECL_WCWIDTH, $ac_val,
+ [Define to 1 if you have the declaration of wcwidth(), and to 0 otherwise.])
+
+ AC_TYPE_MBSTATE_T
+])
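
For readers skimming the patch, here is a minimal sketch of how the
MBSW_REJECT_UNPRINTABLE flag affects the single-byte fallback loop at
the end of mbsnwidth. It is a standalone approximation, not the code
installed above: the name sketch_width is hypothetical, it calls plain
isprint/iscntrl rather than the ISPRINT/ISCNTRL macros, and it leaves
out the multibyte (mbrtowc/wcwidth) path entirely.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Same values as in lib/mbswidth.h.  */
#define MBSW_REJECT_INVALID 1
#define MBSW_REJECT_UNPRINTABLE 2

/* Hypothetical stand-in for the unibyte branch of mbsnwidth.
   Return the number of columns needed for the NBYTES bytes starting
   at BUF, or -1 if FLAGS asks to reject an unprintable byte.  */
int
sketch_width (const char *buf, size_t nbytes, int flags)
{
  const char *p = buf;
  const char *plimit;
  int width = 0;

  /* A null buffer is acceptable only when there is nothing to scan.  */
  assert (buf != NULL || nbytes == 0);
  plimit = p + nbytes;

  while (p < plimit)
    {
      unsigned char c = (unsigned char) *p++;

      if (isprint (c))
        /* A printable byte occupies one column.  */
        width++;
      else if (!(flags & MBSW_REJECT_UNPRINTABLE))
        /* Control bytes count as width 0, anything else as width 1.  */
        width += (iscntrl (c) ? 0 : 1);
      else
        /* The caller asked to reject unprintable bytes.  */
        return -1;
    }
  return width;
}
```

In the default "C" locale, sketch_width ("ab\tc", 4, 0) yields 3, since
the tab is a control byte and contributes no columns. The same call
with MBSW_REJECT_UNPRINTABLE yields -1. MBSW_REJECT_INVALID has no
effect in this branch; it matters only on the multibyte path.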