m4-patches
branch-1_4 debian bug 311378 - 8-bit quotes


From: Eric Blake
Subject: branch-1_4 debian bug 311378 - 8-bit quotes
Date: Mon, 31 Jul 2006 20:53:16 -0600
User-agent: Thunderbird 1.5.0.5 (Windows/20060719)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=311378 complains:

$ m4 > samp1 <<EOF
> changequote(«,»)dnl
> define(a,b)dnl
> «a»
> EOF
«b»

And indeed, on platforms where char is signed, we had some sign extension
bugs, since we were comparing the unsigned char values returned by getc()
against bytes read through a char*.  With this patch, m4 should now be
8-bit clean; I went the path of always using unsigned char in the parser.

Unfortunately, I don't know any good way to put an example of 8-bit
characters in the documentation.  Info will faithfully reproduce literal
characters (but it may render horribly depending on your locale), while TeX
ignores 8-bit characters and needs a command for a glyph.  So for now, I
left the examples in an @ignore block, so at least the testsuite will
ensure we don't regress.

2006-07-31  Eric Blake  <address@hidden>

        * src/input.c (peek_input, next_char, match_input): Be eight-bit
        clean; fixes debian bug 311378.
        * doc/m4.texinfo (Syntax): Describe eight-bit handling.
        (Changequote, Changecom): Add examples to test this.
        * NEWS: Document this fix.
        * THANKS: Update.
        Reported by Steven Augart.

- --
Life is short - so eat dessert first!

Eric Blake             address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.1 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFEzsIb84KuGfSFAYARAjV6AKC4F7Y2rpNKr8LzY8Murz2fnAy01gCfY4pv
adcorwShehrSo21KhbyPdvg=
=dSGW
-----END PGP SIGNATURE-----
Index: NEWS
===================================================================
RCS file: /sources/m4/m4/NEWS,v
retrieving revision 1.1.1.1.2.45
diff -u -p -r1.1.1.1.2.45 NEWS
--- NEWS        30 Jul 2006 03:18:12 -0000      1.1.1.1.2.45
+++ NEWS        1 Aug 2006 02:44:33 -0000
@@ -20,6 +20,7 @@ Version 1.4.6 - ?? 2006, by ??  (CVS ver
 * The __file__ macro, and the -s/--synclines option, now show what
   directory a file was found in when the -I/--include option or M4PATH
   variable had an effect.
+* The changequote and changecom macros now work with 8-bit characters.
 
 Version 1.4.5 - 15 July 2006, by Eric Blake  (CVS version 1.4.4c)
 
Index: src/input.c
===================================================================
RCS file: /sources/m4/m4/src/Attic/input.c,v
retrieving revision 1.1.1.1.2.13
diff -u -p -r1.1.1.1.2.13 input.c
--- src/input.c 30 Jul 2006 23:46:51 -0000      1.1.1.1.2.13
+++ src/input.c 1 Aug 2006 02:44:33 -0000
@@ -397,7 +397,7 @@ init_macro_token (token_data *td)
 int
 peek_input (void)
 {
-  register int ch;
+  int ch;
 
   while (1)
     {
@@ -407,7 +407,7 @@ peek_input (void)
       switch (isp->type)
        {
        case INPUT_STRING:
-         ch = isp->u.u_s.string[0];
+         ch = to_uchar (isp->u.u_s.string[0]);
          if (ch != '\0')
            return ch;
          break;
@@ -446,13 +446,13 @@ peek_input (void)
 
 #define next_char() \
   (isp && isp->type == INPUT_STRING && isp->u.u_s.string[0]            \
-   ? *isp->u.u_s.string++                                              \
+   ? to_uchar (*isp->u.u_s.string++)                                   \
    : next_char_1 ())
 
 static int
 next_char_1 (void)
 {
-  register int ch;
+  int ch;
 
   if (start_of_input_line)
     {
@@ -468,7 +468,7 @@ next_char_1 (void)
       switch (isp->type)
        {
        case INPUT_STRING:
-         ch = *isp->u.u_s.string++;
+         ch = to_uchar (*isp->u.u_s.string++);
          if (ch != '\0')
            return ch;
          break;
@@ -531,14 +531,14 @@ match_input (const char *s)
   const char *t;
 
   ch = peek_input ();
-  if (ch != *s)
+  if (ch != to_uchar (*s))
     return 0;                  /* fail */
   (void) next_char ();
 
   if (s[1] == '\0')
     return 1;                  /* short match */
 
-  for (n = 1, t = s++; (ch = peek_input ()) == *s++; n++)
+  for (n = 1, t = s++; (ch = peek_input ()) == to_uchar (*s++); n++)
     {
       (void) next_char ();
       if (*s == '\0')          /* long match */
@@ -564,9 +564,9 @@ match_input (const char *s)
 `------------------------------------------------------------------------*/
 
 #define MATCH(ch, s) \
-  ((s)[0] == (ch) \
-   && (ch) != '\0' \
-   && ((s)[1] == '\0' \
+  (to_uchar ((s)[0]) == (ch)                                            \
+   && (ch) != '\0'                                                      \
+   && ((s)[1] == '\0'                                                   \
        || (match_input ((s) + 1) ? (ch) = peek_input (), 1 : 0)))
 
 
Index: doc/m4.texinfo
===================================================================
RCS file: /sources/m4/m4/doc/m4.texinfo,v
retrieving revision 1.1.1.1.2.57
diff -u -p -r1.1.1.1.2.57 m4.texinfo
--- doc/m4.texinfo      31 Jul 2006 20:28:12 -0000      1.1.1.1.2.57
+++ doc/m4.texinfo      1 Aug 2006 02:44:34 -0000
@@ -698,8 +698,12 @@ primitive is spelled within @code{m4}.
 As @code{m4} reads its input, it separates it into @dfn{tokens}.  A
 token is either a name, a quoted string, or any single character, that
 is not a part of either a name or a string.  Input to @code{m4} can also
-contain comments.  @acronym{GNU} @code{m4} does not yet understand locales; all
-operations are byte-oriented rather than character-oriented.
+contain comments.  @acronym{GNU} @code{m4} does not yet understand
+locales; all operations are byte-oriented rather than
+character-oriented.  However, @code{m4} is eight-bit clean, so you can
+use non-ASCII characters in quoted strings (@pxref{Changequote}),
+comments (@pxref{Changecom}), and macro names (@pxref{Indir}), with the
+exception of the NUL character (the zero byte).
 
 @menu
 * Names::                       Macro names
@@ -2344,6 +2348,23 @@ foo
 @result{}Macro foo.
 @end example
 
+The quotation strings can safely contain eight-bit characters.
+@ignore
+Yuck.  I know of no clean way to render an 8-bit character in both info
+and dvi.  This example uses the `open-guillemot' and `close-guillemot'
+characters of the Latin-1 character set.
+
+@example
+define(`a', `b')
+@result{}
+«a»
+@result{}«b»
+changequote(`«', `»')
+@result{}
+«a»
+@result{}a
+@end example
+@end ignore
 If no single character is appropriate, @var{start} and @var{end} can be
 of any length.
 
@@ -2380,10 +2401,10 @@ calls of @code{changequote} must be made
 and one for the new quotes.
 
 Macros are recognized in preference to the begin-quote string, so if a
-prefix of @var{start} can be recognized as a macro name, the quoting
-mechanism is effectively disabled.  Unless you use @code{changeword}
-(@pxref{Changeword}), this means that @var{start} should not begin with
-a letter or @samp{_} (underscore).
+prefix of @var{start} can be recognized as a potential macro name, the
+quoting mechanism is effectively disabled.  Unless you use
+@code{changeword} (@pxref{Changeword}), this means that @var{start}
+should not begin with a letter or @samp{_} (underscore).
 
 @example
 define(`hi', `HI')
@@ -2490,11 +2511,29 @@ changecom(`#')
 @result{}# comment again
 @end example
 
+The comment strings can safely contain eight-bit characters.
+@ignore
+Yuck.  I know of no clean way to render an 8-bit character in both info
+and dvi.  This example uses the `open-guillemot' and `close-guillemot'
+characters of the Latin-1 character set.
+
+@example
+define(`a', `b')
+@result{}
+«a»
+@result{}«b»
+changecom(`«', `»')
+@result{}
+«a»
+@result{}«a»
+@end example
+@end ignore
+
 Comments are recognized in preference to macros.  However, this is not
 compatible with other implementations, where macros take precedence over
 comments, so it may change in a future release.  For portability, this
-means that @var{start} should not have a prefix that begins with a
-letter or @samp{_} (underscore).
+means that @var{start} should not begin with a letter or @samp{_}
+(underscore).
 
 @example
 define(`hi', `HI')
@@ -4646,6 +4685,7 @@ the first time.
 @bye
 
 @c Local Variables:
+@c coding: ISO-8859-1
 @c fill-column: 72
 @c ispell-local-dictionary: "american"
 @c indent-tabs-mode: nil
