m4-patches

branch-1_4 patsubst replacement bug


From: Eric Blake
Subject: branch-1_4 patsubst replacement bug
Date: Fri, 14 Jul 2006 14:38:43 -0600
User-agent: Thunderbird 1.5.0.4 (Windows/20060516)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

$ echo 'patsubst(abc,b,\)' | m4
a

Oops - we lost the c.

Also, it almost feels like an arbitrary limit that we can only handle 9
sub-expressions.  Then again, even sed can only replace the first 9
sub-expressions, and the regex engine only allows 9 back-references within
the regexp, so it's probably not worth worrying about.
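
For readers who want to see the intended behavior without reading the whole of src/builtin.c, here is a minimal standalone sketch of how a replacement string could be scanned once this patch is applied. It is not the actual substitute() code: the name expand_replacement, the struct span type, and the fixed output buffer are all hypothetical, and the deprecated \0 alias for \& is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the regex registers: byte offsets of group i
   in the subject, or start = -1 if group i did not take part in the match. */
struct span { int start; int end; };

/* Expand REPL into OUT (capacity OUTSZ) for one match in SUBJECT.
   groups[0] is the whole match, groups[1..ngroups-1] the sub-expressions.
   Returns the number of warnings written to stderr. */
static int
expand_replacement (const char *subject, const struct span *groups,
                    int ngroups, const char *repl, char *out, size_t outsz)
{
  size_t len = 0;
  int warnings = 0;

  assert (subject && groups && repl && out && outsz > 0);
  for (const char *p = repl; *p != '\0'; p++)
    {
      const char *src = p;      /* default: copy one literal character */
      size_t n = 1;

      if (*p == '\\')
        {
          p++;
          if (*p == '\0')
            {
              /* Trailing backslash: warn and stop before the terminator.  */
              fprintf (stderr, "Warning: trailing \\ ignored in replacement\n");
              warnings++;
              break;
            }
          else if (*p == '&' || (*p >= '1' && *p <= '9'))
            {
              int g = (*p == '&') ? 0 : *p - '0';
              if (g < ngroups && groups[g].start >= 0)
                {
                  src = subject + groups[g].start;
                  n = (size_t) (groups[g].end - groups[g].start);
                }
              else
                {
                  fprintf (stderr,
                           "Warning: sub-expression %d not present\n", g);
                  warnings++;
                  n = 0;
                }
            }
          else
            src = p;            /* any other escaped character is literal */
        }
      if (len + n >= outsz)
        n = outsz - 1 - len;    /* truncate rather than overflow */
      memcpy (out + len, src, n);
      len += n;
    }
  out[len] = '\0';
  return warnings;
}
```

With a single register describing the `b` in `abc`, the replacement `\1\` expands to the empty string and produces both warnings, and a lone `\` expands to the empty string with one warning. The caller is then free to append the unmatched tail of the input (the `c` in the example above).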

2006-07-14  Eric Blake  <address@hidden>

        * src/builtin.c (substitute): Warn on bad escape sequences.
        Ignore trailing backslash.
        * doc/m4.texinfo (Regexp): Add documentation for this.
        * NEWS: Document this change.

- --
Life is short - so eat dessert first!

Eric Blake             address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.1 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFEuADT84KuGfSFAYARAt5OAJ9sfcKFR86BQP3qNixp44IUHhYNhwCfavW/
pQ/lo+5QrIfIO0vmIdlOMoo=
=xgoc
-----END PGP SIGNATURE-----
Index: NEWS
===================================================================
RCS file: /sources/m4/m4/NEWS,v
retrieving revision 1.1.1.1.2.37
diff -u -p -r1.1.1.1.2.37 NEWS
--- NEWS        13 Jul 2006 22:09:54 -0000      1.1.1.1.2.37
+++ NEWS        14 Jul 2006 20:21:23 -0000
@@ -47,6 +47,9 @@ Version 1.4.5 - ?? 2006, by ???  (CVS ve
 * The popdef and undefine macros now correctly accept multiple arguments.
 * Although changeword is on its last leg, if enabled, it now reverts to the
   default (faster) regexp when passed the empty string.
+* The regexp and patsubst macros now warn and ignore a trailing backslash in
+  the replacement, and warn on \n for n larger than the number of
+  sub-expressions in the regexp.
 
 Version 1.4.4b - 17 June 2006, by Eric Blake  (CVS version 1.4.4a)
 
Index: doc/m4.texinfo
===================================================================
RCS file: /sources/m4/m4/doc/m4.texinfo,v
retrieving revision 1.1.1.1.2.42
diff -u -p -r1.1.1.1.2.42 m4.texinfo
--- doc/m4.texinfo      14 Jul 2006 15:15:58 -0000      1.1.1.1.2.42
+++ doc/m4.texinfo      14 Jul 2006 20:21:24 -0000
@@ -2206,7 +2206,7 @@ foo
 @error{}m4trace:8: -1- foo
 @result{}FOO
 @end example
- 
+
 @node Debug Output
 @section Saving debugging output
 
@@ -3115,25 +3115,40 @@ If @var{replacement} is omitted, @code{r
 the first match of @var{regexp} in @var{string}.  If @var{regexp} does
 not match anywhere in @var{string}, it expands to -1.
 
+If @var{replacement} is supplied, and there was a match, @code{regexp}
+changes the expansion to this argument, with @samp{\@var{n}} substituted
+by the text matched by the @var{n}th parenthesized sub-expression of
+@var{regexp}, up to nine sub-expressions.  The escape @samp{\&} is
+replaced by the text of the entire regular expression matched.  For
+all other characters, @samp{\} treats the next character literally.  A
+warning is issued if there were fewer sub-expressions than the
+@samp{\@var{n}} requested, or if there is a trailing @samp{\}.  If there
+was no match, @code{regexp} expands to the empty string.
+
+The builtin macro @code{regexp} is recognized only when given arguments.
+
 @example
 regexp(`GNUs not Unix', `\<[a-z]\w+')
 @result{}5
 regexp(`GNUs not Unix', `\<Q\w*')
 @result{}-1
+regexp(`GNUs not Unix', `\w\(\w+\)$', `*** \& *** \1 ***')
+@result{}*** Unix *** nix ***
+regexp(`GNUs not Unix', `\<Q\w*', `*** \& *** \1 ***')
+@result{}
 @end example
 
-If @var{replacement} is supplied, @code{regexp} changes the expansion
-to this argument, with @samp{\@var{n}} substituted by the text
-matched by the @var{n}th parenthesized sub-expression of @var{regexp},
-@samp{\&} being the text the entire regular expression matched.
+Here are some more examples on the handling of backslash:
 
 @example
-regexp(`GNUs not Unix', `\w\(\w+\)$', `*** \& *** \1 ***')
-@result{}*** Unix *** nix ***
+regexp(`abc', `\(b\)', `\\\10\a')
+@result{}\b0a
+regexp(`abc', `b', `\1\')
+@error{}stdin:2: m4: Warning: sub-expression 1 not present
+@error{}stdin:2: m4: Warning: trailing \ ignored in replacement
+@result{}
 @end example
 
-The builtin macro @code{regexp} is recognized only when given arguments.
-
 @node Substr
 @section Extracting substrings
 
@@ -3241,12 +3256,19 @@ to avoid infinite loops.
 
 When a replacement is to be made, @var{replacement} is inserted into
 the expansion, with @samp{\@var{n}} substituted by the text matched by
-the @var{n}th parenthesized sub-expression of @var{regexp}, @samp{\&}
-being the text the entire regular expression matched.
+the @var{n}th parenthesized sub-expression of @var{regexp}, for up to
+nine sub-expressions.  The escape @samp{\&} is replaced by the text of
+the entire regular expression matched.  For all other characters,
+@samp{\} treats the next character literally.  A warning is issued if
+there were fewer sub-expressions than the @samp{\@var{n}} requested, or
+if there is a trailing @samp{\}.
 
 The @var{replacement} argument can be omitted, in which case the text
 matched by @var{regexp} is deleted.
 
+The builtin macro @code{patsubst} is recognized only when given
+arguments.
+
 @example
 patsubst(`GNUs not Unix', `^', `OBS: ')
 @result{}OBS: GNUs not Unix
@@ -3258,6 +3280,9 @@ patsubst(`GNUs not Unix', `\w+', `(\&)')
 @result{}(GNUs) (not) (Unix)
 patsubst(`GNUs not Unix', `[A-Z][a-z]+')
 @result{}GN not @comment
+patsubst(`GNUs not Unix', `not', `NOT\')
+@error{}stdin:6: m4: Warning: trailing \ ignored in replacement
+@result{}GNUs NOT Unix
 @end example
 
 Here is a slightly more realistic example, which capitalizes individual
@@ -3276,8 +3301,21 @@ capitalize(`GNUs not Unix')
 @result{}Gnus Not Unix
 @end example
 
-The builtin macro @code{patsubst} is recognized only when given
-arguments.
+While @code{regexp} replaces the whole input with the replacement as
+soon as there is a match, @code{patsubst} replaces each
+@emph{occurrence} of a match and preserves non-matching pieces:
+
+@example
+define(`patreg',
+`patsubst($@@)
+regexp($@@)')dnl
+patreg(`bar foo baz Foo', `foo\|Foo', `FOO')
+@result{}bar FOO baz FOO
+@result{}FOO
+patreg(`aba abb 121', `\(.\)\(.\)\1', `\2\1\2')
+@result{}bab abb 212
+@result{}bab
+@end example
 
 @node Format
 @section Formatted output
Index: src/builtin.c
===================================================================
RCS file: /sources/m4/m4/src/Attic/builtin.c,v
retrieving revision 1.1.1.1.2.24
diff -u -p -r1.1.1.1.2.24 builtin.c
--- src/builtin.c       14 Jul 2006 15:15:58 -0000      1.1.1.1.2.24
+++ src/builtin.c       14 Jul 2006 20:21:24 -0000
@@ -1649,6 +1649,14 @@ Warning: \\0 will disappear, use \\& ins
          if (regs->end[ch] > 0)
            obstack_grow (obs, victim + regs->start[ch],
                          regs->end[ch] - regs->start[ch]);
+         else
+           M4ERROR ((warning_status, 0, "\
+Warning: sub-expression %d not present", ch));
+         break;
+
+       case '\0':
+         M4ERROR ((warning_status, 0, "\
+Warning: trailing \\ ignored in replacement"));
          break;
 
        default:
