m4-patches

[15/18] argv_ref speedup: collect $@ reference in one go


From: Eric Blake
Subject: [15/18] argv_ref speedup: collect $@ reference in one go
Date: Sat, 16 Feb 2008 06:56:20 -0700
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.9) Gecko/20071031 Thunderbird/2.0.0.9 Mnenhy/0.7.5.666

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Next in the series.  The big change here was teaching the input engine to
handle an entire $@ series as a single token, which collect_arguments
then stores in a single array slot that conceptually holds multiple
arguments.  All the accessors behind the opaque macro_arguments type then
simply drill through the nested $@ reference as appropriate.  As a
result, a recursive macro occupies much less memory when it shares
arguments from a prior macro invocation.  This patch gives dramatic
improvements to both memory and execution speed on unboxed recursion
(passing the arguments in one go is much faster than visiting them one
at a time).  However, it still remains quadratic in both memory and
speed, for two reasons: n iterations of a recursive algorithm build an
argv struct with n nested $@ references to drill through, and $@ inside
quoted contexts is still flattened in place rather than shared.  I also
added a few testsuite improvements.
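To make the single-slot idea concrete, here is a minimal C sketch built on simplified, hypothetical types: arg_slot, macro_args, slot_count and arg_text are illustrative names, not m4's real m4_macro_args or m4__symbol_chain API. A slot holds either plain text or a reference to the tail of another call's arguments, optionally omitting that call's last argument (the role skip_last plays in the patch). A lookup recurses through nested references.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model in which one array slot may stand
   for several arguments.  Not m4's real data structures.  */
typedef struct macro_args macro_args;

typedef struct
{
  const char *text;        /* Used when REF is NULL.  */
  const macro_args *ref;   /* Non-NULL: a $@ reference to another call.  */
  size_t index;            /* First argument of REF included (1-based).  */
  size_t skip_last;        /* 1 if REF's final argument is left out.  */
} arg_slot;

struct macro_args
{
  size_t argc;             /* Logical argument count, including $0.  */
  size_t arraylen;         /* Physical slots; may be fewer than argc - 1.  */
  const arg_slot *array;   /* Slots for $1 onwards.  */
};

/* Logical arguments contributed by one slot.  For a reference this
   matches the argc adjustment collect_arguments makes in the patch.  */
static size_t
slot_count (const arg_slot *s)
{
  return s->ref ? s->ref->argc - s->index - s->skip_last : 1;
}

/* Return argument N (1-based), or "" if out of range, drilling through
   nested references.  With n levels of nesting, one lookup recurses
   n times.  */
static const char *
arg_text (const macro_args *argv, size_t n)
{
  size_t i;

  if (n == 0 || n >= argv->argc)
    return "";
  for (i = 0; i < argv->arraylen; i++)
    {
      const arg_slot *s = &argv->array[i];
      size_t count = slot_count (s);

      if (n <= count)
        return s->ref ? arg_text (s->ref, s->index + n - 1) : s->text;
      n -= count;
    }
  return "";
}
```

For example, inside foo(a,b,c), a call bar($@) can be modeled as a single slot referencing foo's arguments from index 1, and arg_text on it yields a, b and c. The recursion in arg_text is also where the remaining quadratic cost shows up: each extra level of nesting adds another walk per lookup.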

2008-02-16  Eric Blake  <address@hidden>

        Stage 15: return argv refs back to collect_arguments.
        Collect an entire $@ reference at once rather than one argument at
        a time, outside of quotes (but inside quotes, $@ is still
        flattened for now).  The skip_last field allows concatenation of
        $@ with other text when collecting arguments.
        Memory impact: noticeable improvement, due to better reuse of $@.
        Speed impact: noticeable improvement, due to less parsing.
        * src/m4.h (enum token_type): Add TOKEN_ARGV.
        (struct token_chain): Add skip_last member to argv link.
        (next_token): Update prototype.
        * src/input.c (CHAR_ARGV): New placeholder input character.
        (peek_input): Add parameter, to pass $@ at once.
        (next_char_1, append_quote_token): Handle $@ inside quotes.
        (init_argv_token): New function.
        (push_token, match_input, next_token, peek_token, lex_debug):
        Update callers.
        * src/macro.c (expand_input, collect_arguments): Likewise.
        (expand_argument): Handle incoming $@ token.
        (arg_adjust_refcount, arg_token, arg_text, make_argv_ref_token):
        Handle nested $@ refs.
        * src/symtab.c (symtab_debug): Update caller.
        * examples/null.m4: Document more tests that are needed.  Add
        tests for NUL with divert, patsubst, and regexp.
        * examples/null.out: Update for new tests.
        * doc/m4.texinfo (Syntax): Add test for m4exit and NUL.
        * checks/get-them (AWK): Give a default value.
        * checks/check-them: Allow tests to invoke child processes with
        same include path.  Perform message normalization on stderr.
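The choice composite_peek now makes for a pending $@ link can be summarized in a small C model. This is a hedged sketch: peek_result, argv_link and peek_argv_link are hypothetical stand-ins, and the real code returns characters such as ',' and the CHAR_ARGV sentinel (or moves on to the next link) rather than an enum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes of peeking at a pending $@ link.  */
typedef enum
{
  PEEK_DONE,       /* Link exhausted; continue with the next link.  */
  PEEK_COMMA,      /* A separating ',' is the next input.  */
  PEEK_WHOLE_REF,  /* Hand over all remaining arguments (CHAR_ARGV).  */
  PEEK_ONE_ARG     /* Push the next argument and parse it as text.  */
} peek_result;

/* Hypothetical, simplified view of one argv link.  */
typedef struct
{
  size_t argc;        /* Referenced argument count, including $0.  */
  size_t index;       /* Next argument to hand out.  */
  bool comma;         /* True when ',' is the next input.  */
  bool quoted;        /* True for $@, false for $*.  */
  bool quote_age_ok;  /* Quoting unchanged since the link was made.  */
} argv_link;

static peek_result
peek_argv_link (const argv_link *l, bool allow_argv)
{
  if (l->index == l->argc)
    return PEEK_DONE;
  if (l->comma)
    return PEEK_COMMA;
  /* Pass the reference whole only if the caller allows it, the quoting
     is still valid, the link came from $@, and at least two arguments
     remain.  */
  if (allow_argv && l->quote_age_ok && l->quoted && l->index + 1 < l->argc)
    return PEEK_WHOLE_REF;
  return PEEK_ONE_ARG;
}
```

Because at least two arguments must remain, a lone trailing argument is always parsed as ordinary text, and callers that pass allow_argv as false (such as match_input) never see the whole-reference case.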

- --
Don't work too hard, make some time for fun as well!

Eric Blake             address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHtuuE84KuGfSFAYARAov4AJ4nmHozA+Sf/+yt+0MLcQR4GsMfKgCfcAis
6QjWQVAEHR4dwGbo22GU2ZU=
=IvaX
-----END PGP SIGNATURE-----
From 5d083e4726ed578b093aaf82e2a5a542f5815dcd Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Fri, 15 Feb 2008 22:12:03 -0700
Subject: [PATCH] Stage 15: return argv refs back to collect_arguments.

* m4/m4private.h (CHAR_ARGV): New input engine sentinel.
(enum m4__token_type): Add M4_TOKEN_ARGV.
(struct m4__symbol_chain): Add skip_last member to argv link.
(m4__next_token): Add parameter.
* m4/input.c (peek_char, file_peek, builtin_peek, string_peek)
(composite_peek, m4__next_token): Add new parameter.
(composite_read, append_quote_token): Support argv in quotes.
(init_argv_symbol): New function.
(m4__push_symbol, match_input, consume_syntax)
(m4__next_token_is_open, m4_print_token): Adjust callers.
* m4/macro.c (m4_macro_expand_input, m4__arg_adjust_refcount)
(arg_mark, m4_arg_text, make_argv_ref): Likewise.
(expand_argument, collect_arguments): Handle new token.
(arg_symbol): Drill through $@ reference.
* m4/syntax.c (set_quote_age): Detect disabled comments.
* m4/symtab.c (dump_symbol_CB) [DEBUG_SYM]: Fix debug code.

Signed-off-by: Eric Blake <address@hidden>
---
 ChangeLog      |   26 +++++++
 m4/input.c     |  207 +++++++++++++++++++++++++++++++++++++++++---------------
 m4/m4private.h |    9 ++-
 m4/macro.c     |  114 ++++++++++++++++++++++++-------
 m4/symtab.c    |    2 +-
 m4/syntax.c    |    8 ++-
 6 files changed, 277 insertions(+), 89 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 0e3c93e..d0ba987 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,29 @@
+2008-02-16  Eric Blake  <address@hidden>
+
+       Stage 15: return argv refs back to collect_arguments.
+       Collect an entire $@ reference at once rather than one argument at
+       a time, outside of quotes (but inside quotes, $@ is still
+       flattened for now).  The skip_last field allows concatenation of
+       $@ with other text when collecting arguments.
+       Memory impact: noticeable improvement, due to better reuse of $@.
+       Speed impact: noticeable improvement, due to less parsing.
+       * m4/m4private.h (CHAR_ARGV): New input engine sentinel.
+       (enum m4__token_type): Add M4_TOKEN_ARGV.
+       (struct m4__symbol_chain): Add skip_last member to argv link.
+       (m4__next_token): Add parameter.
+       * m4/input.c (peek_char, file_peek, builtin_peek, string_peek)
+       (composite_peek, m4__next_token): Add new parameter.
+       (composite_read, append_quote_token): Support argv in quotes.
+       (init_argv_symbol): New function.
+       (m4__push_symbol, match_input, consume_syntax)
+       (m4__next_token_is_open, m4_print_token): Adjust callers.
+       * m4/macro.c (m4_macro_expand_input, m4__arg_adjust_refcount)
+       (arg_mark, m4_arg_text, make_argv_ref): Likewise.
+       (expand_argument, collect_arguments): Handle new token.
+       (arg_symbol): Drill through $@ reference.
+       * m4/syntax.c (set_quote_age): Detect disabled comments.
+       * m4/symtab.c (dump_symbol_CB) [DEBUG_SYM]: Fix debug code.
+
 2008-02-15  Eric Blake  <address@hidden>
 
        * modules/gnu.c (regexp_compile): Use a fastmap for regex speed.
diff --git a/m4/input.c b/m4/input.c
index 025ae0d..b5d50a1 100644
--- a/m4/input.c
+++ b/m4/input.c
@@ -92,20 +92,20 @@
    maintains its own notion of the current file and line, so swapping
    between input blocks must update the context accordingly.  */
 
-static int     file_peek               (m4_input_block *, m4 *);
+static int     file_peek               (m4_input_block *, m4 *, bool);
 static int     file_read               (m4_input_block *, m4 *, bool, bool);
 static void    file_unget              (m4_input_block *, int);
 static bool    file_clean              (m4_input_block *, m4 *, bool);
 static void    file_print              (m4_input_block *, m4 *, m4_obstack *);
-static int     builtin_peek            (m4_input_block *, m4 *);
+static int     builtin_peek            (m4_input_block *, m4 *, bool);
 static int     builtin_read            (m4_input_block *, m4 *, bool, bool);
 static void    builtin_unget           (m4_input_block *, int);
 static void    builtin_print           (m4_input_block *, m4 *, m4_obstack *);
-static int     string_peek             (m4_input_block *, m4 *);
+static int     string_peek             (m4_input_block *, m4 *, bool);
 static int     string_read             (m4_input_block *, m4 *, bool, bool);
 static void    string_unget            (m4_input_block *, int);
 static void    string_print            (m4_input_block *, m4 *, m4_obstack *);
-static int     composite_peek          (m4_input_block *, m4 *);
+static int     composite_peek          (m4_input_block *, m4 *, bool);
 static int     composite_read          (m4_input_block *, m4 *, bool, bool);
 static void    composite_unget         (m4_input_block *, int);
 static bool    composite_clean         (m4_input_block *, m4 *, bool);
@@ -116,12 +116,14 @@ static    void    append_quote_token      (m4 *, m4_obstack *,
                                         m4_symbol_value *);
 static bool    match_input             (m4 *, const char *, bool);
 static int     next_char               (m4 *, bool, bool);
-static int     peek_char               (m4 *);
+static int     peek_char               (m4 *, bool);
 static bool    pop_input               (m4 *, bool);
 static void    unget_input             (int);
 static bool    consume_syntax          (m4 *, m4_obstack *, unsigned int);
 
 #ifdef DEBUG_INPUT
+# include "quotearg.h"
+
 static int m4_print_token (const char *, m4__token_type, m4_symbol_value *);
 #endif
 
@@ -129,8 +131,9 @@ static int m4_print_token (const char *, m4__token_type, m4_symbol_value *);
 struct input_funcs
 {
   /* Peek at input, return an unsigned char, CHAR_BUILTIN if it is a
-     builtin, or CHAR_RETRY if none available.  */
-  int  (*peek_func)    (m4_input_block *, m4 *);
+     builtin, or CHAR_RETRY if none available.  If ALLOW_ARGV, then
+     CHAR_ARGV may be returned.  */
+   int (*peek_func)    (m4_input_block *, m4 *, bool);
 
   /* Read input, return an unsigned char, CHAR_BUILTIN if it is a
      builtin, or CHAR_RETRY if none available.  If ALLOW_QUOTE, then
@@ -254,7 +257,8 @@ static struct input_funcs composite_funcs = {
 
 /* Input files, from command line or [s]include.  */
 static int
-file_peek (m4_input_block *me, m4 *context M4_GNUC_UNUSED)
+file_peek (m4_input_block *me, m4 *context M4_GNUC_UNUSED,
+          bool allow_argv M4_GNUC_UNUSED)
 {
   int ch;
 
@@ -389,7 +393,8 @@ m4_push_file (m4 *context, FILE *fp, const char *title, bool close_file)
 
 /* Handle a builtin macro token.  */
 static int
-builtin_peek (m4_input_block *me, m4 *context M4_GNUC_UNUSED)
+builtin_peek (m4_input_block *me, m4 *context M4_GNUC_UNUSED,
+             bool allow_argv M4_GNUC_UNUSED)
 {
   if (me->u.u_b.read)
     return CHAR_RETRY;
@@ -474,7 +479,8 @@ m4_push_builtin (m4 *context, m4_symbol_value *token)
 
 /* Handle string expansion text.  */
 static int
-string_peek (m4_input_block *me, m4 *context M4_GNUC_UNUSED)
+string_peek (m4_input_block *me, m4 *context M4_GNUC_UNUSED,
+            bool allow_argv M4_GNUC_UNUSED)
 {
   return me->u.u_s.len ? to_uchar (*me->u.u_s.str) : CHAR_RETRY;
 }
@@ -662,7 +668,7 @@ m4__push_symbol (m4 *context, m4_symbol_value *value, size_t level, bool inuse)
       next->u.u_c.end = chain;
       if (chain->type == M4__CHAIN_ARGV)
        {
-         assert (!chain->u.u_a.comma);
+         assert (!chain->u.u_a.comma && !chain->u.u_a.skip_last);
          inuse |= m4__arg_adjust_refcount (context, chain->u.u_a.argv, true);
        }
       else if (chain->type == M4__CHAIN_STR && chain->u.u_s.level < SIZE_MAX)
@@ -718,9 +724,11 @@ m4_push_string_finish (void)
    in FIFO order, even though the obstack allocates memory in LIFO
    order.  */
 static int
-composite_peek (m4_input_block *me, m4 *context)
+composite_peek (m4_input_block *me, m4 *context, bool allow_argv)
 {
   m4__symbol_chain *chain = me->u.u_c.chain;
+  size_t argc;
+
   while (chain)
     {
       switch (chain->type)
@@ -730,12 +738,16 @@ composite_peek (m4_input_block *me, m4 *context)
            return to_uchar (chain->u.u_s.str[0]);
          break;
        case M4__CHAIN_ARGV:
-         /* TODO - figure out how to pass multiple arguments to
-            macro.c at once.  */
-         if (chain->u.u_a.index == m4_arg_argc (chain->u.u_a.argv))
+         argc = m4_arg_argc (chain->u.u_a.argv);
+         if (chain->u.u_a.index == argc)
            break;
          if (chain->u.u_a.comma)
            return ','; /* FIXME - support M4_SYNTAX_COMMA.  */
+         /* Only return a reference if the quoting is correct and the
+            reference has more than one argument left.  */
+         if (allow_argv && chain->quote_age == m4__quote_age (M4SYNTAX)
+             && chain->u.u_a.quotes && chain->u.u_a.index + 1 < argc)
+           return CHAR_ARGV;
          /* Rather than directly parse argv here, we push another
             input block containing the next unparsed argument from
             argv.  */
@@ -745,7 +757,7 @@ composite_peek (m4_input_block *me, m4 *context)
          chain->u.u_a.index++;
          chain->u.u_a.comma = true;
          m4_push_string_finish ();
-         return peek_char (context);
+         return peek_char (context, allow_argv);
        default:
          assert (!"composite_peek");
          abort ();
@@ -761,9 +773,7 @@ composite_read (m4_input_block *me, m4 *context, bool allow_quote, bool safe)
   m4__symbol_chain *chain = me->u.u_c.chain;
   while (chain)
     {
-      /* TODO also support returning $@ as CHAR_QUOTE.  */
-      if (allow_quote && chain->quote_age == m4__quote_age (M4SYNTAX)
-         && chain->type == M4__CHAIN_STR)
+      if (allow_quote && chain->quote_age == m4__quote_age (M4SYNTAX))
        return CHAR_QUOTE;
       switch (chain->type)
        {
@@ -779,8 +789,6 @@ composite_read (m4_input_block *me, m4 *context, bool allow_quote, bool safe)
            m4__adjust_refcount (context, chain->u.u_s.level, false);
          break;
        case M4__CHAIN_ARGV:
-         /* TODO - figure out how to pass multiple arguments to
-            macro.c at once.  */
          if (chain->u.u_a.index == m4_arg_argc (chain->u.u_a.argv))
            {
              m4__arg_adjust_refcount (context, chain->u.u_a.argv, false);
@@ -996,7 +1004,7 @@ pop_input (m4 *context, bool cleanup)
   assert (isp);
   if (isp->funcs->clean_func
       ? !isp->funcs->clean_func (isp, context, cleanup)
-      : (isp->funcs->peek_func (isp, context) != CHAR_RETRY))
+      : (isp->funcs->peek_func (isp, context, true) != CHAR_RETRY))
     return false;
 
   if (tmp != NULL)
@@ -1073,18 +1081,28 @@ append_quote_token (m4 *context, m4_obstack *obs, m4_symbol_value *value)
 {
   m4__symbol_chain *src_chain = isp->u.u_c.chain;
   m4__symbol_chain *chain;
-  assert (isp->funcs == &composite_funcs && obs && m4__quote_age (M4SYNTAX)
-         && src_chain->type == M4__CHAIN_STR
-         && src_chain->u.u_s.level <= SIZE_MAX);
+  assert (isp->funcs == &composite_funcs && obs && m4__quote_age (M4SYNTAX));
   isp->u.u_c.chain = src_chain->next;
 
   /* Speed consideration - for short enough symbols, the speed and
      memory overhead of parsing another INPUT_CHAIN link outweighs the
      time to inline the symbol text.  */
-  if (src_chain->u.u_s.len <= INPUT_INLINE_THRESHOLD)
+  if (src_chain->type == M4__CHAIN_STR
+      && src_chain->u.u_s.len <= INPUT_INLINE_THRESHOLD)
     {
+      assert (src_chain->u.u_s.level <= SIZE_MAX);
       obstack_grow (obs, src_chain->u.u_s.str, src_chain->u.u_s.len);
       m4__adjust_refcount (context, src_chain->u.u_s.level, false);
+      return;
+    }
+
+  /* TODO preserve $@ through quotes.  */
+  if (src_chain->type == M4__CHAIN_ARGV)
+    {
+      m4_arg_print (obs, src_chain->u.u_a.argv, src_chain->u.u_a.index,
+                   src_chain->u.u_a.quotes, NULL, false);
+      m4__arg_adjust_refcount (context, src_chain->u.u_a.argv, false);
+      return;
     }
 
   if (value->type == M4_SYMBOL_VOID)
@@ -1103,6 +1121,65 @@ append_quote_token (m4 *context, m4_obstack *obs, m4_symbol_value *value)
   chain->next = NULL;
 }
 
+/* When an ARGV token is seen, convert VALUE to point to it via a
+   composite chain.  Use OBS for any additional allocations
+   needed.  */
+static void
+init_argv_symbol (m4 *context, m4_obstack *obs, m4_symbol_value *value)
+{
+  m4__symbol_chain *src_chain;
+  m4__symbol_chain *chain;
+  int ch = next_char (context, true, true);
+  const m4_string_pair *comments = m4_get_syntax_comments (M4SYNTAX);
+
+  assert (ch == CHAR_QUOTE && value->type == M4_SYMBOL_VOID
+         && isp->funcs == &composite_funcs
+         && isp->u.u_c.chain->type == M4__CHAIN_ARGV
+         && obs && obstack_object_size (obs) == 0);
+
+  src_chain = isp->u.u_c.chain;
+  isp->u.u_c.chain = src_chain->next;
+  value->type = M4_SYMBOL_COMP;
+  /* Clone the link, since the input will be discarded soon.  */
+  chain = (m4__symbol_chain *) obstack_copy (obs, src_chain, sizeof *chain);
+  value->u.u_c.chain = value->u.u_c.end = chain;
+  chain->next = NULL;
+
+  /* If the next character is not ',' or ')', then unlink the last
+     argument from argv and schedule it for reparsing.  This way,
+     expand_argument never has to deal with concatenation of argv with
+     arbitrary text.  Note that the implementation of safe_quotes
+     ensures peek_input won't return CHAR_ARGV if the user is perverse
+     enough to mix comment delimiters with argument separators:
+
+       define(n,`$#')define(echo,$*)changecom(`,,',`)')n(echo(a,`,b`)'',c))
+       => 2 (not 3)
+
+     Therefore, we do not have to worry about calling MATCH, and thus
+     do not have to worry about pop_input being called and
+     invalidating the argv reference.
+
+     When the $@ ref is used unchanged, we completely bypass the
+     decrement of the argv refcount in next_char, since the ref is
+     still live via the current collect_arguments.  However, when the
+     last element of the $@ ref is reparsed, we must increase the argv
+     refcount here, to compensate for the fact that it will be
+     decreased once the final element is parsed.  */
+  assert (!comments->len1
+         || (!m4_has_syntax (M4SYNTAX, *comments->str1,
+                             M4_SYNTAX_COMMA | M4_SYNTAX_CLOSE)
+             && *comments->str1 != *src_chain->u.u_a.quotes->str1));
+  ch = peek_char (context, false);
+  if (!m4_has_syntax (M4SYNTAX, ch, M4_SYNTAX_COMMA | M4_SYNTAX_CLOSE))
+    {
+      isp->u.u_c.chain = src_chain;
+      src_chain->u.u_a.index = m4_arg_argc (chain->u.u_a.argv) - 1;
+      src_chain->u.u_a.comma = true;
+      chain->u.u_a.skip_last = true;
+      m4__arg_adjust_refcount (context, chain->u.u_a.argv, true);
+    }
+}
+
 
 /* Low level input is done a character at a time.  The function
    next_char () is used to read and advance the input to the next
@@ -1146,9 +1223,10 @@ next_char (m4 *context, bool allow_quote, bool retry)
 
 /* The function peek_char () is used to look at the next character in
    the input stream.  At any given time, it reads from the input_block
-   on the top of the current input stack.  */
+   on the top of the current input stack.  If ALLOW_ARGV, then return
+   CHAR_ARGV if an entire $@ reference is available for use.  */
 static int
-peek_char (m4 *context)
+peek_char (m4 *context, bool allow_argv)
 {
   int ch;
   m4_input_block *block = isp;
@@ -1159,7 +1237,8 @@ peek_char (m4 *context)
        return CHAR_EOF;
 
       assert (block->funcs->peek_func);
-      if ((ch = block->funcs->peek_func (block, context)) != CHAR_RETRY)
+      ch = block->funcs->peek_func (block, context, allow_argv);
+      if (ch != CHAR_RETRY)
        {
 /*       if (IS_IGNORE (ch)) */
 /*         return next_char (context, false, true); */
@@ -1228,7 +1307,7 @@ match_input (m4 *context, const char *s, bool consume)
   m4_obstack *st;
   bool result = false;
 
-  ch = peek_char (context);
+  ch = peek_char (context, false);
   if (ch != to_uchar (*s))
     return false;                      /* fail */
 
@@ -1240,7 +1319,7 @@ match_input (m4 *context, const char *s, bool consume)
     }
 
   next_char (context, false, true);
-  for (n = 1, t = s++; (ch = peek_char (context)) == to_uchar (*s++); )
+  for (n = 1, t = s++; (ch = peek_char (context, false)) == to_uchar (*s++); )
     {
       next_char (context, false, true);
       n++;
@@ -1297,7 +1376,7 @@ consume_syntax (m4 *context, m4_obstack *obs, unsigned int syntax)
        }
       if (ch == CHAR_RETRY || ch == CHAR_QUOTE)
        {
-         ch = peek_char (context);
+         ch = peek_char (context, false);
          if (m4_has_syntax (M4SYNTAX, ch, syntax))
            {
              assert (ch < CHAR_EOF);
@@ -1355,8 +1434,10 @@ m4_input_exit (void)
    with a description of what TOKEN will contain.  If LINE is not
    NULL, set *LINE to the line number where the token starts.  If OBS,
    expand safe tokens (strings and comments) directly into OBS rather
-   than in a temporary staging area.  Report errors (unterminated
-   comments or strings) on behalf of CALLER, if non-NULL.
+   than in a temporary staging area.  If ALLOW_ARGV, OBS must be
+   non-NULL, and an entire series of arguments can be returned if a $@
+   reference is encountered.  Report errors (unterminated comments or
+   strings) on behalf of CALLER, if non-NULL.
 
    If OBS is NULL or the token expansion is unknown, the token text is
    collected on the obstack token_stack, which never contains more
@@ -1365,7 +1446,7 @@ m4_input_exit (void)
    m4__next_token () is called.  */
 m4__token_type
 m4__next_token (m4 *context, m4_symbol_value *token, int *line,
-               m4_obstack *obs, const char *caller)
+               m4_obstack *obs, bool allow_argv, const char *caller)
 {
   int ch;
   int quote_level;
@@ -1388,7 +1469,7 @@ m4__next_token (m4 *context, m4_symbol_value *token, int *line,
 
     /* Must consume an input character, but not until CHAR_BUILTIN is
        handled.  */
-    ch = peek_char (context);
+    ch = peek_char (context, allow_argv && m4__quote_age (M4SYNTAX));
     if (ch == CHAR_EOF)                        /* EOF */
       {
 #ifdef DEBUG_INPUT
@@ -1407,6 +1488,14 @@ m4__next_token (m4 *context, m4_symbol_value *token, int *line,
 #endif
        return M4_TOKEN_MACDEF;
       }
+    if (ch == CHAR_ARGV)
+      {
+       init_argv_symbol (context, obs, token);
+#ifdef DEBUG_INPUT
+       m4_print_token ("next_token", M4_TOKEN_ARGV, token);
+#endif
+       return M4_TOKEN_ARGV;
+      }
 
     /* Consume character we already peeked at.  */
     next_char (context, false, true);
@@ -1644,7 +1733,7 @@ m4__next_token (m4 *context, m4_symbol_value *token, int *line,
 bool
 m4__next_token_is_open (m4 *context)
 {
-  int ch = peek_char (context);
+  int ch = peek_char (context, false);
 
   if (ch == CHAR_EOF || ch == CHAR_BUILTIN
       || m4_has_syntax (M4SYNTAX, ch, (M4_SYNTAX_BCOMM | M4_SYNTAX_ESCAPE
@@ -1667,55 +1756,61 @@ m4_print_token (const char *s, m4__token_type type, m4_symbol_value *token)
   m4_obstack obs;
   size_t len;
 
-  obstack_init (&obs);
   if (!s)
     s = "m4input";
-  obstack_grow (&obs, s, strlen (s));
-  obstack_1grow (&obs, ':');
-  obstack_1grow (&obs, ' ');
+  xfprintf (stderr, "%s: ", s);
   switch (type)
     {                          /* TOKSW */
     case M4_TOKEN_EOF:
-      obstack_grow (&obs, "eof", strlen ("eof"));
+      fputs ("eof", stderr);
       token = NULL;
       break;
     case M4_TOKEN_NONE:
-      obstack_grow (&obs, "none", strlen ("none"));
+      fputs ("none", stderr);
       token = NULL;
       break;
     case M4_TOKEN_STRING:
-      obstack_grow (&obs, "string\t", strlen ("string\t"));
+      fputs ("string\t", stderr);
       break;
     case M4_TOKEN_SPACE:
-      obstack_grow (&obs, "space\t", strlen ("space\t"));
+      fputs ("space\t", stderr);
       break;
     case M4_TOKEN_WORD:
-      obstack_grow (&obs, "word\t", strlen ("word\t"));
+      fputs ("word\t", stderr);
       break;
     case M4_TOKEN_OPEN:
-      obstack_grow (&obs, "open\t", strlen ("open\t"));
+      fputs ("open\t", stderr);
       break;
     case M4_TOKEN_COMMA:
-      obstack_grow (&obs, "comma\t", strlen ("comma\t"));
+      fputs ("comma\t", stderr);
       break;
     case M4_TOKEN_CLOSE:
-      obstack_grow (&obs, "close\t", strlen ("close\t"));
+      fputs ("close\t", stderr);
       break;
     case M4_TOKEN_SIMPLE:
-      obstack_grow (&obs, "simple\t", strlen ("simple\t"));
+      fputs ("simple\t", stderr);
       break;
     case M4_TOKEN_MACDEF:
-      obstack_grow (&obs, "builtin\t", strlen ("builtin\t"));
+      fputs ("builtin\t", stderr);
+      break;
+    case M4_TOKEN_ARGV:
+      fputs ("argv\t", stderr);
       break;
     default:
       abort ();
     }
   if (token)
-    m4_symbol_value_print (token, &obs, true, "\"", "\"", SIZE_MAX, NULL);
-  obstack_1grow (&obs, '\n');
-  len = obstack_object_size (&obs);
-  fwrite (obstack_finish (&obs), 1, len, stderr);
-  obstack_free (&obs, NULL);
+    {
+      obstack_init (&obs);
+      m4_symbol_value_print (token, &obs, NULL, NULL, true);
+      len = obstack_object_size (&obs);
+      xfprintf (stderr, "%s\n", quotearg_style_mem (c_maybe_quoting_style,
+                                                   obstack_finish (&obs),
+                                                   len));
+      obstack_free (&obs, NULL);
+    }
+  else
+    fputc ('\n', stderr);
   return 0;
 }
 #endif /* DEBUG_INPUT */
diff --git a/m4/m4private.h b/m4/m4private.h
index 77590f3..a2b78b8 100644
--- a/m4/m4private.h
+++ b/m4/m4private.h
@@ -221,6 +221,7 @@ struct m4__symbol_chain
       size_t index;                    /* Argument index within argv.  */
       bool_bitfield flatten : 1;       /* True to treat builtins as text.  */
       bool_bitfield comma : 1;         /* True when `,' is next input.  */
+      bool_bitfield skip_last : 1;     /* True if last argument omitted.  */
+      const m4_string_pair *quotes;    /* NULL for $*, quotes for $@.  */
     } u_a;                     /* M4__CHAIN_ARGV.  */
   } u;
@@ -397,7 +398,8 @@ extern void m4__symtab_remove_module_references (m4_symbol_table*,
 #define CHAR_EOF       256     /* Character return on EOF.  */
 #define CHAR_BUILTIN   257     /* Character return for BUILTIN token.  */
 #define CHAR_QUOTE     258     /* Character return for quoted string.  */
-#define CHAR_RETRY     259     /* Character return for end of input block.  */
+#define CHAR_ARGV      259     /* Character return for $@ reference.  */
+#define CHAR_RETRY     260     /* Character return for end of input block.  */
 
 #define DEF_LQUOTE     "`"     /* Default left quote delimiter.  */
 #define DEF_RQUOTE     "\'"    /* Default right quote delimiter.  */
@@ -475,7 +477,8 @@ typedef enum {
   M4_TOKEN_COMMA,      /* Argument separator, M4_SYMBOL_TEXT.  */
   M4_TOKEN_CLOSE,      /* Argument list end, M4_SYMBOL_TEXT.  */
   M4_TOKEN_SIMPLE,     /* Single character, M4_SYMBOL_TEXT.  */
-  M4_TOKEN_MACDEF      /* Macro's definition (see "defn"), M4_SYMBOL_FUNC.  */
+  M4_TOKEN_MACDEF,     /* Macro's definition (see "defn"), M4_SYMBOL_FUNC.  */
+  M4_TOKEN_ARGV                /* A series of parameters, M4_SYMBOL_COMP.  */
 } m4__token_type;
 
 extern void            m4__make_text_link (m4_obstack *, m4__symbol_chain **,
@@ -483,7 +486,7 @@ extern      void            m4__make_text_link (m4_obstack *, m4__symbol_chain **,
 extern bool            m4__push_symbol (m4 *, m4_symbol_value *, size_t,
                                         bool);
 extern m4__token_type  m4__next_token (m4 *, m4_symbol_value *, int *,
-                                       m4_obstack *, const char *);
+                                       m4_obstack *, bool, const char *);
 extern bool            m4__next_token_is_open (m4 *);
 
 /* Fast macro versions of macro argv accessor functions,
diff --git a/m4/macro.c b/m4/macro.c
index 708be58..d6f81d8 100644
--- a/m4/macro.c
+++ b/m4/macro.c
@@ -183,7 +183,7 @@ m4_macro_expand_input (m4 *context)
   m4_set_symbol_value_text (&empty_symbol, "", 0, 0);
   VALUE_MAX_ARGS (&empty_symbol) = -1;
 
-  while ((type = m4__next_token (context, &token, &line, NULL, NULL))
+  while ((type = m4__next_token (context, &token, &line, NULL, false, NULL))
         != M4_TOKEN_EOF)
     expand_token (context, NULL, type, &token, line, true);
 }
@@ -311,7 +311,7 @@ expand_argument (m4 *context, m4_obstack *obs, m4_symbol_value *argp,
   /* Skip leading white space.  */
   do
     {
-      type = m4__next_token (context, &token, NULL, obs, caller);
+      type = m4__next_token (context, &token, NULL, obs, true, caller);
     }
   while (type == M4_TOKEN_SPACE);
 
@@ -389,6 +389,20 @@ expand_argument (m4 *context, m4_obstack *obs, m4_symbol_value *argp,
            argp->type = M4_SYMBOL_TEXT;
          break;
 
+       case M4_TOKEN_ARGV:
+         assert (paren_level == 0 && argp->type == M4_SYMBOL_VOID
+                 && obstack_object_size (obs) == 0
+                 && token.u.u_c.chain == token.u.u_c.end
+                 && token.u.u_c.chain->type == M4__CHAIN_ARGV);
+         argp->type = M4_SYMBOL_COMP;
+         argp->u.u_c.chain = argp->u.u_c.end = token.u.u_c.chain;
+         type = m4__next_token (context, &token, NULL, NULL, false, caller);
+         if (argp->u.u_c.chain->u.u_a.skip_last)
+           assert (type == M4_TOKEN_COMMA);
+         else
+           assert (type == M4_TOKEN_COMMA || type == M4_TOKEN_CLOSE);
+         return type == M4_TOKEN_COMMA;
+
        default:
          assert (!"expand_argument");
          abort ();
@@ -396,7 +410,7 @@ expand_argument (m4 *context, m4_obstack *obs, 
m4_symbol_value *argp,
 
       if (argp->type != M4_SYMBOL_VOID || obstack_object_size (obs))
        first = false;
-      type = m4__next_token (context, &token, NULL, obs, caller);
+      type = m4__next_token (context, &token, NULL, obs, first, caller);
     }
 }
 
@@ -583,7 +597,7 @@ collect_arguments (m4 *context, const char *name, size_t len,
   if (m4__next_token_is_open (context))
     {
       /* Gobble parenthesis, then collect arguments.  */
-      m4__next_token (context, &token, NULL, NULL, name);
+      m4__next_token (context, &token, NULL, NULL, false, name);
       do
        {
          tokenp = (m4_symbol_value *) obstack_alloc (arguments,
@@ -608,12 +622,22 @@ collect_arguments (m4 *context, const char *name, size_t len,
              && m4_get_symbol_value_quote_age (tokenp) != args.quote_age)
            args.quote_age = 0;
          else if (tokenp->type == M4_SYMBOL_COMP)
-           args.has_ref = true;
+           {
+             args.has_ref = true;
+             if (tokenp->u.u_c.chain->type == M4__CHAIN_ARGV)
+               {
+                 args.argc += (tokenp->u.u_c.chain->u.u_a.argv->argc
+                               - tokenp->u.u_c.chain->u.u_a.index
+                               - tokenp->u.u_c.chain->u.u_a.skip_last - 1);
+                 args.wrapper = true;
+               }
+           }
        }
       while (more_args);
     }
   argv = (m4_macro_args *) obstack_finish (argv_stack);
   argv->argc = args.argc;
+  argv->wrapper = args.wrapper;
   argv->has_ref = args.has_ref;
   if (args.quote_age != m4__quote_age (M4SYNTAX))
     argv->quote_age = 0;
@@ -981,9 +1005,22 @@ m4__arg_adjust_refcount (m4 *context, m4_macro_args *argv, bool increase)
          chain = argv->array[i]->u.u_c.chain;
          while (chain)
            {
-             assert (chain->type == M4__CHAIN_STR);
-             if (chain->u.u_s.level < SIZE_MAX)
-               m4__adjust_refcount (context, chain->u.u_s.level, increase);
+             switch (chain->type)
+               {
+               case M4__CHAIN_STR:
+                 if (chain->u.u_s.level < SIZE_MAX)
+                   m4__adjust_refcount (context, chain->u.u_s.level,
+                                        increase);
+                 break;
+               case M4__CHAIN_ARGV:
+                 assert (chain->u.u_a.argv->inuse);
+                 m4__arg_adjust_refcount (context, chain->u.u_a.argv,
+                                          increase);
+                 break;
+               default:
+                 assert (!"m4__arg_adjust_refcount");
+                 abort ();
+               }
              chain = chain->next;
            }
        }
@@ -996,15 +1033,25 @@ m4__arg_adjust_refcount (m4 *context, m4_macro_args *argv, bool increase)
 static void
 arg_mark (m4_macro_args *argv)
 {
+  size_t i;
+  m4__symbol_chain *chain;
+
+  if (argv->inuse)
+    return;
   argv->inuse = true;
   if (argv->wrapper)
     {
-      /* TODO for now we support only a single-length $@ chain.  */
-      assert (argv->arraylen == 1
-             && argv->array[0]->type == M4_SYMBOL_COMP
-             && !argv->array[0]->u.u_c.chain->next
-             && argv->array[0]->u.u_c.chain->type == M4__CHAIN_ARGV);
-      argv->array[0]->u.u_c.chain->u.u_a.argv->inuse = true;
+      for (i = 0; i < argv->arraylen; i++)
+       if (argv->array[i]->type == M4_SYMBOL_COMP)
+         {
+           chain = argv->array[i]->u.u_c.chain;
+           while (chain)
+             {
+               if (chain->type == M4__CHAIN_ARGV && !chain->u.u_a.argv->inuse)
+                 arg_mark (chain->u.u_a.argv);
+               chain = chain->next;
+             }
+         }
     }
 }
 
@@ -1022,12 +1069,13 @@ make_argv_ref (m4_symbol_value *value, m4_obstack *obs, size_t level,
   m4__symbol_chain *chain;
 
   assert (obstack_object_size (obs) == 0);
-  if (argv->wrapper)
+  if (argv->wrapper && argv->arraylen == 1)
     {
-      /* TODO support concatenation with $@ refs.  */
-      assert (argv->arraylen == 1 && argv->array[0]->type == M4_SYMBOL_COMP);
+      /* TODO support $@ ref alongside other arguments.  */
+      assert (argv->array[0]->type == M4_SYMBOL_COMP);
       chain= argv->array[0]->u.u_c.chain;
-      assert (!chain->next && chain->type == M4__CHAIN_ARGV);
+      assert (!chain->next && chain->type == M4__CHAIN_ARGV
+             && !chain->u.u_a.skip_last);
       argv = chain->u.u_a.argv;
       index += chain->u.u_a.index - 1;
     }
@@ -1044,6 +1092,7 @@ make_argv_ref (m4_symbol_value *value, m4_obstack *obs, size_t level,
   chain->u.u_a.index = index;
   chain->u.u_a.flatten = flatten;
   chain->u.u_a.comma = false;
+  chain->u.u_a.skip_last = false;
   if (quotes)
     {
       /* Clone the quotes into the obstack, since changequote can
@@ -1081,12 +1130,14 @@ arg_symbol (m4_macro_args *argv, size_t index, size_t *level)
   for (i = 0; i < argv->arraylen; i++)
     {
       value = argv->array[i];
-      if (value->type == M4_SYMBOL_COMP)
+      if (value->type == M4_SYMBOL_COMP
+         && value->u.u_c.chain->type == M4__CHAIN_ARGV)
        {
          m4__symbol_chain *chain = value->u.u_c.chain;
          /* TODO - for now we support only a single $@ chain.  */
-         assert (!chain->next && chain->type == M4__CHAIN_ARGV);
-         if (index < chain->u.u_a.argv->argc - (chain->u.u_a.index - 1))
+         assert (!chain->next);
+         if (index <= (chain->u.u_a.argv->argc - chain->u.u_a.index
+                       - chain->u.u_a.skip_last))
            {
              value = arg_symbol (chain->u.u_a.argv,
                                  chain->u.u_a.index - 1 + index, level);
@@ -1094,7 +1145,8 @@ arg_symbol (m4_macro_args *argv, size_t index, size_t *level)
                value = &empty_symbol;
              break;
            }
-         index -= chain->u.u_a.argv->argc - chain->u.u_a.index;
+         index -= (chain->u.u_a.argv->argc - chain->u.u_a.index
+                   - chain->u.u_a.skip_last);
        }
       else if (--index == 0)
        break;
@@ -1154,15 +1206,25 @@ m4_arg_text (m4 *context, m4_macro_args *argv, size_t index)
   value = m4_arg_symbol (argv, index);
   if (m4_is_symbol_value_text (value))
     return m4_get_symbol_value_text (value);
-  /* TODO - concatenate argv refs and functions?  For now, we assume
-     all chain elements are text.  */
+  /* TODO - concatenate functions.  */
   assert (value->type == M4_SYMBOL_COMP);
   chain = value->u.u_c.chain;
   obs = m4_arg_scratch (context);
   while (chain)
     {
-      assert (chain->type == M4__CHAIN_STR);
-      obstack_grow (obs, chain->u.u_s.str, chain->u.u_s.len);
+      switch (chain->type)
+       {
+       case M4__CHAIN_STR:
+         obstack_grow (obs, chain->u.u_s.str, chain->u.u_s.len);
+         break;
+       case M4__CHAIN_ARGV:
+         m4_arg_print (obs, chain->u.u_a.argv, chain->u.u_a.index,
+                       chain->u.u_a.quotes, NULL, false);
+         break;
+       default:
+         assert (!"m4_arg_text");
+         abort ();
+       }
       chain = chain->next;
     }
   obstack_1grow (obs, '\0');
diff --git a/m4/symtab.c b/m4/symtab.c
index f302fe8..9636f9d 100644
--- a/m4/symtab.c
+++ b/m4/symtab.c
@@ -845,7 +845,7 @@ dump_symbol_CB (m4_symbol_table *symtab, const char *name,
     {
       m4_obstack obs;
       obstack_init (&obs);
-      m4_symbol_value_print (value, &obs, false, NULL, NULL, SIZE_MAX, true);
+      m4_symbol_value_print (value, &obs, NULL, NULL, true);
       xfprintf (stderr, "%s", (char *) obstack_finish (&obs));
       obstack_free (&obs, NULL);
     }
diff --git a/m4/syntax.c b/m4/syntax.c
index aff6444..8a7b0d1 100644
--- a/m4/syntax.c
+++ b/m4/syntax.c
@@ -743,9 +743,11 @@ set_quote_age (m4_syntax_table *syntax, bool reset, bool change)
                          | M4_SYNTAX_COMMA | M4_SYNTAX_CLOSE
                          | M4_SYNTAX_SPACE))
       && *syntax->quote.str1 != *syntax->quote.str2
-      && *syntax->comm.str1 != *syntax->quote.str2
-      && !m4_has_syntax (syntax, *syntax->comm.str1,
-                        M4_SYNTAX_OPEN | M4_SYNTAX_COMMA | M4_SYNTAX_CLOSE)
+      && (!syntax->comm.len1
+          || (*syntax->comm.str1 != *syntax->quote.str2
+              && !m4_has_syntax (syntax, *syntax->comm.str1,
+                                 (M4_SYNTAX_OPEN | M4_SYNTAX_COMMA
+                                  | M4_SYNTAX_CLOSE))))
       && m4_has_syntax (syntax, ',', M4_SYNTAX_COMMA))
     {
       syntax->quote_age = ((local_syntax_age << 16)
-- 
1.5.4

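The argument accounting in the hunks above can be tried out in isolation. That accounting is the `argc - index - skip_last` arithmetic in collect_arguments, plus the drill-through in arg_symbol. What follows is a minimal sketch under simplifying assumptions. The types `slot` and `args` and the functions `slot_count`, `count_args` and `arg_text` are hypothetical stand-ins, not the real m4_macro_args API. Each slot holds either plain text or a single $@ reference to an earlier argv. A reference starts at a 1-based index and may exclude the last referenced argument via skip_last. Chains mixing text and references are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for m4_macro_args and its
   M4__CHAIN_ARGV links; the real structures carry many more fields.  */
typedef struct args args;

typedef struct slot {
  const char *text;   /* Argument text, or NULL if this slot is a $@ ref.  */
  const args *ref;    /* Referenced argv when text is NULL.  */
  size_t index;       /* First referenced argument (1-based).  */
  int skip_last;      /* Nonzero if the last referenced argument is excluded.  */
} slot;

struct args {
  size_t argc;        /* Conceptual argument count, including $0.  */
  size_t arraylen;    /* Number of slots in ARRAY.  */
  const slot *array;  /* Slots for $1 onwards.  */
};

/* Number of conceptual arguments a slot stands for.  */
static size_t
slot_count (const slot *s)
{
  if (s->text)
    return 1;
  return s->ref->argc - s->index - (s->skip_last ? 1 : 0);
}

/* Compute argc from the slots: one for $0, plus what each slot stands for.
   This matches collect_arguments, which counts one per slot and then adds
   the extra arguments hidden behind each $@ ref.  */
static size_t
count_args (const slot *array, size_t arraylen)
{
  size_t argc = 1;    /* $0, the macro name.  */
  for (size_t i = 0; i < arraylen; i++)
    argc += slot_count (&array[i]);
  return argc;
}

/* Return argument INDEX (1-based) of A, drilling through nested $@
   refs in the style of arg_symbol; out-of-range indices yield "".  */
static const char *
arg_text (const args *a, size_t index)
{
  if (index == 0 || index >= a->argc)
    return "";
  for (size_t i = 0; i < a->arraylen; i++)
    {
      const slot *s = &a->array[i];
      size_t n = slot_count (s);
      if (index <= n)
        return s->text ? s->text : arg_text (s->ref, s->index - 1 + index);
      index -= n;
    }
  return "";
}

/* Example data.  inner is m(a,b,c).
   outer holds three slots:
   - a ref covering only inner's $2 (index 2, with skip_last dropping $3);
   - "cx", as if inner's $3 had been reparsed together with trailing text x;
   - "d".
   wrapper holds a single ref to all of outer.  */
static const slot inner_slots[] = {
  {"a", NULL, 0, 0}, {"b", NULL, 0, 0}, {"c", NULL, 0, 0}
};
static const args inner = {4, 3, inner_slots};
static const slot outer_slots[] = {
  {NULL, &inner, 2, 1}, {"cx", NULL, 0, 0}, {"d", NULL, 0, 0}
};
static const args outer = {4, 3, outer_slots};
static const slot wrapper_slots[] = { {NULL, &outer, 1, 0} };
static const args wrapper = {4, 1, wrapper_slots};
```

With these example values, `arg_text (&wrapper, 2)` first drills from wrapper into outer. There it skips the one-argument reference to inner and returns "cx". Each extra level of nesting adds one more recursive hop. This is the per-lookup cost that the cover letter says keeps deep recursion quadratic.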
From 7319157ccd7cd65f72c0a456c3091252a13f558a Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Thu, 1 Nov 2007 09:28:46 -0600
Subject: [PATCH] Stage 15: return argv refs back to collect_arguments.

* src/m4.h (enum token_type): Add TOKEN_ARGV.
(struct token_chain): Add skip_last member to argv link.
(next_token): Update prototype.
* src/input.c (CHAR_ARGV): New placeholder input character.
(peek_input): Add parameter, to pass $@ at once.
(next_char_1, append_quote_token): Handle $@ inside quotes.
(init_argv_token): New function.
(push_token, match_input, next_token, peek_token, lex_debug):
Update callers.
* src/macro.c (expand_input, collect_arguments): Likewise.
(expand_argument): Handle incoming $@ token.
(arg_adjust_refcount, arg_token, arg_text, make_argv_ref_token):
Handle nested $@ refs.
* src/symtab.c (symtab_debug): Update caller.
* examples/null.m4: Document more tests that are needed.  Add
tests for NUL with divert, patsubst, and regexp.
* examples/null.out: Update for new tests.
* doc/m4.texinfo (Syntax): Add test for m4exit and NUL.
* checks/get-them (AWK): Give a default value.
* checks/check-them: Allow tests to invoke child processes with
same include path.  Perform message normalization on stderr.

(cherry picked from commit 1fecefc8b990254aa667a01d12c6c7a2d716df06)

Signed-off-by: Eric Blake <address@hidden>
---
 ChangeLog         |   31 +++++++++
 checks/check-them |   10 ++-
 checks/get-them   |    4 +-
 doc/m4.texinfo    |    8 ++-
 examples/null.m4  |   93 ++++++++++++++++++++--------
 examples/null.out |    5 +-
 src/input.c       |  177 +++++++++++++++++++++++++++++++++++++++++------------
 src/m4.h          |    7 ++-
 src/macro.c       |  112 +++++++++++++++++++++++++--------
 src/symtab.c      |    2 +-
 10 files changed, 345 insertions(+), 104 deletions(-)

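The src/input.c hunks in this patch add two pieces of logic. The first is the new CHAR_ARGV decision in peek_input. The second is the skip_last split in init_argv_token, which leaves the last argument of a $@ reference for reparsing when the reference is followed by something other than ',' or ')'. Both can be modelled in isolation as a small sketch. `argv_link` is a hypothetical simplification of the CHAIN_ARGV link, with `has_quotes` standing in for a non-NULL quotes pointer. `classify_link` and `take_argv_token` are illustrative names, not functions in src/input.c. The sketch also leaves out the arg_adjust_refcount call that init_argv_token makes when it splits off the last argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplification of a CHAIN_ARGV input link.  */
typedef struct {
  size_t argc;          /* arg_argc () of the referenced argv, including $0.  */
  size_t index;         /* Next argument of argv still to be read.  */
  bool comma;           /* A ',' must be returned before that argument.  */
  bool has_quotes;      /* Stand-in for u_a.quotes being non-NULL.  */
  unsigned quote_age;   /* Quote age the link was created under.  */
  bool skip_last;       /* Token excludes the final argument.  */
} argv_link;

typedef enum { LINK_EXHAUSTED, LINK_COMMA, LINK_WHOLE_REF, LINK_ONE_ARG }
  link_action;

/* What peek_input does when it reaches an argv link.  */
static link_action
classify_link (const argv_link *l, bool allow_argv, unsigned current_quote_age)
{
  if (l->index == l->argc)
    return LINK_EXHAUSTED;      /* Nothing left; move on to the next link.  */
  if (l->comma)
    return LINK_COMMA;          /* Return ','.  */
  if (allow_argv && l->quote_age == current_quote_age && l->has_quotes
      && l->index + 1 < l->argc)
    return LINK_WHOLE_REF;      /* CHAR_ARGV: two or more arguments left.  */
  return LINK_ONE_ARG;          /* Push the next argument as plain input.  */
}

/* Copy SRC into TOKEN, as init_argv_token clones the link.  If NEXT_CH,
   the character following the reference, is not ',' or ')', leave the
   last argument in SRC for reparsing and mark TOKEN with skip_last;
   otherwise SRC is treated as fully consumed.  Return the number of
   arguments TOKEN carries.  */
static size_t
take_argv_token (argv_link *src, argv_link *token, int next_ch)
{
  *token = *src;
  token->skip_last = false;
  if (next_ch != ',' && next_ch != ')')
    {
      src->index = src->argc - 1;
      src->comma = true;
      token->skip_last = true;
    }
  return token->argc - token->index - (token->skip_last ? 1 : 0);
}
```

Consider a link with argc 4 and index 1, so that $1 through $3 remain. If ')' follows, the token carries all three arguments. If a character such as `x` follows, the token carries two. The third is then reparsed after a pending comma, so it can concatenate with the trailing text. That is why expand_argument never has to join an argv reference with arbitrary text.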
diff --git a/ChangeLog b/ChangeLog
index c455abc..0baa3c1 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,34 @@
+2008-02-16  Eric Blake  <address@hidden>
+
+       Stage 15: return argv refs back to collect_arguments.
+       Collect an entire $@ reference at once rather than one argument at
+       a time, outside of quotes (but inside quotes, $@ is still
+       flattened for now).  The skip_last field allows concatenation of
+       $@ with other text when collecting arguments.
+       Memory impact: noticeable improvement, due to better reuse of address@hidden
+       Speed impact: noticeable improvement, due to less parsing.
+       * src/m4.h (enum token_type): Add TOKEN_ARGV.
+       (struct token_chain): Add skip_last member to argv link.
+       (next_token): Update prototype.
+       * src/input.c (CHAR_ARGV): New placeholder input character.
+       (peek_input): Add parameter, to pass $@ at once.
+       (next_char_1, append_quote_token): Handle $@ inside quotes.
+       (init_argv_token): New function.
+       (push_token, match_input, next_token, peek_token, lex_debug):
+       Update callers.
+       * src/macro.c (expand_input, collect_arguments): Likewise.
+       (expand_argument): Handle incoming $@ token.
+       (arg_adjust_refcount, arg_token, arg_text, make_argv_ref_token):
+       Handle nested $@ refs.
+       * src/symtab.c (symtab_debug): Update caller.
+       * examples/null.m4: Document more tests that are needed.  Add
+       tests for NUL with divert, patsubst, and regexp.
+       * examples/null.out: Update for new tests.
+       * doc/m4.texinfo (Syntax): Add test for m4exit and NUL.
+       * checks/get-them (AWK): Give a default value.
+       * checks/check-them: Allow tests to invoke child processes with
+       same include path.  Perform message normalization on stderr.
+
 2008-02-15  Eric Blake  <address@hidden>
 
        Use fastmaps for better regex performance.
diff --git a/checks/check-them b/checks/check-them
index daa1b00..9fca39b 100755
--- a/checks/check-them
+++ b/checks/check-them
@@ -1,6 +1,6 @@
 #!/bin/sh
 # Check GNU m4 against examples from the manual source.
-# Copyright (C) 1992, 2006, 2007 Free Software Foundation, Inc.
+# Copyright (C) 1992, 2006, 2007, 2008 Free Software Foundation, Inc.
 
 # Sanity check what we are testing
 m4 --version
@@ -68,7 +68,7 @@ do
   echo "Checking $file"
   options=`sed -ne '3s/^dnl @ extra options: //p;3q' "$file"`
   sed -e '/^dnl @/d' -e '/^\^D$/q' "$file" \
-    | LC_MESSAGES=C m4 -d -I "$examples" $options - >$out 2>$err
+    | LC_MESSAGES=C M4PATH=$examples m4 -d $options - >$out 2>$err
   stat=$?
 
   xstat=`sed -ne '2s/^dnl @ expected status: //p;2q' "$file"`
@@ -96,9 +96,11 @@ do
 
   xerrfile=`sed -n 's/^dnl @ expected error: //p' "$file"`
   if test -z "$xerrfile" ; then
-    sed -e '/^dnl @error{}/!d' -e 's///' -e "s|^m4:|$m4:|" "$file" > $xerr
+    sed '/^dnl @error{}/!d; s///; '"s|^m4:|$m4:|; s|\.\./examples|$examples|" \
+      "$file" > $xerr
   else
-    cp "$examples/$xerrfile" $xerr
+    sed "s|^m4:|$m4:|; s|\.\./examples|$examples|" \
+      "$examples/$xerrfile" > $xerr
   fi
 
   # For the benefit of mingw, normalize \r\n line endings
diff --git a/checks/get-them b/checks/get-them
index e034962..803f413 100755
--- a/checks/get-them
+++ b/checks/get-them
@@ -1,11 +1,13 @@
 #!/bin/sh
 # -*- AWK -*-
 # Extract all examples from the manual source.
-# Copyright (C) 1992, 2005, 2006, 2007 Free Software Foundation, Inc.
+# Copyright (C) 1992, 2005, 2006, 2007, 2008 Free Software Foundation,
+# Inc.
 
 # This script is for use with GNU awk.
 
 FILE=${1-/dev/null}
+: ${AWK=awk}
 
 $AWK '
 
diff --git a/doc/m4.texinfo b/doc/m4.texinfo
index 69cfb62..32cb0a9 100644
--- a/doc/m4.texinfo
+++ b/doc/m4.texinfo
@@ -933,7 +933,13 @@ exception of the @sc{nul} character (the zero byte @samp{'\0'}).
 @comment xout: null.out
 @comment xerr: null.err
 @example
-include(`null.m4')dnl
+define(`m4exit')include(`null.m4')dnl
address@hidden example
+
address@hidden status: 2
address@hidden
+include(`null.m4')
address@hidden This file tests m4 behavior on NUL bytes.
 @end example
 @end ignore
 
diff --git a/examples/null.m4 b/examples/null.m4
index 904a6ef..2632522 100644
--- a/examples/null.m4
+++ b/examples/null.m4
@@ -1,4 +1,6 @@
-# This file tests m4 behavior on NUL bytes
+# This file tests m4 behavior on NUL bytes.
+dnl Use `m4 -Dm4exit' to test rest of file.  NUL not a number, needs to warn
+m4exit(`22')dnl
 dnl Raw pass-through:
 raw: --
 dnl Embedded in quotes:
@@ -7,82 +9,119 @@ dnl Embedded in comments:
 commented: #--
 dnl Passed through $1, $*, $@:
 define(`echo', address@hidden')define(`', `empty')dnl
+define(`-', `dash')define(`--', `dashes')dnl
 user: echo(--,`11')
-dnl All macros matching __*__ take no arguments, and never produce NUL
+dnl All macros matching __*__ take no arguments, and never produce NUL.
 dnl First argument of builtin: not tested yet. No builtin includes NUL, so
 dnl   this needs to warn, but warning output needs quoting.
 dnl Remaining arguments of builtin:
 `builtin:' builtin(`len', --)
-dnl changecom: not tested yet
-dnl changequote: not tested yet
-dnl changeword: not tested yet
-dnl debugfile: not tested yet. No file name includes NUL, needs to warn
-dnl debugmode: not tested yet. NUL not a valid mode, needs to warn
-dnl decr: not tested yet. NUL not a number, needs to warn
+dnl Single-byte delimiter in changecom: not tested yet
+dnl Multi-byte delimiter in changecom: not tested yet
+dnl Single-byte delimiter in changequote: not tested yet
+dnl Multi-byte delimiter in changequote: not tested yet
+dnl Quotes in trace and dump output: not tested yet
+dnl Used in changeword (if changeword available): not tested yet
+dnl Bad regex in changeword: not tested yet
+dnl Warning from debugfile: not tested yet. No file name includes NUL, needs to warn
+dnl Warning from debugmode: not tested yet. NUL not a valid mode, needs to warn
+dnl Warning from decr: not tested yet. NUL not a number, needs to warn
 dnl Macro name of define:
 define(`--', `odd name: $1')dnl
 dnl Definition of define: not tested yet
-dnl Macro name in defn:
+dnl Undefined argument of defn: not tested yet. Should it warn?
+dnl Defined macro name in defn:
 `defn:' defn(`--')
 dnl Macro contents in defn: not tested yet
-dnl divert: not tested yet. NUL not a number, needs to warn
-dnl divnum: Takes no arguments, and never produces NUL.
+dnl Argument to divert: not tested yet. NUL not a number, needs to warn
+dnl Passed through diversion by divert:
+divert(`1')`divert:' --
+divert`'undivert(`1')dnl
+dnl Divnum takes no arguments, and never produces NUL.
 dnl Discarded by dnl: --
-dnl dumpdef: not tested yet. Needs to quote properly.
+dnl Undefined argument of dumpdef: not tested yet. Needs to quote properly.
+dnl Defined macro names in dumpdef: not tested yet
+dnl Macro contents in dumpdef: not tested yet
 dnl Passed through errprint:
 errprint(`errprint:' --, `--
 ')dnl
 dnl Passed to esyscmd: not tested yet. NUL truncates string, needs to warn
 dnl Generated from esyscmd:
 `esyscmd:' esyscmd(`printf "[\\0]"')
-dnl eval: not tested yet. NUL not a number, needs to warn
-dnl format: not tested yet. Should %s string truncation warn?
+dnl First argument of eval: not tested yet. NUL not a number, needs to warn
+dnl Other arguments of eval: not tested yet, needs to warn
+dnl First argument to format: not tested yet
+dnl Invalid specifier in format: not tested yet, needs to warn
+dnl Numeric and string arguments to format: not tested yet, needs to warn
+dnl Character argument to format: not tested yet, %c semantics needed
 dnl Macro name in ifdef, passed through ifdef:
 `ifdef:' ifdef(`--', `yes: --', `oops: --')dnl
  ifdef(, `oops: --', `no: --')
 dnl Compared in ifelse, passed through ifelse:
 `ifelse:' ifelse(`-', `--', `oops', `--', --, `yes: --')
-dnl include: not tested yet. No file name includes NUL, needs to warn
-dnl incr: not tested yet. NUL not a number, needs to warn
+dnl Warning from include: not tested yet. No file name includes NUL, needs to warn
+dnl Warning from incr: not tested yet. NUL not a number, needs to warn
 dnl Passed through index:
 `index:' index(`ab', `b') index(`-', `') index(`', `-')dnl
  index(`-', `-')
 dnl Defined first argument of indir:
 `indir:' indir(`--', 11)dnl
 dnl Undefined first argument of indir: not tested yet. Needs to warn
+dnl Warning issued via indir: not tested yet
 dnl Other arguments of indir:
  indir(`len', `--')
 dnl Passed through len:
 `len:' len() len(--)
-dnl m4exit: not tested yet. NUL not a number, needs to warn.
+dnl Test m4exit separately from m4wrap; see above.
 dnl Passed through m4wrap: not working yet
 m4wrap(``m4wrap:' -
 -
 ')dnl
-dnl maketemp: not tested yet. No file name includes NUL, needs to warn
-dnl mkstemp: not tested yet. No file name includes NUL, needs to warn
-dnl patsubst: not tested yet
+dnl Warning from maketemp: not tested yet. No file name includes NUL, needs to warn
+dnl Warning from mkstemp: not tested yet. No file name includes NUL, needs to warn
+dnl Bad regex in patsubst: not tested yet
+dnl First argument of patsubst:
+`patsubst:' patsubst(`--', `-', `.')dnl
+dnl Matching via meta-character in patsubst:
+ patsubst(`--', `[^-]')dnl
+dnl Second argument of patsubst:
+ patsubst(`abc', `b', `-') patsubst(`--', `', `!')dnl
+dnl Third argument of patsubst: not tested yet
+dnl Replacement via reference in patsubst:
+ patsubst(`----', `-\(.\)-', `\1-\1')
 dnl Defined argument of popdef:
 `popdef:' popdef(`--')ifdef(`--', `oops', `ok')
 dnl Undefined argument of popdef: not tested yet. Should it warn?
 dnl Macro name of pushdef:
 `pushdef:' pushdef(`--', `strange: $1')ifdef(`--', `ok', `oops')
 dnl Definition of pushdef: not tested yet
-dnl regexp: not tested yet
+dnl Bad regex in regexp: not tested yet
+dnl First argument of regexp:
+`regexp:' regexp(`ab', `b')dnl
+dnl Matching via meta-character in regexp:
+ regexp(`--', `[^-]', `!')dnl
+dnl Second argument of regexp:
+ regexp(`--', `')dnl
+dnl Third argument of regexp: not tested yet
+dnl Replacement via reference in regexp:
+ regexp(`----', `-\(.\)-', `\1-\1')
 dnl Passed through shift:
 `shift:' shift(`hi', `--', --)
-dnl sinclude: not tested yet. No file name includes NUL, needs to warn
+dnl Warning from sinclude: not tested yet. No file name includes NUL, needs to warn
 dnl First argument of substr:
 `substr:' substr(`----', `1', `3')
 dnl Other arguments of substr: not tested yet. NUL not a number, needs to warn.
-dnl syscmd: not tested yet. NUL truncates string, needs to warn
-dnl sysval: Takes no arguments, and never produces NUL.
+dnl Passed to syscmd: not tested yet. NUL truncates string, needs to warn
+dnl Sysval takes no arguments, and never produces NUL.
 dnl Passed to traceoff:
 traceoff(`--', `')dnl
-dnl traceon: not tested yet. Trace output needs quoting
+dnl Macro name and arguments of traceon: not tested yet. Trace output needs quoting
+dnl Defined text of traceon: not tested yet. Needs tracing indirect macros
 `traceon:' indir(`--', `--')
-dnl translit: not tested yet
+dnl First argument of translit: not tested yet
+dnl Single character in other arguments of translit: not tested yet
+dnl Character ranges of translit: not tested yet
 dnl Defined argument of undefine:
 `undefine:' undefine(`--')ifdef(`--', `oops', `ok')
 dnl Undefined argument of undefine: not tested yet. Should it warn?
-dnl undivert: not tested yet. No file name or number includes NUL, needs to warn
+dnl Warning from undivert: not tested yet. No file name or number includes NUL, needs to warn
diff --git a/examples/null.out b/examples/null.out
index 6e8a114..c42e03c 100644
--- a/examples/null.out
+++ b/examples/null.out
@@ -1,18 +1,21 @@
-# This file tests m4 behavior on NUL bytes
+# This file tests m4 behavior on NUL bytes.
 raw: --
 quoted: --
 commented: #--
 user: .--.--,11.--,11.
 builtin: 3
 defn: odd name: $1
+divert: --
 esyscmd: []
 ifdef: yes: -- oops: --
 ifelse: yes: --
 index: 2 -1 -1 8
 indir: odd name: 11 3
 len: 1 3
+patsubst: .. -- abc -!- ---
 popdef: ok
 pushdef: ok
+regexp: 2 ! 0 -
 shift: --,--
 substr: --
 traceon: strange: --
diff --git a/src/input.c b/src/input.c
index a0de36f..e320c72 100644
--- a/src/input.c
+++ b/src/input.c
@@ -154,6 +154,7 @@ static bool input_change;
 #define CHAR_EOF       256     /* Character return on EOF.  */
 #define CHAR_MACRO     257     /* Character return for MACRO token.  */
 #define CHAR_QUOTE     258     /* Character return for quoted string.  */
+#define CHAR_ARGV      259     /* Character return for $@ reference.  */
 
 /* Quote chars.  */
 string_pair curr_quote;
@@ -446,7 +447,7 @@ push_token (token_data *token, int level, bool inuse)
       next->u.u_c.end = chain;
       if (chain->type == CHAIN_ARGV)
        {
-         assert (!chain->u.u_a.comma);
+         assert (!chain->u.u_a.comma && !chain->u.u_a.skip_last);
          inuse |= arg_adjust_refcount (chain->u.u_a.argv, true);
        }
       else if (chain->type == CHAIN_STR && chain->u.u_s.level >= 0)
@@ -712,17 +713,18 @@ input_print (struct obstack *obs, const input_block *input)
 }
 
 
-/*-----------------------------------------------------------------.
-| Low level input is done a character at a time.  The function     |
-| peek_input () is used to look at the next character in the input |
-| stream.  At any given time, it reads from the input_block on the |
-| top of the current input stack.  The return value is an unsigned |
-| char, or CHAR_EOF if there is no more input, or CHAR_MACRO if a  |
-| builtin token occurs next.                                       |
-`-----------------------------------------------------------------*/
+/*------------------------------------------------------------------.
+| Low level input is done a character at a time.  The function      |
+| peek_input () is used to look at the next character in the input  |
+| stream.  At any given time, it reads from the input_block on the  |
+| top of the current input stack.  The return value is an unsigned  |
+| char, CHAR_EOF if there is no more input, CHAR_MACRO if a builtin |
+| token occurs next, or CHAR_ARGV if ALLOW_ARGV and the input is    |
+| visiting an argv reference with the correct quoting.              |
+`------------------------------------------------------------------*/
 
 static int
-peek_input (void)
+peek_input (bool allow_argv)
 {
   int ch;
   input_block *block = isp;
@@ -757,6 +759,7 @@ peek_input (void)
          chain = block->u.u_c.chain;
          while (chain)
            {
+             unsigned int argc;
              switch (chain->type)
                {
                case CHAIN_STR:
@@ -764,11 +767,17 @@ peek_input (void)
                    return to_uchar (*chain->u.u_s.str);
                  break;
                case CHAIN_ARGV:
-                 /* TODO - pass multiple arguments to macro.c at once.  */
-                 if (chain->u.u_a.index == arg_argc (chain->u.u_a.argv))
+                 argc = arg_argc (chain->u.u_a.argv);
+                 if (chain->u.u_a.index == argc)
                    break;
                  if (chain->u.u_a.comma)
                    return ',';
+                 /* Only return a reference if the quoting is correct
+                    and the reference has more than one argument
+                    left.  */
+                 if (allow_argv && chain->quote_age == current_quote_age
+                     && chain->u.u_a.quotes && chain->u.u_a.index + 1 < argc)
+                   return CHAR_ARGV;
                  /* Rather than directly parse argv here, we push
                     another input block containing the next unparsed
                     argument from argv.  */
@@ -778,7 +787,7 @@ peek_input (void)
                  chain->u.u_a.index++;
                  chain->u.u_a.comma = true;
                  push_string_finish ();
-                 return peek_input ();
+                 return peek_input (allow_argv);
                default:
                  assert (!"peek_input");
                  abort ();
@@ -871,9 +880,7 @@ next_char_1 (bool allow_quote)
          chain = isp->u.u_c.chain;
          while (chain)
            {
-             /* TODO also support returning $@ as CHAR_QUOTE.  */
-             if (allow_quote && chain->quote_age == current_quote_age
-                 && chain->type == CHAIN_STR)
+             if (allow_quote && chain->quote_age == current_quote_age)
                return CHAR_QUOTE;
              switch (chain->type)
                {
@@ -889,7 +896,6 @@ next_char_1 (bool allow_quote)
                    adjust_refcount (chain->u.u_s.level, false);
                  break;
                case CHAIN_ARGV:
-                 /* TODO - pass multiple arguments to macro.c at once.  */
                  if (chain->u.u_a.index == arg_argc (chain->u.u_a.argv))
                    {
                      arg_adjust_refcount (chain->u.u_a.argv, false);
@@ -956,7 +962,6 @@ skip_line (const char *name)
   if (file != current_file || line != current_line)
     input_change = true;
 }
-
 
 /*-------------------------------------------------------------------.
 | When a MACRO token is seen, next_token () uses init_macro_token () |
@@ -983,20 +988,30 @@ append_quote_token (struct obstack *obs, token_data *td)
   token_chain *src_chain = isp->u.u_c.chain;
   token_chain *chain;
 
-  assert (isp->type == INPUT_CHAIN && obs && current_quote_age
-         && src_chain->type == CHAIN_STR && src_chain->u.u_s.level >= 0);
+  assert (isp->type == INPUT_CHAIN && obs && current_quote_age);
   isp->u.u_c.chain = src_chain->next;
 
   /* Speed consideration - for short enough tokens, the speed and
      memory overhead of parsing another INPUT_CHAIN link outweighs the
      time to inline the token text.  */
-  if (src_chain->u.u_s.len <= INPUT_INLINE_THRESHOLD)
+  if (src_chain->type == CHAIN_STR
+      && src_chain->u.u_s.len <= INPUT_INLINE_THRESHOLD)
     {
+      assert (src_chain->u.u_s.level >= 0);
       obstack_grow (obs, src_chain->u.u_s.str, src_chain->u.u_s.len);
       adjust_refcount (src_chain->u.u_s.level, false);
       return;
     }
 
+  /* TODO preserve $@ through a quoted context.  */
+  if (src_chain->type == CHAIN_ARGV)
+    {
+      arg_print (obs, src_chain->u.u_a.argv, src_chain->u.u_a.index,
+                src_chain->u.u_a.quotes, NULL);
+      arg_adjust_refcount (src_chain->u.u_a.argv, false);
+      return;
+    }
+
   if (TOKEN_DATA_TYPE (td) == TOKEN_VOID)
     {
       TOKEN_DATA_TYPE (td) = TOKEN_COMP;
@@ -1013,6 +1028,65 @@ append_quote_token (struct obstack *obs, token_data *td)
   chain->next = NULL;
 }
 
+
+/*-------------------------------------------------------------------.
+| When an ARGV token is seen, convert TD to point to it via a       |
+| composite token.  Use OBS for any additional allocations needed to |
+| store the token chain.                                            |
+`-------------------------------------------------------------------*/
+static void
+init_argv_token (struct obstack *obs, token_data *td)
+{
+  token_chain *src_chain;
+  token_chain *chain;
+  int ch = next_char (true);
+
+  assert (ch == CHAR_QUOTE && TOKEN_DATA_TYPE (td) == TOKEN_VOID
+         && isp->type == INPUT_CHAIN && isp->u.u_c.chain->type == CHAIN_ARGV
+         && obs && obstack_object_size (obs) == 0);
+
+  src_chain = isp->u.u_c.chain;
+  isp->u.u_c.chain = src_chain->next;
+  TOKEN_DATA_TYPE (td) = TOKEN_COMP;
+  /* Clone the link, since the input will be discarded soon.  */
+  chain = (token_chain *) obstack_copy (obs, src_chain, sizeof *chain);
+  td->u.u_c.chain = td->u.u_c.end = chain;
+  chain->next = NULL;
+
+  /* If the next character is not ',' or ')', then unlink the last
+     argument from argv and schedule it for reparsing.  This way,
+     expand_argument never has to deal with concatenation of argv with
+     arbitrary text.  Note that the implementation of safe_quotes
+     ensures peek_input won't return CHAR_ARGV if the user is perverse
+     enough to mix comment delimiters with argument separators:
+
+       define(n,`$#')define(echo,$*)changecom(`,,',`)')n(echo(a,`,b`)'',c))
+       => 2 (not 3)
+
+     Therefore, we do not have to worry about calling MATCH, and thus
+     do not have to worry about pop_input being called and
+     invalidating the argv reference.
+
+     When the $@ ref is used unchanged, we completely bypass the
+     decrement of the argv refcount in next_char_1, since the ref is
+     still live via the current collect_arguments.  However, when the
+     last element of the $@ ref is reparsed, we must increase the argv
+     refcount here, to compensate for the fact that it will be
+     decreased once the final element is parsed.  */
+  assert (*curr_comm.str1 != ',' && *curr_comm.str1 != ')'
+         && *curr_comm.str1 != *curr_quote.str1);
+  ch = peek_input (false);
+  if (ch != ',' && ch != ')')
+    {
+      isp->u.u_c.chain = src_chain;
+      src_chain->u.u_a.index = arg_argc (chain->u.u_a.argv) - 1;
+      src_chain->u.u_a.comma = true;
+      chain->u.u_a.skip_last = true;
+      arg_adjust_refcount (chain->u.u_a.argv, true);
+    }
+}
+
+
 /*------------------------------------------------------------------.
 | This function is for matching a string against a prefix of the    |
 | input stream.  If the string S matches the input and CONSUME is   |
@@ -1029,7 +1103,7 @@ match_input (const char *s, bool consume)
   const char *t;
   bool result = false;
 
-  ch = peek_input ();
+  ch = peek_input (false);
   if (ch != to_uchar (*s))
     return false;                      /* fail */
 
@@ -1041,7 +1115,7 @@ match_input (const char *s, bool consume)
     }
 
   next_char (false);
-  for (n = 1, t = s++; (ch = peek_input ()) == to_uchar (*s++); )
+  for (n = 1, t = s++; (ch = peek_input (false)) == to_uchar (*s++); )
     {
       next_char (false);
       n++;
@@ -1320,18 +1394,20 @@ safe_quotes (void)
 
 
 /*--------------------------------------------------------------------.
-| Parse a single token from the input stream, set TD to its           |
-| contents, and return its type.  A token is TOKEN_EOF if the         |
+| Parse a single token from the input stream, set TD to its          |
+| contents, and return its type.  A token is TOKEN_EOF if the        |
 | input_stack is empty; TOKEN_STRING for a quoted string or comment;  |
-| TOKEN_WORD for something that is a potential macro name; and        |
+| TOKEN_WORD for something that is a potential macro name; and       |
 | TOKEN_SIMPLE for any single character that is not a part of any of  |
 | the previous types.  If LINE is not NULL, set *LINE to the line     |
 | where the token starts.  If OBS is not NULL, expand TOKEN_STRING    |
 | directly into OBS rather than in token_stack temporary storage      |
-| area, and TD could be a TOKEN_COMP instead of the usual             |
-| TOKEN_TEXT.  Report errors (unterminated comments or strings) on    |
-| behalf of CALLER, if non-NULL.                                      |
-|                                                                     |
+| area, and TD could be a TOKEN_COMP instead of the usual            |
+| TOKEN_TEXT.  If ALLOW_ARGV, OBS must be non-NULL, and an entire     |
+| series of arguments can be returned as TOKEN_ARGV when a $@        |
+| reference is encountered.  Report errors (unterminated comments or  |
+| strings) on behalf of CALLER, if non-NULL.                         |
+|                                                                    |
 | Next_token () returns the token type, and passes back a pointer to  |
 | the token data through TD.  Non-string token text is collected on   |
 | the obstack token_stack, which never contains more than one token   |
@@ -1340,7 +1416,8 @@ safe_quotes (void)
 `--------------------------------------------------------------------*/
 
 token_type
-next_token (token_data *td, int *line, struct obstack *obs, const char *caller)
+next_token (token_data *td, int *line, struct obstack *obs, bool allow_argv,
+           const char *caller)
 {
   int ch;
   int quote_level;
@@ -1362,7 +1439,7 @@ next_token (token_data *td, int *line, struct obstack *obs, const char *caller)
 
   /* Can't consume character until after CHAR_MACRO is handled.  */
   TOKEN_DATA_TYPE (td) = TOKEN_VOID;
-  ch = peek_input ();
+  ch = peek_input (allow_argv && current_quote_age);
   if (ch == CHAR_EOF)
     {
 #ifdef DEBUG_INPUT
@@ -1381,6 +1458,17 @@ next_token (token_data *td, int *line, struct obstack *obs, const char *caller)
 #endif /* DEBUG_INPUT */
       return TOKEN_MACDEF;
     }
+  if (ch == CHAR_ARGV)
+    {
+      init_argv_token (obs, td);
+#ifdef DEBUG_INPUT
+      xfprintf (stderr, "next_token -> ARGV (%d args)\n",
+               (arg_argc (td->u.u_c.chain->u.u_a.argv)
+                - td->u.u_c.chain->u.u_a.index
+                - (td->u.u_c.chain->u.u_a.skip_last ? 1 : 0)));
+#endif
+      return TOKEN_ARGV;
+    }
 
   next_char (false); /* Consume character we already peeked at.  */
   file = current_file;
@@ -1409,7 +1497,8 @@ next_token (token_data *td, int *line, struct obstack *obs, const char *caller)
   else if (default_word_regexp && (isalpha (ch) || ch == '_'))
     {
       obstack_1grow (&token_stack, ch);
-      while ((ch = peek_input ()) < CHAR_EOF && (isalnum (ch) || ch == '_'))
+      while ((ch = peek_input (false)) < CHAR_EOF
+            && (isalnum (ch) || ch == '_'))
        {
          obstack_1grow (&token_stack, ch);
          next_char (false);
@@ -1424,7 +1513,7 @@ next_token (token_data *td, int *line, struct obstack *obs, const char *caller)
       obstack_1grow (&token_stack, ch);
       while (1)
        {
-         ch = peek_input ();
+         ch = peek_input (false);
          if (ch >= CHAR_EOF)
            break;
          obstack_1grow (&token_stack, ch);
@@ -1547,9 +1636,19 @@ next_token (token_data *td, int *line, struct obstack *obs, const char *caller)
                  token_type_string (type));
        while (chain)
          {
-           assert (chain->type == CHAIN_STR);
-           xfprintf (stderr, "%s", chain->u.u_s.str);
-           len += chain->u.u_s.len;
+           switch (chain->type)
+             {
+             case CHAIN_STR:
+               xfprintf (stderr, "%s", chain->u.u_s.str);
+               len += chain->u.u_s.len;
+               break;
+             case CHAIN_ARGV:
+               xfprintf (stderr, "address@hidden");
+               break;
+             default:
+               assert (!"next_token");
+               abort ();
+             }
            links++;
            chain = chain->next;
          }
@@ -1569,7 +1668,7 @@ token_type
 peek_token (void)
 {
   token_type result;
-  int ch = peek_input ();
+  int ch = peek_input (false);
 
   if (ch == CHAR_EOF)
     {
@@ -1684,7 +1783,7 @@ lex_debug (void)
   token_type t;
   token_data td;
 
-  while ((t = next_token (&td, NULL, NULL, "<debug>")) != TOKEN_EOF)
+  while ((t = next_token (&td, NULL, NULL, false, "<debug>")) != TOKEN_EOF)
     print_token ("lex", t, &td);
 }
 #endif /* DEBUG_INPUT */
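A rough model of the new ALLOW_ARGV contract for next_token, written as a standalone sketch. The segment, lexer, and toy_* names are hypothetical, and real m4 works on an input stack through peek_input and CHAR_ARGV rather than on an array of segments. When allowed, a $@ reference comes back as one token; otherwise it is flattened to text (here simply comma-joined without quoting, for brevity), much as $@ inside quotes is still flattened by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical input model: each segment is literal text or a
   reference to a run of already-collected arguments (a $@ reference). */
enum seg_kind { SEG_TEXT, SEG_ARGV };

struct segment
{
  enum seg_kind kind;
  const char *text;          /* SEG_TEXT: the literal text.  */
  const char *const *args;   /* SEG_ARGV: the referenced arguments.  */
  size_t nargs;              /* SEG_ARGV: number of arguments.  */
};

enum tok { TOK_EOF, TOK_TEXT, TOK_ARGV };

struct lexer
{
  const struct segment *segs;
  size_t nsegs;
  size_t pos;                /* Next segment to consume.  */
  char buf[256];             /* Scratch space for flattened text.  */
};

/* Return the next token.  With ALLOW_ARGV, a SEG_ARGV segment is
   returned whole as TOK_ARGV and *REF points at it; without it, the
   referenced arguments are flattened into comma-separated text.  */
static enum tok
toy_next_token (struct lexer *lx, bool allow_argv,
                const struct segment **ref, const char **text)
{
  const struct segment *s;
  size_t i;

  assert (lx->pos <= lx->nsegs);
  if (lx->pos == lx->nsegs)
    return TOK_EOF;
  s = &lx->segs[lx->pos++];
  if (s->kind == SEG_TEXT)
    {
      *text = s->text;
      return TOK_TEXT;
    }
  if (allow_argv)
    {
      *ref = s;
      return TOK_ARGV;
    }
  lx->buf[0] = '\0';
  for (i = 0; i < s->nargs; i++)
    {
      if (i)
        strncat (lx->buf, ",", sizeof lx->buf - strlen (lx->buf) - 1);
      strncat (lx->buf, s->args[i], sizeof lx->buf - strlen (lx->buf) - 1);
    }
  *text = lx->buf;
  return TOK_TEXT;
}
```

The caller decides per call whether a reference may pass through intact, which mirrors how expand_argument passes `first` (true only while nothing has been collected yet) as ALLOW_ARGV.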
diff --git a/src/m4.h b/src/m4.h
index 0f11366..7df29b8 100644
--- a/src/m4.h
+++ b/src/m4.h
@@ -266,7 +266,8 @@ enum token_type
   TOKEN_COMMA, /* Active character `,', TOKEN_TEXT.  */
   TOKEN_CLOSE, /* Active character `)', TOKEN_TEXT.  */
   TOKEN_SIMPLE,        /* Any other single character, TOKEN_TEXT.  */
-  TOKEN_MACDEF /* A macro's definition (see "defn"), TOKEN_FUNC.  */
+  TOKEN_MACDEF,        /* A macro's definition (see "defn"), TOKEN_FUNC.  */
+  TOKEN_ARGV   /* A series of parameters, TOKEN_COMP.  */
 };
 
 /* The data for a token, a macro argument, and a macro definition.  */
@@ -309,6 +310,7 @@ struct token_chain
          unsigned int index;           /* Argument index within argv.  */
          bool_bitfield flatten : 1;    /* True to treat builtins as text.  */
          bool_bitfield comma : 1;      /* True when `,' is next input.  */
+         bool_bitfield skip_last : 1;  /* True if last argument omitted.  */
          const string_pair *quotes;    /* NULL for $*, quotes for address@hidden  */
        }
       u_a;
@@ -373,7 +375,8 @@ typedef enum token_data_type token_data_type;
 
 void input_init (void);
 token_type peek_token (void);
-token_type next_token (token_data *, int *, struct obstack *, const char *);
+token_type next_token (token_data *, int *, struct obstack *, bool,
+                      const char *);
 void skip_line (const char *);
 
 /* push back input */
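How many arguments a single CHAIN_ARGV link stands for, as computed in the DEBUG_INPUT trace and in collect_arguments, can be sketched as follows. The toy_argv and toy_argv_ref types are hypothetical stand-ins that keep only argc, index, and skip_last; as with macro_arguments, argc is assumed to count the macro name as well as its arguments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for macro_arguments: only argc matters here.  */
struct toy_argv
{
  unsigned int argc;            /* Macro name plus arguments.  */
};

/* Hypothetical stand-in for the u_a member of token_chain.  */
struct toy_argv_ref
{
  const struct toy_argv *argv;  /* Referenced argument collection.  */
  unsigned int index;           /* First referenced argument (1-based).  */
  bool skip_last;               /* True if the last argument is omitted.  */
};

/* Arguments INDEX through ARGC - 1, minus one if SKIP_LAST.  */
static unsigned int
ref_span (const struct toy_argv_ref *ref)
{
  unsigned int last = ref->skip_last ? 1 : 0;

  assert (ref->index >= 1 && ref->index + last <= ref->argv->argc);
  return ref->argv->argc - ref->index - last;
}

/* argc of a wrapper whose slots are either plain arguments (NULL) or
   references: the name, plus 1 per plain slot, plus each span.  */
static unsigned int
wrapper_argc (const struct toy_argv_ref *const *slots, size_t nslots)
{
  unsigned int argc = 1;        /* argv[0], the macro name.  */
  size_t i;

  for (i = 0; i < nslots; i++)
    argc += slots[i] ? ref_span (slots[i]) : 1;
  return argc;
}
```

collect_arguments appears to reach the same total by adding span - 1, since the array slot holding the reference has presumably already been counted once.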
diff --git a/src/macro.c b/src/macro.c
index d686b73..8b85cf6 100644
--- a/src/macro.c
+++ b/src/macro.c
@@ -216,7 +216,7 @@ expand_input (void)
   TOKEN_DATA_ORIG_TEXT (&empty_token) = "";
 #endif
 
-  while ((t = next_token (&td, &line, NULL, NULL)) != TOKEN_EOF)
+  while ((t = next_token (&td, &line, NULL, false, NULL)) != TOKEN_EOF)
     expand_token (NULL, t, &td, line, true);
 
   for (i = 0; i < stacks_count; i++)
@@ -364,7 +364,7 @@ expand_argument (struct obstack *obs, token_data *argp, const char *caller)
   /* Skip leading white space.  */
   do
     {
-      t = next_token (&td, NULL, obs, caller);
+      t = next_token (&td, NULL, obs, true, caller);
     }
   while (t == TOKEN_SIMPLE && isspace (to_uchar (*TOKEN_DATA_TEXT (&td))));
 
@@ -455,6 +455,20 @@ expand_argument (struct obstack *obs, token_data *argp, const char *caller)
            }
          break;
 
+       case TOKEN_ARGV:
+         assert (paren_level == 0 && TOKEN_DATA_TYPE (argp) == TOKEN_VOID
+                 && obstack_object_size (obs) == 0
+                 && td.u.u_c.chain == td.u.u_c.end
+                 && td.u.u_c.chain->type == CHAIN_ARGV);
+         TOKEN_DATA_TYPE (argp) = TOKEN_COMP;
+         argp->u.u_c.chain = argp->u.u_c.end = td.u.u_c.chain;
+         t = next_token (&td, NULL, NULL, false, caller);
+         if (argp->u.u_c.chain->u.u_a.skip_last)
+           assert (t == TOKEN_COMMA);
+         else
+           assert (t == TOKEN_COMMA || t == TOKEN_CLOSE);
+         return t == TOKEN_COMMA;
+
        default:
          assert (!"expand_argument");
          abort ();
@@ -462,7 +476,7 @@ expand_argument (struct obstack *obs, token_data *argp, const char *caller)
 
       if (TOKEN_DATA_TYPE (argp) != TOKEN_VOID || obstack_object_size (obs))
        first = false;
-      t = next_token (&td, NULL, obs, caller);
+      t = next_token (&td, NULL, obs, first, caller);
     }
 }
 
@@ -496,7 +510,8 @@ collect_arguments (symbol *sym, struct obstack *arguments,
 
   if (peek_token () == TOKEN_OPEN)
     {
-      next_token (&td, NULL, NULL, SYMBOL_NAME (sym)); /* gobble parenthesis */
+      /* gobble parenthesis */
+      next_token (&td, NULL, NULL, false, SYMBOL_NAME (sym));
       do
        {
          tdp = (token_data *) obstack_alloc (arguments, sizeof *tdp);
@@ -519,12 +534,22 @@ collect_arguments (symbol *sym, struct obstack *arguments,
              && TOKEN_DATA_QUOTE_AGE (tdp) != args.quote_age)
            args.quote_age = 0;
          else if (TOKEN_DATA_TYPE (tdp) == TOKEN_COMP)
-           args.has_ref = true;
+           {
+             args.has_ref = true;
+             if (tdp->u.u_c.chain->type == CHAIN_ARGV)
+               {
+                 args.argc += (tdp->u.u_c.chain->u.u_a.argv->argc
+                               - tdp->u.u_c.chain->u.u_a.index
+                               - tdp->u.u_c.chain->u.u_a.skip_last - 1);
+                 args.wrapper = true;
+               }
+           }
        }
       while (more_args);
     }
   argv = (macro_arguments *) obstack_finish (argv_stack);
   argv->argc = args.argc;
+  argv->wrapper = args.wrapper;
   argv->has_ref = args.has_ref;
   if (args.quote_age != quote_age ())
     argv->quote_age = 0;
@@ -734,9 +759,20 @@ arg_adjust_refcount (macro_arguments *argv, bool increase)
          chain = argv->array[i]->u.u_c.chain;
          while (chain)
            {
-             assert (chain->type == CHAIN_STR);
-             if (chain->u.u_s.level >= 0)
-               adjust_refcount (chain->u.u_s.level, increase);
+             switch (chain->type)
+               {
+               case CHAIN_STR:
+                 if (chain->u.u_s.level >= 0)
+                   adjust_refcount (chain->u.u_s.level, increase);
+                 break;
+               case CHAIN_ARGV:
+                 assert (chain->u.u_a.argv->inuse);
+                 arg_adjust_refcount (chain->u.u_a.argv, increase);
+                 break;
+               default:
+                 assert (!"arg_adjust_refcount");
+                 abort ();
+               }
              chain = chain->next;
            }
        }
@@ -766,12 +802,14 @@ arg_token (macro_arguments *argv, unsigned int index, int *level)
   for (i = 0; i < argv->arraylen; i++)
     {
       token = argv->array[i];
-      if (TOKEN_DATA_TYPE (token) == TOKEN_COMP)
+      if (TOKEN_DATA_TYPE (token) == TOKEN_COMP
+         && token->u.u_c.chain->type == CHAIN_ARGV)
        {
          token_chain *chain = token->u.u_c.chain;
          /* TODO - for now we support only a single-length $@ chain.  */
-         assert (!chain->next && chain->type == CHAIN_ARGV);
-         if (index < chain->u.u_a.argv->argc - (chain->u.u_a.index - 1))
+         assert (!chain->next);
+         if (index <= (chain->u.u_a.argv->argc - chain->u.u_a.index
+                       - chain->u.u_a.skip_last))
            {
              token = arg_token (chain->u.u_a.argv,
                                 chain->u.u_a.index - 1 + index, level);
@@ -780,7 +818,8 @@ arg_token (macro_arguments *argv, unsigned int index, int *level)
                token = &empty_token;
              break;
            }
-         index -= chain->u.u_a.argv->argc - chain->u.u_a.index;
+         index -= (chain->u.u_a.argv->argc - chain->u.u_a.index
+                   - chain->u.u_a.skip_last);
        }
       else if (--index == 0)
        break;
@@ -793,18 +832,24 @@ arg_token (macro_arguments *argv, unsigned int index, int *level)
 static void
 arg_mark (macro_arguments *argv)
 {
+  unsigned int i;
+  token_chain *chain;
+
   if (argv->inuse)
     return;
   argv->inuse = true;
   if (argv->wrapper)
-    {
-      /* TODO for now we support only a single-length $@ chain.  */
-      assert (argv->arraylen == 1
-             && TOKEN_DATA_TYPE (argv->array[0]) == TOKEN_COMP
-             && !argv->array[0]->u.u_c.chain->next
-             && argv->array[0]->u.u_c.chain->type == CHAIN_ARGV);
-      argv->array[0]->u.u_c.chain->u.u_a.argv->inuse = true;
-    }
+    for (i = 0; i < argv->arraylen; i++)
+      if (TOKEN_DATA_TYPE (argv->array[i]) == TOKEN_COMP)
+       {
+         chain = argv->array[i]->u.u_c.chain;
+         while (chain)
+           {
+             if (chain->type == CHAIN_ARGV && !chain->u.u_a.argv->inuse)
+               arg_mark (chain->u.u_a.argv);
+             chain = chain->next;
+           }
+       }
 }
 
 /* Given ARGV, return how many arguments it refers to.  */
@@ -854,14 +899,24 @@ arg_text (macro_arguments *argv, unsigned int index)
     case TOKEN_TEXT:
       return TOKEN_DATA_TEXT (token);
     case TOKEN_COMP:
-      /* TODO - concatenate multiple arguments?  For now, we assume
-        all elements are text.  */
+      /* TODO - concatenate functions.  */
       chain = token->u.u_c.chain;
       obs = arg_scratch ();
       while (chain)
        {
-         assert (chain->type == CHAIN_STR);
-         obstack_grow (obs, chain->u.u_s.str, chain->u.u_s.len);
+         switch (chain->type)
+           {
+           case CHAIN_STR:
+             obstack_grow (obs, chain->u.u_s.str, chain->u.u_s.len);
+             break;
+           case CHAIN_ARGV:
+             arg_print (obs, chain->u.u_a.argv, chain->u.u_a.index,
+                        chain->u.u_a.quotes, NULL);
+             break;
+           default:
+             assert (!"arg_text");
+             abort ();
+           }
          chain = chain->next;
        }
       obstack_1grow (obs, '\0');
@@ -1122,13 +1177,13 @@ make_argv_ref_token (token_data *token, struct obstack *obs, int level,
   token_chain *chain;
 
   assert (obstack_object_size (obs) == 0);
-  if (argv->wrapper)
+  if (argv->wrapper && argv->arraylen == 1)
     {
       /* TODO for now we support only a single-length $@ chain.  */
-      assert (argv->arraylen == 1
-             && TOKEN_DATA_TYPE (argv->array[0]) == TOKEN_COMP);
+      assert (TOKEN_DATA_TYPE (argv->array[0]) == TOKEN_COMP);
       chain = argv->array[0]->u.u_c.chain;
-      assert (!chain->next && chain->type == CHAIN_ARGV);
+      assert (!chain->next && chain->type == CHAIN_ARGV
+             && !chain->u.u_a.skip_last);
       argv = chain->u.u_a.argv;
       index += chain->u.u_a.index - 1;
     }
@@ -1145,6 +1200,7 @@ make_argv_ref_token (token_data *token, struct obstack *obs, int level,
   chain->u.u_a.index = index;
   chain->u.u_a.flatten = flatten;
   chain->u.u_a.comma = false;
+  chain->u.u_a.skip_last = false;
   if (quotes)
     {
       /* Clone the quotes into the obstack, since a subsequent
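A simplified sketch of how arg_token drills through nested $@ references, assuming (as the loop in the arg_token hunk suggests) that array slot i normally holds argument i + 1 while a reference slot stands for a whole run of arguments. The toy_slot and toy_args types are hypothetical. Each level of nesting costs one more recursive call, which is consistent with the message's note that n nested references keep the overall recursion quadratic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_args;

/* One array slot: plain text, or a $@ reference into another
   collection (hypothetical simplification of token_data).  */
struct toy_slot
{
  const char *text;              /* Non-NULL for a plain argument.  */
  const struct toy_args *argv;   /* Referenced collection, else NULL.  */
  unsigned int index;            /* First referenced argument (1-based).  */
  bool skip_last;                /* Omit the referenced last argument.  */
};

struct toy_args
{
  unsigned int argc;             /* Macro name plus arguments.  */
  const struct toy_slot *array;  /* Slots covering arguments 1..argc-1.  */
  unsigned int arraylen;         /* May be smaller than argc - 1.  */
};

/* Return argument INDEX (1-based), recursing into references.  */
static const char *
toy_arg_text (const struct toy_args *argv, unsigned int index)
{
  unsigned int i;

  assert (index >= 1 && index < argv->argc);
  for (i = 0; i < argv->arraylen; i++)
    {
      const struct toy_slot *slot = &argv->array[i];
      if (slot->argv)
        {
          /* Number of arguments this reference stands for.  */
          unsigned int span = (slot->argv->argc - slot->index
                               - (slot->skip_last ? 1 : 0));
          if (index <= span)
            return toy_arg_text (slot->argv, slot->index - 1 + index);
          index -= span;
        }
      else if (--index == 0)
        return slot->text;
    }
  assert (!"toy_arg_text: index out of range");
  return NULL;
}
```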
diff --git a/src/symtab.c b/src/symtab.c
index 277a79f..dac49d7 100644
--- a/src/symtab.c
+++ b/src/symtab.c
@@ -350,7 +350,7 @@ symtab_debug (void)
   int delete;
   static int i;
 
-  while (next_token (&td, NULL, NULL, "<debug>") == TOKEN_WORD)
+  while (next_token (&td, NULL, NULL, false, "<debug>") == TOKEN_WORD)
     {
       text = TOKEN_DATA_TEXT (&td);
       if (*text == '_')
-- 
1.5.4
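The reworked arg_mark in the macro.c hunk returns early when a collection is already in use and only descends into CHAIN_ARGV links that are not yet marked. A minimal sketch of that traversal, using a hypothetical toy_coll type in place of macro_arguments and its chains, shows that a collection shared by several wrappers is visited only once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical collection that may reference other collections, like a
   wrapper macro_arguments whose slots hold CHAIN_ARGV links.  */
struct toy_coll
{
  bool inuse;
  struct toy_coll **refs;   /* Referenced collections.  */
  size_t nrefs;
};

/* Mark COLL and everything reachable from it as in use, counting each
   newly marked collection in *VISITS.  */
static void
toy_mark (struct toy_coll *coll, unsigned int *visits)
{
  size_t i;

  assert (coll != NULL);
  if (coll->inuse)
    return;                 /* Already marked: do not revisit.  */
  coll->inuse = true;
  ++*visits;
  for (i = 0; i < coll->nrefs; i++)
    toy_mark (coll->refs[i], visits);
}
```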

