bug-gawk

Terminal backslash in cmdline strings mishandled


From: Miguel Pineiro Jr.
Subject: Terminal backslash in cmdline strings mishandled
Date: Mon, 14 Aug 2023 07:36:09 -0400
User-agent: Cyrus-JMAP/3.9.0-alpha0-624-g7714e4406d-fm-20230801.001-g7714e440

Hello everyone. I hope this finds you well.

While working on some escape sequence fixes for onetrueawk, I noticed
that gawk does not preserve the terminal backslash of -F and -v option
arguments and var=value assignment operands.

I confirmed that the problem is present in the current tip of the master
branch: 8d73db57 (Wed Aug  9 21:58:46 2023 -0700).


#!/bin/sh

# Each of the following tests should produce "5c 0a" (a backslash
# followed by a newline in hex). Presently, master produces "0a".

awk=${1:-gawk}

# Terminal backslash
$awk -F'\' 'BEGIN { print FS }' | od -An -tx1
$awk -v s='\' 'BEGIN { print s }' | od -An -tx1
echo | $awk '{ print s }' s='\' | od -An -tx1


# Line-continuation immediately followed by a terminal backslash
$awk -F'\
\' 'BEGIN { print FS }' | od -An -tx1
$awk -v s='\
\' 'BEGIN { print s }' | od -An -tx1
echo | $awk '{ print s }' s='\
\' | od -An -tx1

exit 0
# End of test script


I have included a possible fix at the end of this report.

It appears that the handling of terminal backslashes and of line
continuations was commingled. My patch inspects parse_escape's negative
return values more closely (-1 for a backslash at the end of the string,
-2 for a backslash-newline) so that each case is handled distinctly.

It also deletes ELIDE_BACK_NL, but of this I'm less sure (I'm not
familiar with gawk's lexers). Is there an occasion when SCAN is set
to translate escape sequences but the line-continuation sequence is
not elided? If so, the buggy code did not specify how that should be
handled. If not, then elision is unconditional and ELIDE_BACK_NL is
not needed.

The patch compiles cleanly, make check succeeds, and the tests included
in this report produce correct results.

Thank you for taking the time to read this. Your efforts are appreciated.

Take care,
Miguel


diff --git a/awk.h b/awk.h
index 57d4f8bd..df1e862b 100644
--- a/awk.h
+++ b/awk.h
@@ -1376,7 +1376,6 @@ extern void r_freeblock(void *, int id);
 // Flags for making string nodes
 #define                SCAN                    1
 #define                ALREADY_MALLOCED        2
-#define                ELIDE_BACK_NL           4
 
 #define        cant_happen(format, ...)        r_fatal("internal error: file %s, line %d: " format, \
                                __FILE__, __LINE__, __VA_ARGS__)
diff --git a/main.c b/main.c
index cc5ae9a7..9b63bad3 100644
--- a/main.c
+++ b/main.c
@@ -776,7 +776,7 @@ cmdline_fs(char *str)
                        str[0] = '\t';
        }
 
-       *tmp = make_str_node(str, strlen(str), SCAN | ELIDE_BACK_NL); /* do process escapes */
+       *tmp = make_str_node(str, strlen(str), SCAN); /* do process escapes */
        set_FS();
 }
 
@@ -1276,7 +1276,7 @@ arg_assign(char *arg, bool initing)
                 * This makes sense, so we do it too.
                 * In addition, remove \-<newline> as in scanning.
                 */
-               it = make_str_node(cp, strlen(cp), SCAN | ELIDE_BACK_NL);
+               it = make_str_node(cp, strlen(cp), SCAN);
                it->flags |= USER_INPUT;
 #ifdef LC_NUMERIC
                /*
diff --git a/node.c b/node.c
index 2a476847..8173e9bd 100644
--- a/node.c
+++ b/node.c
@@ -456,13 +456,18 @@ make_str_node(const char *s, size_t len, int flags)
                        if (c == '\\') {
                                bool unicode;
                                c = parse_escape(&pf, &unicode);
-                               if (c < 0) {
+                               if (c == -1) {
                                        if (do_lint)
-                                               lintwarn(_("backslash string continuation is not portable"));
-                                       if ((flags & ELIDE_BACK_NL) != 0)
-                                               continue;
+                                               lintwarn(_("backslash at end of string"));
                                        c = '\\';
+                               } else if (c == -2) {
+                                       if (do_lint)
+                                               lintwarn(_("backslash string continuation is not portable"));
+                                       continue;
+                               } else if (c < 0) {
+                                       fatal(_("parse_escape returned something unexpected"));
                                }
+
                                if (unicode) {
                                        char buf[20];
                                        size_t n;


