Terminal backslash in cmdline strings mishandled
From: Miguel Pineiro Jr.
Subject: Terminal backslash in cmdline strings mishandled
Date: Mon, 14 Aug 2023 07:36:09 -0400
User-agent: Cyrus-JMAP/3.9.0-alpha0-624-g7714e4406d-fm-20230801.001-g7714e440
Hello everyone. I hope this finds you well.
While working on some escape sequence fixes for onetrueawk, I noticed
that gawk does not preserve the terminal backslash of -F and -v option
arguments and var=value assignment operands.
I confirmed that the problem is present in the current tip of the master
branch: 8d73db57 (Wed Aug 9 21:58:46 2023 -0700).
#!/bin/sh
# Each of the following tests should produce "5c 0a" (a backslash
# followed by a newline in hex). Presently, master produces "0a".
awk=${1:-gawk}
# Terminal backslash
$awk -F'\' 'BEGIN { print FS }' | od -An -tx1
$awk -v s='\' 'BEGIN { print s }' | od -An -tx1
echo | $awk '{ print s }' s='\' | od -An -tx1
# Line-continuation immediately followed by a terminal backslash
$awk -F'\
\' 'BEGIN { print FS }' | od -An -tx1
$awk -v s='\
\' 'BEGIN { print s }' | od -An -tx1
echo | $awk '{ print s }' s='\
\' | od -An -tx1
exit 0
# End of test script
I have included a possible fix at the end of this report.
It appears that the handling of terminal backslashes and the handling of
line continuations were commingled. My patch inspects parse_escape's
negative return values more closely so that each case is handled
distinctly.
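To make the commingling concrete, here is a minimal standalone sketch (not gawk code) contrasting the two branch structures. It assumes, as the lint messages in the patch suggest, that parse_escape returns -1 for a backslash at the end of the string and -2 for a backslash-newline continuation. The names toy_parse_escape, old_scan and new_scan are hypothetical, and the toy parser translates only \n.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for parse_escape(). *p points just past the
 * backslash. Assumed return codes, inferred from the patch:
 *   -1  backslash at end of string (nothing consumed)
 *   -2  backslash-newline continuation (newline consumed)
 * Only \n is translated; any other escape yields the next character. */
static int toy_parse_escape(const char **p)
{
    assert(p != NULL && *p != NULL);
    int c = (unsigned char) **p;
    if (c == '\0')
        return -1;              /* terminal backslash */
    (*p)++;
    if (c == '\n')
        return -2;              /* line continuation */
    if (c == 'n')
        return '\n';
    return c;
}

/* Old structure: both negative codes share one branch, so when eliding
 * continuations (elide != 0) a terminal backslash is dropped as well. */
static size_t old_scan(const char *s, char *out, int elide)
{
    size_t n = 0;
    while (*s != '\0') {
        int c = (unsigned char) *s++;
        if (c == '\\') {
            c = toy_parse_escape(&s);
            if (c < 0) {
                if (elide)
                    continue;
                c = '\\';
            }
        }
        out[n++] = (char) c;
    }
    out[n] = '\0';
    return n;
}

/* Patched structure: each negative code gets its own branch. */
static size_t new_scan(const char *s, char *out)
{
    size_t n = 0;
    while (*s != '\0') {
        int c = (unsigned char) *s++;
        if (c == '\\') {
            c = toy_parse_escape(&s);
            if (c == -1)
                c = '\\';       /* keep the terminal backslash */
            else if (c == -2)
                continue;       /* always elide backslash-newline */
        }
        out[n++] = (char) c;
    }
    out[n] = '\0';
    return n;
}
```

With a single backslash as input, old_scan(s, buf, 1) produces an empty string while new_scan produces a lone backslash; the same holds for backslash, newline, backslash. This mirrors the "0a" versus "5c 0a" results of the test script once print appends its newline.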
It also deletes ELIDE_BACK_NL, but I am less sure about this part (I'm
not familiar with gawk's lexers). Is there ever a case where SCAN is set
to translate escape sequences but the line continuation sequence is not
elided? If so, the buggy code did not specify how that case should be
handled. If not, then elision is unconditional and ELIDE_BACK_NL is
not needed.
The patch compiles cleanly, make check succeeds, and the tests included
in this report produce correct results.
Thank you for taking the time to read this. Your efforts are appreciated.
Take care,
Miguel
diff --git a/awk.h b/awk.h
index 57d4f8bd..df1e862b 100644
--- a/awk.h
+++ b/awk.h
@@ -1376,7 +1376,6 @@ extern void r_freeblock(void *, int id);
// Flags for making string nodes
#define SCAN 1
#define ALREADY_MALLOCED 2
-#define ELIDE_BACK_NL 4
#define cant_happen(format, ...) r_fatal("internal error: file %s, line %d: " format, \
__FILE__, __LINE__, __VA_ARGS__)
diff --git a/main.c b/main.c
index cc5ae9a7..9b63bad3 100644
--- a/main.c
+++ b/main.c
@@ -776,7 +776,7 @@ cmdline_fs(char *str)
str[0] = '\t';
}
- *tmp = make_str_node(str, strlen(str), SCAN | ELIDE_BACK_NL); /* do process escapes */
+ *tmp = make_str_node(str, strlen(str), SCAN); /* do process escapes */
set_FS();
}
@@ -1276,7 +1276,7 @@ arg_assign(char *arg, bool initing)
* This makes sense, so we do it too.
* In addition, remove \-<newline> as in scanning.
*/
- it = make_str_node(cp, strlen(cp), SCAN | ELIDE_BACK_NL);
+ it = make_str_node(cp, strlen(cp), SCAN);
it->flags |= USER_INPUT;
#ifdef LC_NUMERIC
/*
diff --git a/node.c b/node.c
index 2a476847..8173e9bd 100644
--- a/node.c
+++ b/node.c
@@ -456,13 +456,18 @@ make_str_node(const char *s, size_t len, int flags)
if (c == '\\') {
bool unicode;
c = parse_escape(&pf, &unicode);
- if (c < 0) {
+ if (c == -1) {
if (do_lint)
- lintwarn(_("backslash string continuation is not portable"));
- if ((flags & ELIDE_BACK_NL) != 0)
- continue;
+ lintwarn(_("backslash at end of string"));
c = '\\';
+ } else if (c == -2) {
+ if (do_lint)
+ lintwarn(_("backslash string continuation is not portable"));
+ continue;
+ } else if (c < 0) {
+ fatal(_("parse_escape returned something unexpected"));
}
+
if (unicode) {
char buf[20];
size_t n;