bug#23635: possible bug in \c escape handling
From: Jim Meyering
Subject: bug#23635: possible bug in \c escape handling
Date: Sat, 28 May 2016 15:06:25 -0700
On Fri, May 27, 2016 at 6:08 PM, Assaf Gordon <address@hidden> wrote:
> Hello,
>
> There might be a small bug in processing of GNU extension escape sequence
> "\c".
>
> When the character following "\c" is a backslash, the code consumes only one
> character, leading to inconsistent and incorrect output.
> Example:
>
> $ echo a | sed 's/./\c\\/' | od -c
> 0000000 034 \ \n
> 0000003
> $ echo a | sed 's/./\c\d/' | od -c
> 0000000 034 d \n
> 0000003
>
> but:
>
> $ echo a | sed 's/./\c\/' | od -c
> sed: -e expression #1, char 8: unterminated `s' command
> 0000000
>
> Meaning there is no way to generate the character '\034' alone with "\c".
>
> This is also somewhat inconsistent because it consumes a single backslash
> character (whereas everywhere else a single backslash is the escape
> character itself).
>
> For comparison, other characters behave as expected:
>
> $ sed 's/./\cA/' in | od -c
> 0000000 001 \n
> 0000002
> $ sed 's/./\c[/' in | od -c
> 0000000 033 \n
> 0000002
> $ sed 's/./\c]/' in | od -c
> 0000000 035 \n
> 0000002
>
> As a side effect, it could also be confusing if the syntax allows
> 'recursive' escapes, such as "\c\x41", which might be argued to be '\c' of
> the following character, which should first be evaluated as \x41 ('A'),
> resulting in "\cA".
>
> The attached patch fixes the problem with the following rules:
> 1. '\c\\' = Control-Backslash = ASCII 0x1C (octal 034).
> 2. Any other backslash combinations after "\c" are rejected, and sed aborts.
>
> Tests included. Comments are welcome.
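
For reference, the byte values in the od output above are consistent with the way the GNU sed manual describes \cX: a lower-case letter is first converted to upper case, then bit 6 (hex 40) of the character is inverted. A minimal sketch of that arithmetic in POSIX shell follows; ctrl_octal is a hypothetical helper written for illustration, not part of sed or of the patch.

```shell
# ctrl_octal X: print the octal code that \cX would be expected to produce,
# assuming the documented rule (upper-case X, then flip bit 0x40).
# Hypothetical helper for illustration only.
ctrl_octal() {
  # Upper-case the character if it is a lower-case letter.
  c=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  # A leading single quote makes printf yield the character's numeric code.
  n=$(printf '%d' "'$c")
  # Invert bit 6 (decimal 64) and print as three octal digits.
  printf '%03o\n' $(( n ^ 64 ))
}

ctrl_octal A      # 001
ctrl_octal '['    # 033
ctrl_octal '\'    # 034
ctrl_octal ']'    # 035
```

The third call is the case at issue: Control-Backslash is octal 034, the byte that could not be produced on its own before the fix.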
Nice catch. I like the patch.
So far, I can make only two suggestions:
- add a NEWS entry, since this is a bug fix
- I have a slight preference for the one-liner printf '%s\n' a a a a a a a
rather than your 7-line here-document to generate that same output in the
test case.
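
To illustrate the second suggestion, here is a small sketch comparing the two forms. It assumes the here-document simply emits seven lines, each containing a single "a"; the function names heredoc_input and printf_input are made up for this example and do not appear in the patch.

```shell
# Here-document form: seven input lines spelled out one per line.
heredoc_input() {
cat <<'EOF'
a
a
a
a
a
a
a
EOF
}

# One-liner form: printf reuses the '%s\n' format for each argument.
printf_input() {
  printf '%s\n' a a a a a a a
}

# Report whether both forms generate the same text.
if [ "$(heredoc_input)" = "$(printf_input)" ]; then
  echo "same output"
fi
```

printf repeats its format string until all arguments are consumed, which is why a single '%s\n' is enough for seven lines.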
And a comment wording nit:
+# Before sed-4.3, this resulted in '\034d' .
+# now it should be rejected.
I prefer to say e.g.,
# Before sed-4.3, this resulted in '\034d'. Now, it is rejected.
Thank you!