bug#51462: sed bug: ASCII NUL not handled in simple pattern
From: Assaf Gordon
Subject: bug#51462: sed bug: ASCII NUL not handled in simple pattern
Date: Sat, 30 Oct 2021 01:11:35 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.14.0
(Adding Eric Blake for POSIX opinion)
Hello,
On 2021-10-28 11:32 a.m., Davide Brini wrote:
On Thu, 28 Oct 2021 15:25:42 +0000, Frances Wingerter <fw@immunant.com>
wrote:
Compare the output of these two sed invocations:
```
$ echo -e 'a\nb\n\0\nc\n' | sed -e '/\0/,$d'
$ echo -ne 'a\nb\n\0\nc\n' | sed -e '/\d000/,$d'
```
(\o000, \x00 also work). All documented here:
https://www.gnu.org/software/sed/manual/sed.html#Escapes
Whether sed maintainers want to also allow the \0 syntax, up to them of
course.
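For reference, a minimal sketch of the three escapes mentioned above, assuming GNU sed (the `\d`, `\o` and `\x` escapes are GNU extensions described in the linked manual section, so other sed implementations may reject them). `printf` is used here instead of `echo -e` only to keep the input generation portable:

```shell
# Assumes GNU sed: \d000 (decimal), \o000 (octal) and \x00 (hex) all denote NUL.
for esc in '\d000' '\o000' '\x00'; do
    # Delete from the first line containing a NUL byte through the last line.
    printf 'a\nb\n\0\nc\n' | sed -e "/${esc}/,\$d"
done
```

Each iteration should print only the lines `a` and `b`.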
Thanks Davide for the reply.
In GNU sed, "\0" in the replacement part acts identically to "&" -
referencing the whole matched portion.
This is the implemented behavior (though undocumented?) since GNU sed
version 3, released in December 1995 - so not likely to be changed.
For comparison, in BSDs "\0" acts as literal zero (ASCII 48).
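A small sketch of that behavior, assuming GNU sed for the outputs noted in the comments (the BSD result is as described above and is not verified here):

```shell
# GNU sed: "\0" in the replacement is treated like "&" (the whole match).
printf 'hello\n' | sed 's/l*o/[\0]/'   # GNU sed prints: he[llo]
# "&" is the standard spelling for the whole matched portion.
printf 'hello\n' | sed 's/l*o/[&]/'    # prints: he[llo]
```

Scripts meant to run under both GNU and BSD sed can simply stick to `&`.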
Interestingly, POSIX defines a "BACKREF" as:
[...] The character string consisting of a <backslash> character
followed by a single-digit numeral, '1' to '9'.
(from:
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_05)
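As an illustration of a BACKREF within that '1' to '9' range, here is a short sketch using a basic regular expression; the sample input is made up for the example:

```shell
# BRE back-reference: \1 must match the same text as the first \( \) group,
# so this prints only lines containing a doubled character.
printf 'abc\naab\n' | sed -n '/\(.\)\1/p'   # prints: aab
```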
And so one could argue that this is a GNU extension that should be
disabled when used with "sed --posix".
I think we should keep "\0" undocumented to prevent proliferation of
this non-standard behavior.
regards,
- assaf