bug#40242: n as delimiter alias
From: Assaf Gordon
Subject: bug#40242: n as delimiter alias
Date: Mon, 30 Mar 2020 22:42:09 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.6.0
tags 40242 confirmed
stop
Hello,
On 2020-03-25 11:30 p.m., Oğuz wrote:
> While '\t' matches a literal 't' when 't' is the delimiter, '\n' does not
> match 'n' when 'n' is the delimiter. See:
> $ echo t | sed 'st\ttt' | xxd
> 00000000: 0a .
> $
> $ echo n | sed 'sn\nnn' | xxd
> 00000000: 6e0a
> Is this a bug or is there a sound logic behind this?
Thank you for finding this interesting edge-case.
I think it is a (very old) bug. I'm not sure about its origin,
perhaps Jim or Paolo can comment.
First,
let's start with what's expected (slightly modifying your examples):
In the canonical usage, "\t" becomes a TAB, so the literal "t" is not replaced:
$ printf t | sed 's/\t//' | od -a -An
t
Then, using a different character "q" instead of "/", works the same:
$ printf t | sed 'sq\tqq' | od -a -An
t
The sed manual says (in section "3.3 The s command"):
"
The / characters may be uniformly replaced by any other single
character within any given s command.
The / character (or whatever other character is used in its
stead) can appear in the regexp or replacement only if it is
preceded by a \ character.
"
This is the reason "\t" represents a regular "t" (not TAB)
*if* the substitute command's delimiter is "t" as well:
$ printf t | sed 'st\ttt' | od -a -An
[no output, as expected]
And similarly for other characters:
$ printf x | sed 'sx\xxx' | od -a -An
$ printf a | sed 'sa\aaa' | od -a -An
$ printf z | sed 'sz\zzz' | od -a -An
[no output, as expected]
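The same rule can be seen with a less confusing delimiter. This is only a small sketch: the function names are made up for illustration, and it assumes a sed that follows the manual text quoted above. With "," as the delimiter, "\," stands for a literal comma in both the regexp and the replacement:

```shell
# Hypothetical helper names, chosen for this illustration only.

# Regexp side: "\," is a literal comma because "," is the delimiter.
comma_to_semicolon() {
  sed 's,\,,;,'
}

# Replacement side: "\," inserts a literal comma.
dash_to_comma() {
  sed 's,-,\,,'
}

printf 'a,b\n' | comma_to_semicolon   # prints a;b
printf 'a-b\n' | dash_to_comma        # prints a,b
```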
---
Second,
The "\n" case behaves differently: regardless of which
separator is used, it is always treated as a newline,
never as a literal "n", even if the separator is "n".
These are correct, as expected:
$ printf n | sed 's/\n//' | od -a -An
n
$ printf n | sed 'sx\nxx' | od -a -An
n
Here, we'd expect "\n" to be treated as a literal "n" character,
not a newline, but it is not (as you've found):
$ printf n | sed 'sn\nnn' | od -a -An
n
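For context, the usual job of "\n" in the regexp is to match a newline embedded in the pattern space, for example after the N command has appended the next input line. A minimal sketch (the function names are invented for illustration), showing that this works the same with "/" and with a delimiter other than "n":

```shell
# Hypothetical helper names, chosen for this illustration only.

# Join two input lines: N appends the next line to the pattern space,
# then "\n" in the regexp matches the embedded newline.
join_pair_slash() {
  sed -e 'N' -e 's/\n/+/'
}

# Same substitution with "x" as the delimiter.
join_pair_x() {
  sed -e 'N' -e 'sx\nx+x'
}

printf 'a\nb\n' | join_pair_slash   # prints a+b
printf 'a\nb\n' | join_pair_x       # prints a+b
```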
----
In the code, the "match_slash" function [1] is used to find
the delimiters of the "s" command (typically "slashes").
Special handling happens if a backslash is found [2],
and in lines 557-8 there's this conditional:
    else if (ch == 'n' && regex)
      ch = '\n';
This forces any "\n" to be a newline, regardless of whether the
delimiter itself is an "n".
[1] https://git.savannah.gnu.org/cgit/sed.git/tree/sed/compile.c#n531
[2] https://git.savannah.gnu.org/cgit/sed.git/tree/sed/compile.c#n552
In older sed versions, these two lines were protected by
"#ifndef REG_PERL" [3] so perhaps it had something to do with regex
variants. But the origin of this line predates the git history.
Jim/Paolo - any ideas what this relates to?
[3] https://git.savannah.gnu.org/cgit/sed.git/tree/sed/compile.c?id=41a169a9a14b5bdc736313eb411f02bcbe1c046d#n551
---
Interestingly, removing these two lines does not cause
any test failures, so this might be easy to fix without causing
any regressions.
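Since the existing tests apparently do not exercise this case, a small probe along these lines could tell which behavior a given sed build has. This is only a sketch and not part of the sed test suite; the function name and the three result words are made up here:

```shell
# Hypothetical probe, not taken from the sed sources or test suite.
# Feeds a single "n" to sed 'sn\nnn' and reports how "\n" was treated:
#   literal  -> "\n" matched the letter n (output is empty)
#   newline  -> "\n" was taken as a newline (the n survives)
#   error    -> this sed rejected the script
probe_n_delimiter() {
  if out=$(printf 'n' | sed 'sn\nnn' 2>/dev/null); then
    if [ -z "$out" ]; then
      echo literal
    else
      echo newline
    fi
  else
    echo error
  fi
}

probe_n_delimiter
```

With the conditional quoted above in place, the expectation is "newline"; with those two lines removed, it should report "literal".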
For now I'm leaving this item open until we decide how to deal with it.
regards,
- assaf