bug#40242: n as delimiter alias
From: Oğuz
Subject: bug#40242: n as delimiter alias
Date: Tue, 31 Mar 2020 10:00:02 +0300
Thanks for the reply. This might not be a bug, though. I sent a similar mail
(https://www.mail-archive.com/address@hidden/msg05881.html)
to the Austin Group mailing list asking what the expected behavior is in this
case, and I was told (
https://www.mail-archive.com/address@hidden/msg05891.html)
that both behaviors (yielding n or an empty line) are correct and that the
standard should *probably* be amended to state explicitly that this is
unspecified. Apparently (
https://www.mail-archive.com/address@hidden/msg05893.html)
some other UNIXes adopted the same practice as GNU sed (or vice versa; I
don't know which one is older).
Regards
On Tuesday, 31 March 2020, Assaf Gordon <address@hidden> wrote:
> tags 40242 confirmed
> stop
>
> Hello,
>
> On 2020-03-25 11:30 p.m., Oğuz wrote:
>
>> While '\t' matches a literal 't' when 't' is the delimiter, '\n' does not
>> match 'n' when 'n' is the delimiter. See:
>>
>> $ echo t | sed 'st\ttt' | xxd
>> 00000000: 0a .
>> $
>> $ echo n | sed 'sn\nnn' | xxd
>> 00000000: 6e0a
>>
>> Is this a bug or is there a sound logic behind this?
>>
>
> Thank you for finding this interesting edge-case.
>
> I think it is a (very old) bug. I'm not sure about its origin,
> perhaps Jim or Paolo can comment.
>
> First,
> let's start with what's expected (slightly modifying your examples):
>
> The canonical usage, here "\t" becomes a TAB, and "t" is not replaced:
>
> $ printf t | sed 's/\t//' | od -a -An
> t
>
> Then, using a different character "q" instead of "/", works the same:
>
> $ printf t | sed 'sq\tqq' | od -a -An
> t
>
> The sed manual says (in section "3.3 The s command"):
> "
> The / characters may be uniformly replaced by any other single
> character within any given s command.
>
> The / character (or whatever other character is used in its
> stead) can appear in the regexp or replacement only if it is
> preceded by a \ character.
> "
>
> This is the reason "\t" represents a regular "t" (not TAB)
> *if* the substitute command's delimiter is "t" as well:
>
> $ printf t | sed 'st\ttt' | od -a -An
> [no output, as expected]
>
> And similarly for other characters:
>
> printf x | sed 'sx\xxx' | od -a -An
> printf a | sed 'sa\aaa' | od -a -An
> printf z | sed 'sz\zzz' | od -a -An
> [no output, as expected]
>
> ---
>
> Second,
> The "\n" case behaves differently, regardless of which
> separator is used. It is always treated as "\n" (new line),
> never literal "n", even if the separator is "n":
>
> These are correct, as expected:
> $ printf n | sed 's/\n//' | od -a -An
> n
> $ printf n | sed 's/\n//' | od -a -An
> n
> $ printf n | sed 'sx\nxx' | od -a -An
> n
>
> Here, we'd expect "\n" to be treated as a literal "n" character,
> not "\n", but it is not (as you've found):
>
> $ printf n | sed 'sn\nnn' | od -a -An
> n
>
> ----
>
> In the code, the "match_slash" function [1] is used to find
> the delimiters of the "s" command (typically "slashes").
> Special handling happens if a backslash is found [2],
> and in lines 557-558 there's this conditional:
>
>     else if (ch == 'n' && regex)
>       ch = '\n';
>
> This forces any "\n" to be a newline, regardless of whether the
> delimiter itself is an "n".
>
> [1] https://git.savannah.gnu.org/cgit/sed.git/tree/sed/compile.c#n531
> [2] https://git.savannah.gnu.org/cgit/sed.git/tree/sed/compile.c#n552
>
> In older sed versions, these two lines were protected by
> "#ifndef REG_PERL" [3] so perhaps it had something to do with regex
> variants. But the origin of this line predates the git history.
> Jim/Paolo - any ideas what this relates to?
>
> [3] 
https://git.savannah.gnu.org/cgit/sed.git/tree/sed/compile.c?id=41a169a9a14b5bdc736313eb411f02bcbe1c046d#n551
>
> ---
>
> Interestingly, removing these two lines does not cause
> any test failures, so this might be easy to fix without causing
> any regressions.
>
>
> For now I'm leaving this item open until we decide how to deal with it.
>
> regards,
> - assaf
>
>
>
>
>
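To make the two behaviors discussed above concrete, here is a minimal C sketch
of a delimiter scan. It is not the real match_slash() from sed/compile.c: the
function name scan_to_delim, its signature, and the force_newline flag are
hypothetical and exist only for illustration. Only the "n becomes newline"
branch comes from the conditional quoted above; the rest (dropping the
backslash before an escaped delimiter, keeping it before other characters) is
an assumption based on the behavior described in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of the delimiter scan discussed above.
 * NOT the real match_slash() from sed/compile.c.
 *
 * Copies the text up to the first unescaped `delim` from `in` into `out`
 * (NUL-terminated). Returns the number of input characters consumed,
 * including the closing delimiter, or -1 if there is no closing delimiter
 * or `out` is too small.
 *
 * force_newline == true  models the quoted conditional: "\n" always
 *                        becomes a newline, even when delim is 'n'.
 * force_newline == false models the behavior without those two lines:
 *                        an escaped delimiter is always a literal. */
long scan_to_delim(const char *in, char delim, bool force_newline,
                   char *out, size_t outsz)
{
    size_t o = 0;

    for (size_t i = 0; in[i] != '\0'; i++) {
        char ch = in[i];

        if (ch == delim) {              /* unescaped delimiter: done */
            if (o >= outsz)
                return -1;
            out[o] = '\0';
            return (long)(i + 1);
        }

        if (ch == '\\') {
            ch = in[++i];
            if (ch == '\0')             /* dangling backslash */
                return -1;
            if (ch == 'n' && force_newline) {
                ch = '\n';              /* the conditional quoted above */
            } else if (ch != delim) {
                /* Assumption: other escapes keep their backslash so a
                 * later regex stage can interpret them (e.g. "\t"). */
                if (o + 1 >= outsz)
                    return -1;
                out[o++] = '\\';
            }
            /* ch == delim: backslash dropped, literal delimiter kept */
        }

        if (o + 1 >= outsz)
            return -1;
        out[o++] = ch;
    }
    return -1;                          /* no closing delimiter */
}
```

Under these assumptions, scanning the text after "sn" in 'sn\nnn' with
force_newline set yields a one-character pattern holding a newline, which
matches the current GNU sed output shown above. With the flag cleared it
yields a literal "n", the same way 't' behaves in 'st\ttt'.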
--
Oğuz