From: Assaf Gordon
Subject: bug#49873: Replacing all \n with spaces doesn't work in GNU sed as expected
Date: Wed, 4 Aug 2021 14:07:22 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.10.0
tag 49873 notabug
close 49873
stop
Hello,
On 2021-08-04 4:27 a.m., AlvinSeville7cf wrote:
> Hello! I want to read the entire file and then replace all *\n* with spaces.
For that, I would recommend using 'tr'; it will be much faster:
tr '\n' ' ' < input > output
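A quick way to see exactly what this does (a sketch, assuming a POSIX shell with 'printf' and 'wc'): 'tr' converts every newline, including the final one, so the output ends with a space and contains no newline at all.

```shell
# Every newline, including the last one, becomes a space.
printf 'one\ntwo\nthree\n' | tr '\n' ' '
# prints "one two three " (trailing space, no final newline)

# wc -l counts newline characters; here it reports 0.
printf 'one\ntwo\nthree\n' | tr '\n' ' ' | wc -l
```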
> My sed script is (I know that it is not optimal, but it demonstrates the
> problem):
>
>     :a $! { N; ta } s/\n/ /g p
The above script isn't valid as-is (perhaps line breaks were lost in the
email?).
I'm going to assume you meant the following script, and used "sed -n":
sed -n ':a $! { N; ta } ; s/\n/ /g ; p' < input > output
or with line breaks:
sed -n ':a
$! { N; ta }
s/\n/ /g
p' < input > output
> So why, even with the *g* flag, does the *s* command replace only the first
> *\n* in the pattern space? For instance, I have the following file:
Your script is almost correct :)
I assume that with the "$!{N;ta}" command you meant to keep appending
the next line (on every line except the last) until the whole file is
in the pattern space, and then replace all the newlines and print the
pattern space.
The only 'bug': "t" is a *conditional* jump. It branches to label "a"
only if an "s///" command has succeeded since the last input line was
read. When "ta" runs, no substitution has happened yet, so it never
jumps: each cycle appends exactly one more line with "N", then "s///"
replaces that single newline and "p" prints the two joined lines. The
next cycle starts again with the following line (the 3rd line of the
input), and so on.
Observe:
$ seq 10 | sed -n ':a $! { N; ta } ; s/\n/ /g ; p'
1 2
3 4
5 6
7 8
9 10
If you replace the "t" with a "b" command (b = always jump),
it behaves as you expected:
$ seq 10 | sed -n ':a $! { N; ba } ; s/\n/ /g ; p'
1 2 3 4 5 6 7 8 9 10
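Note that even with this script, the output still ends with a newline:
sed removes the newline from each line it reads and adds one back when
printing the pattern space, so only the newlines *between* lines are
replaced.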
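To illustrate the conditional behavior of "t" (a sketch, not taken from the original reply): if the "s///" command runs *before* "ta", the substitution succeeds on every iteration, so "t" keeps jumping back to "a" until the last line has been appended. Since only one newline is replaced per iteration, the "g" flag is not needed. It uses the same one-line script syntax as the examples above, which GNU sed accepts.

```shell
# s/// now runs before "ta", so "t" sees a successful substitution
# and jumps back to label "a" to append the next line with "N".
seq 10 | sed -n ':a $! { N; s/\n/ /; ta } ; p'
# prints: 1 2 3 4 5 6 7 8 9 10

# As with the "b" version, sed adds a newline after the pattern space,
# so the output is exactly one line: wc -l reports 1.
seq 10 | sed -n ':a $! { N; s/\n/ /; ta } ; p' | wc -l
```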
As a work-around, you can instruct "sed" to use NUL as the line separator
("-z"), so that "\n" characters are treated like any other character:
$ seq 10 | sed -z 's/\n/ /g'
1 2 3 4 5 6 7 8 9 10
But this won't be as efficient as using 'tr'.
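To confirm that with "-z" even the final newline is replaced, so the result matches the 'tr' output byte for byte, here is a small sketch assuming GNU sed (the -z/--null-data option is a GNU sed option) plus 'seq', 'mktemp' and 'cmp'. The variable names are only illustrative.

```shell
# With -z, NUL is the line separator. The input contains no NUL bytes,
# so it is read as one single "line" and every newline in it,
# including the final one, is replaced by a space.
sed_out=$(mktemp)
tr_out=$(mktemp)
seq 10 | sed -z 's/\n/ /g' > "$sed_out"
seq 10 | tr '\n' ' ' > "$tr_out"

# cmp -s exits with status 0 when both files are byte-for-byte identical.
if cmp -s "$sed_out" "$tr_out"; then
  echo "sed -z and tr produce identical output"
fi
rm -f "$sed_out" "$tr_out"
```

Because the input lacked a trailing NUL, GNU sed does not add one on output, which is why the two results match exactly.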
> It was the best of times, it was the worst of times, it was the age of
> wisdom, it was the age of foolishness,
>
> The result of the script's execution is:
>
> It was the best of times, it was the worst of times, it was the age of
> wisdom, it was
>
> I use GNU sed 4.8. It seems to be a bug.
Without line breaks it's a bit hard to reproduce your case,
but I hope the explanation above was sufficient.
As such I'm closing this as "not a bug",
but discussion can continue by replying to this thread.
regards,
- assaf