--- Begin Message ---
Subject: bug: empty regex exits with error when following 2-address like LINENO,/RE/
Date: Thu, 15 Mar 2018 21:20:37 +0100
The manual states that
"the empty regular expression ‘//’ repeats the last regular expression match"
however this does not work when the empty regex follows a 2-address of the form LINE_NUMBER,/REGEX/
e.g.
# printf %s\\n {1..5} | sed '2,/5/{//!d}'
sed: -e expression #1, char 0: no previous regular expression
instead of printing
1
5
If it matters, a 2-address like /REGEX/,LINE_NUMBER works as expected e.g.:
# printf %s\\n {1..5} | sed '/2/,5{//!d}'
correctly prints
1
2
This is with gnu sed 4.4 on archlinux, vanilla.
--- End Message ---
--- Begin Message ---
Subject: Re: bug#30829: bug: empty regex exits with error when following 2-address like LINENO, /RE/
Date: Thu, 15 Mar 2018 16:34:07 -0600
User-agent: Mutt/1.5.24 (2015-08-30)
Hello,
On Thu, Mar 15, 2018 at 09:20:37PM +0100, Don Crissti wrote:
> "the empty regular expression ‘//’ repeats the last regular expression
> match"
>
> however this does not work when the empty regex follows a 2-address of
> the form LINE_NUMBER,/REGEX/
> e.g.
>
> # printf %s\\n {1..5} | sed '2,/5/{//!d}'
>
> fails with
>
> "sed: -e expression #1, char 0: no previous regular expression"
Thanks for reporting this bug and providing an easy way to reproduce.
Before deciding whether it's a bug, it's worth comparing with other sed
implementations. (I'm using a slightly different sed program because
putting multiple commands on the same line is a GNU extension.)
FreeBSD/OpenBSD/NetBSD:
$ printf "%s\n" 1 2 3 4 5 | sed -n -e '2,/5/p' -e '//p'
sed: first RE may not be empty
BusyBox and ToyBox (output seems incorrect):
$ printf "%s\n" 1 2 3 4 5 | sed -n -e '2,/5/p' -e '//p'
1
2
2
3
3
4
4
5
5
Heirloom (http://heirloom.sf.net/):
$ seq 5 | sed-heirloom -n -e '2,/5/p' -e '//p'
2
3
4
5
5
And surprisingly, GNU sed version 3.02:
$ seq 5 | sed-gnu-3.02 -n -e '2,/5/p' -e '//p'
2
3
4
5
5
GNU sed 4.0 and later:
$ seq 5 | sed -n -e '2,/5/p' -e '//p'
sed: -e expression #2, char 0: no previous regular expression
=====
Now to why it happens:
POSIX says (http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sed.html):
"If an RE is empty (that is, no pattern is specified) sed shall behave as if
the last RE used in the last command applied (either as an address or as part
of a substitute command) was specified."
And the interpretation (of both GNU sed >= 4.0 and the *BSD seds) is
that the "last RE used in the last command *applied*" means the last RE
*executed* at runtime, not the last regex that precedes the empty regex
in the program text.
And so in this command:
sed -n -e '2,/5/p' -e '//p'
On the first line, the address 2 is checked (it obviously doesn't match
on line 1).
The regex '/5/' is *not* executed (because 2 didn't match).
Then sed tries '//p', but no RE has been executed yet, hence the error.
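If the goal is simply the output the reporter expected, one way around
the problem is to spell the regex out instead of relying on '//'.
This is only a sketch, not the one true fix: the variable name 'out' is
mine, and the commands are split across several -e options so that it
does not depend on the GNU one-line '{...}' syntax.

```shell
# Repeat /5/ explicitly, so nothing depends on which RE was executed last.
# Inside the range 2,/5/ every line that does not match /5/ is deleted.
out=$(printf '%s\n' 1 2 3 4 5 | sed -e '2,/5/{' -e '/5/!d' -e '}')
printf '%s\n' "$out"
# prints:
# 1
# 5
```

Line 1 lies outside the range and is printed unchanged; lines 2-4 are
deleted; line 5 ends the range and matches /5/, so it is kept.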
The reason for this is that the regex an empty regex refers to
can change at runtime, based on the input.
Consider the following (contrived) example:
$ printf "%s\n" a ab ab ab \
| sed '1s/a/X/
tq
1s/b/Y/
:q
s//*/'
X
*b
*b
*b
$ printf "%s\n" b ab ab ab \
| sed '1s/a/X/
tq
1s/b/Y/
:q
s//*/'
Y
a*
a*
a*
The flow is:
1. If line 1 contains 'a' - replace 'a' with 'X' and skip the next check
('tq' means "jump to label :q if the last substitution matched").
2. Otherwise, if line 1 contains 'b' - replace 'b' with 'Y'.
3. For every line, replace the first match of the last regex with '*'.
And so you see that the last regex changes dynamically during
runtime, based on whether the first line contained 'a' or 'b'.
In the first case, the three 'a's are replaced with '*'.
In the second case, the three 'b's are replaced with '*'.
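By contrast, the empty regex is predictable when it sits in the same
command as the address that supplies it, because that address has
always just been executed by the time 's' runs. A minimal sketch (the
input lines and the variable name 'out' are made up for illustration):

```shell
# /b/ is executed as the address on every line; on lines where it matches,
# the empty regex in s//*/ reuses it, so the first 'b' becomes '*'.
out=$(printf '%s\n' a ab ab | sed -n '/b/s//*/p')
printf '%s\n' "$out"
# prints:
# a*
# a*
```

Here the last RE executed before 's//*/' is always /b/, regardless of
the input.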
I therefore think this is not a bug (and I'm marking it as 'done').
However discussion can continue by replying to this thread,
and if there are different opinions we can always re-open it.
regards,
- assaf
--- End Message ---