bug#68725: GNU grep and sed behaving unexpectedly with multiple 1-or-0 RE capture groups and backreferences
From: Ed Morton
Subject: bug#68725: GNU grep and sed behaving unexpectedly with multiple 1-or-0 RE capture groups and backreferences
Date: Thu, 25 Jan 2024 10:46:34 -0600
User-agent: Mozilla Thunderbird
There are issues (mostly common to both tools, but some not) when using a regexp like this:

    ^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1$

with GNU grep and GNU sed, hence my contacting both mailing lists; apologies if that was the wrong starting point.
This started out as a question on StackOverflow
(https://stackoverflow.com/questions/77820540/searching-palindromes-with-grep-e-egrep/77861446?noredirect=1#comment137299746_77861446),
but I've copied my "answer" and some comments from there below so you don't have to look anywhere else for a description of the issues.
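For reference, the regexp is meant (per the StackOverflow question) to find palindromes. Because each `(.?)` captures zero or one character, the anchored form should match exactly the palindromes of at most 11 characters: five groups, an optional middle character, then the five backreferences in reverse order. Below is a minimal Python sketch, not taken from grep or sed, that checks this claim over a small alphabet by generating every string the regexp can match; the function names `strings_matched` and `short_palindromes` are hypothetical.

```python
from itertools import product


def strings_matched(alphabet):
    """Every string ^(.?)(.?)(.?)(.?)(.?).?\\5\\4\\3\\2\\1$ can match over `alphabet`."""
    # Each (.?) and the bare .? match either nothing or one character.
    choices = [""] + list(alphabet)
    matched = set()
    for *groups, middle in product(choices, repeat=6):
        # Groups 1..5, then the middle .?, then \5\4\3\2\1 (groups in reverse order).
        matched.add("".join(groups) + middle + "".join(reversed(groups)))
    return matched


def short_palindromes(alphabet, max_len=11):
    """Every palindrome over `alphabet` with at most `max_len` characters."""
    pals = set()
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if s == s[::-1]:
                pals.add(s)
    return pals


if __name__ == "__main__":
    # Over {a, b} the two sets should be identical.
    print(strings_matched("ab") == short_palindromes("ab"))  # prints True
```

Since each group holds at most one character, reversing the order of the groups is the same as reversing the captured text, which is why the generated set is exactly the short palindromes.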
Given this input file:

    a
    ab
    abba
    abcdef
    abcba
    zufolo
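Assuming the intent is to print lines that are palindromes of at most 11 characters, the output a correct engine should give for this file can be cross-checked with a separate regex implementation. The sketch below uses Python's backtracking `re` module, an independent engine that also supports the `\1` through `\5` backreferences (treating it as a trustworthy oracle here is an assumption); `expected_lines` and `is_short_palindrome` are hypothetical helpers.

```python
import re

# The six lines of the `sample` file from the report.
SAMPLE = ["a", "ab", "abba", "abcdef", "abcba", "zufolo"]

ANCHORED = r"^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1$"
UNANCHORED = r"^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1"


def expected_lines(pattern, lines):
    """Lines that `grep -E pattern` should print, using Python's re as the oracle."""
    # A match object is truthy even when the match is empty.
    return [line for line in lines if re.search(pattern, line)]


def is_short_palindrome(s):
    """Independent check: a palindrome of at most 11 characters."""
    return len(s) <= 11 and s == s[::-1]


if __name__ == "__main__":
    with_dollar = expected_lines(ANCHORED, SAMPLE)
    without_dollar = expected_lines(UNANCHORED, SAMPLE)
    # The anchored regexp should agree with the plain palindrome test.
    assert with_dollar == [s for s in SAMPLE if is_short_palindrome(s)]
    print("with $:   ", with_dollar)     # ['a', 'abba', 'abcba']
    print("without $:", without_dollar)  # all six lines
```

Without the `$`, every line should be printed: all five groups and the middle `.?` can match the empty string, so the regexp can always match an empty string at the start of the line.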
Removing the `$` from the end of the regexp (i.e. making it less restrictive) produces fewer matches, which is the opposite of what it should do:

a) With the `$` at the end of the regexp:

    $ grep -E '^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1$' sample
    a
    abba
    abcba
    zufolo

b) Without the `$` at the end of the regexp:

    $ grep -E '^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1' sample
    a
    abba
    abcba

It's not just GNU grep that behaves strangely. GNU sed shows the same kind of odd behavior from the question as GNU `grep` does when just matching with `sed -nE '/.../p' sample`, AND sed behaves differently if we're just doing a match vs if we're doing a match + replace. For example, here's `sed` doing a match + replacement and behaving the same way as `grep` above:

a) With the `$` at the end of the regexp:

    $ sed -nE 's/^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1$/&/p' sample
    a
    abba
    abcba
    zufolo

b) Without the `$` at the end of the regexp:

    $ sed -nE 's/^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1/&/p' sample
    a
    abba
    abcba

but here's `sed` just doing a match and behaving differently from any of the above:

a) With the `$` at the end of the regexp (note the extra `ab` in the output):

    $ sed -nE '/^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1$/p' sample
    a
    ab
    abba
    abcba
    zufolo

b) Without the `$` at the end of the regexp (note the extra `ab` and `abcdef` in the output):

    $ sed -nE '/^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1/p' sample
    a
    ab
    abba
    abcdef
    abcba
    zufolo

Also, interestingly, this:

    $ sed -nE 's/^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1$/<&>/p' sample

outputs:

    <a>
    <abba>
    <abcba>
    <>zufolo

The last line means the regexp is apparently matching an empty string at the start of the line and ignoring the `$` end-of-string metachar present in the regexp!

The odd behavior isn't just associated with using `-E`, though. If I remove `-E` and just use [POSIX-compliant BREs](https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03), then:

a) With the `$` at the end of the regexp:

    $ grep '^\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\).\{0,1\}\5\4\3\2\1$' sample
    a
    abba
    abcba
    zufolo

    $ sed -n 's/^\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\).\{0,1\}\5\4\3\2\1$/&/p' sample
    a
    abba
    abcba
    zufolo

b) Without the `$` at the end of the regexp:

    $ grep '^\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\).\{0,1\}\5\4\3\2\1' sample
    a
    abba
    abcba

    $ sed -n 's/^\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\).\{0,1\}\5\4\3\2\1/&/p' sample
    a
    abba
    abcba

and again, just doing a match in `sed` behaves differently from the `sed` match + replacements above:

a) With the `$` at the end of the regexp:

    $ sed -n '/^\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\).\{0,1\}\5\4\3\2\1$/p' sample
    a
    ab
    abba
    abcba
    zufolo

b) Without the `$` at the end of the regexp:

    $ sed -n '/^\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\).\{0,1\}\5\4\3\2\1/p' sample
    a
    ab
    abba
    abcdef
    abcba
    zufolo

The above shows that, given the same regexp, sed is apparently matching different strings depending on whether it's doing a substitution or not.

These are the versions I was using when testing the above:

    $ grep --version | head -1
    grep (GNU grep) 3.11
    $ sed --version | head -1
    sed (GNU sed) 4.9

It was later pointed out that grep in git-bash prints an error message and dumps core given the original regexp above, e.g. `grep -E '^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1' sample` and `grep -E '^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1$' sample` both output:

    a
    assertion "num >= 0" failed: file "regexec.c", line 1394, function: pop_fail_stack
    Aborted (core dumped)

Sorry, I can't copy the core off that machine for corporate reasons. Those git-bash tests were using:

    $ echo $BASH_VERSION
    5.2.15(1)-release
    $ grep --version
    grep (GNU grep) 3.0
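For comparison with the `<&>` substitution output shown above, here's a minimal sketch of what `sed -nE 's/.../<&>/p' sample` should print for each variant, assuming POSIX leftmost-longest matching (the overall match starting at the `^` must be the longest possible). It models the regexp by brute force over every choice of group lengths rather than using any real regex engine; `longest_match_end` and `expected_subst_output` are hypothetical names.

```python
from itertools import product

# The six lines of the `sample` file from the report.
SAMPLE = ["a", "ab", "abba", "abcdef", "abcba", "zufolo"]


def longest_match_end(line, anchored):
    """End offset of the longest match of ^(.?)(.?)(.?)(.?)(.?).?\\5\\4\\3\\2\\1
    (plus a trailing $ if `anchored`) starting at offset 0, or None if no match."""
    best = None
    # Try every length (0 or 1) for the five groups and the middle .?
    for *group_lens, mid_len in product((0, 1), repeat=6):
        groups, pos = [], 0
        for n in group_lens:
            groups.append(line[pos:pos + n])
            pos += n
        pos += mid_len
        if pos > len(line):
            continue  # ran past the end of the line
        tail = "".join(reversed(groups))  # what \5\4\3\2\1 must match
        if not line.startswith(tail, pos):
            continue
        end = pos + len(tail)
        if anchored and end != len(line):
            continue  # the $ must match at the end of the line
        if best is None or end > best:
            best = end
    return best


def expected_subst_output(lines, anchored):
    """What `sed -nE 's/.../<&>/p'` should print under leftmost-longest rules."""
    out = []
    for line in lines:
        end = longest_match_end(line, anchored)
        if end is not None:
            out.append("<" + line[:end] + ">" + line[end:])
    return out


if __name__ == "__main__":
    print(expected_subst_output(SAMPLE, anchored=True))
    # ['<a>', '<abba>', '<abcba>']
    print(expected_subst_output(SAMPLE, anchored=False))
    # ['<a>', '<a>b', '<abba>', '<a>bcdef', '<abcba>', '<z>ufolo']
```

Under these assumptions, with the `$` the `zufolo` line should not be printed at all (rather than as `<>zufolo`), and without the `$` every line should be printed with its longest palindromic prefix of at most 11 characters wrapped in angle brackets.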
Regards,
Ed Morton