bug-sed



From: Ed Morton
Subject: bug#68725: GNU grep and sed behaving unexpectedly with multiple 1-or-0 RE capture groups and backreferences
Date: Thu, 25 Jan 2024 10:46:34 -0600
User-agent: Mozilla Thunderbird

There are issues (mostly common but some not) using a regexp like this:

   ^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1$

with GNU grep and GNU sed, hence my contacting both mailing lists but apologies if that was the wrong starting point.

This started out as a question on StackOverflow (https://stackoverflow.com/questions/77820540/searching-palindromes-with-grep-e-egrep/77861446?noredirect=1#comment137299746_77861446), but my "answer" and some comments from there are copied below so you don't have to look anywhere else for a description of the issues.

Given this input file:

   a
   ab
   abba
   abcdef
   abcba
   zufolo

Removing the `$` from the end of the regexp (i.e. making it less restrictive) produces fewer matches, which is the opposite of what it should do:

a) With the `$` at the end of the regexp:

   $ grep -E '^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1$' sample
   a
   abba
   abcba
   zufolo

b) Without the `$` at the end of the regexp:

   $ grep -E '^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1' sample
   a
   abba
   abcba

It's not just GNU grep that behaves strangely, GNU sed has the same behavior from the question when just matching with `sed -nE '/.../p' sample` as GNU `grep` does AND sed behaves differently if we're just doing a match vs if we're doing a match + replace.

For example here's `sed` doing a match+replacement and behaving the same way as `grep` above:

a) With the `$` at the end of the regexp:

   $ sed -nE 's/^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1$/&/p' sample
   a
   abba
   abcba
   zufolo

b) Without the `$` at the end of the regexp:

   $ sed -nE 's/^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1/&/p' sample
   a
   abba
   abcba

but here's sed just doing a match and behaving differently from any of the above:

a) With the `$` at the end of the regexp (note the extra `ab` in the output):

   $ sed -nE '/^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1$/p' sample
   a
   ab
   abba
   abcba
   zufolo

b) Without the `$` at the end of the regexp (note the extra `ab` and `abcdef` in the output):

   $ sed -nE '/^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1/p' sample
   a
   ab
   abba
   abcdef
   abcba
   zufolo

Also interestingly this:

   $ sed -nE 's/^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1$/<&>/p' sample

outputs:

   <a>
   <abba>
   <abcba>
   <>zufolo

the last line of which means the regexp is apparently matching the start of the line and ignoring the `$` end-of-string metachar present in the regexp!
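As a cross-check that does not involve any regexp engine, here is a minimal sketch of which lines the anchored regexp should select. It assumes the intent from the StackOverflow question, i.e. that `^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1$` is meant to match palindromes of up to 11 characters; every line in `sample` is shorter than that, so the length limit is not modelled. The helper names `is_palindrome` and `print_palindromes` are made up for this illustration.

```shell
#!/bin/sh
# Reference check that uses no regexp engine: a line is printed only if it
# reads the same forwards and backwards.
# Assumption: the anchored regexp is meant to select palindromes.

# is_palindrome STRING: succeed if STRING is a palindrome (hypothetical helper).
is_palindrome() {
    s=$1
    while [ "${#s}" -gt 1 ]; do
        first=${s%"${s#?}"}   # first character of s
        last=${s#"${s%?}"}    # last character of s
        [ "$first" = "$last" ] || return 1
        s=${s#?}              # drop the first character
        s=${s%?}              # drop the last character
    done
    return 0
}

# print_palindromes FILE: print the palindromic lines of FILE (hypothetical helper).
print_palindromes() {
    while IFS= read -r line; do
        if is_palindrome "$line"; then printf '%s\n' "$line"; fi
    done < "$1"
}

# Recreate the input file from this report and run the check.
printf '%s\n' a ab abba abcdef abcba zufolo > sample
print_palindromes sample
```

On the sample file this prints `a`, `abba` and `abcba`, so under that assumption `zufolo` should not appear in the output of any of the commands that keep the trailing `$`.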
The odd behavior isn't just associated with using `-E`, though. If I remove `-E` and just use [POSIX compliant BREs](https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03) then:

a) With the `$` at the end of the regexp:

   $ grep '^\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\).\{0,1\}\5\4\3\2\1$' sample
   a
   abba
   abcba
   zufolo

   $ sed -n 's/^\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\).\{0,1\}\5\4\3\2\1$/&/p' sample
   a
   abba
   abcba
   zufolo

b) Without the `$` at the end of the regexp:

   $ grep '^\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\).\{0,1\}\5\4\3\2\1' sample
   a
   abba
   abcba

   $ sed -n 's/^\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\).\{0,1\}\5\4\3\2\1/&/p' sample
   a
   abba
   abcba

and again just doing a match in sed below behaves differently from the sed match+replacements above:

a) With the `$` at the end of the regexp:

   $ sed -n '/^\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\).\{0,1\}\5\4\3\2\1$/p' sample
   a
   ab
   abba
   abcba
   zufolo

b) Without the `$` at the end of the regexp:

   $ sed -n '/^\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\).\{0,1\}\5\4\3\2\1/p' sample
   a
   ab
   abba
   abcdef
   abcba
   zufolo

The above shows that, given the same regexp, sed is apparently matching different strings depending on whether it's doing a substitution or not.

These are the versions I was using when testing above:

   $ grep --version | head -1
   grep (GNU grep) 3.11

   $ sed --version | head -1
   sed (GNU sed) 4.9

It was later pointed out that grep in git-bash produces an error message and dumps core given the original regexp above, e.g. `grep -E '^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1' sample` and `grep -E '^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1$' sample` both output:

   a
   assertion "num >= 0" failed: file "regexec.c", line 1394, function: pop_fail_stack
   Aborted (core dumped)

Sorry, I can't copy the core off that machine for corporate reasons.

Those git-bash tests were using:

   $ echo $BASH_VERSION
   5.2.15(1)-release

   $ grep --version
   grep (GNU grep) 3.0

Regards,
Ed Morton

