From: Ulya Fokanova
Subject: bug#22655: grep-2.21 (and git master): --null-data and ranges work in an odd way (-P works fine)
Date: Sun, 14 Feb 2016 20:02:13 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0
I've explored the following case:
$ printf '12\n34\0' | LC_ALL=en_US.utf-8 grep -z '^[1-4]*$' | wc -c
6
This is a bug: with -z the only record is '12\n34', and since '\n' is not in [1-4], there should be no match.
This is what grep does:
* tries to build a DFA (as in dfa.c)
* fails to expand the character range [1-4] because of the multibyte
  locale en_US.utf-8 and gives up building the DFA (it marks [1-4] as
  BACKREF, which suppresses all dfa.c-related code); note the difference
  with the [1234] case, in which there's no need to expand a multibyte range
* falls back to Regex (the gnulib extension of regex.h)
* Regex doesn't support '-z' semantics (the closest configuration to
  '-z' is RE_NEWLINE_ALT, which is already included in the RE_SYNTAX_GREP
  set), so '\n' is treated as a newline and the match erroneously succeeds
I think this should be worked around in grep: before calling 're_search'
it should split the input string by 'eolbyte'.
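A minimal sketch of the proposed workaround, again assuming the POSIX regex interface in place of re_search: count_matching_records() is a hypothetical helper that splits a buffer at eolbyte ('\0' under -z) and matches each record as its own NUL-terminated string, using a regex compiled without REG_NEWLINE, so a '\n' inside a record is just an ordinary byte and '^'/'$' anchor only at record boundaries.

```c
#include <regex.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: count the records in BUF[0..LEN) that match
   PATTERN, where records are terminated by EOLBYTE ('\0' under -z).
   The regex is compiled WITHOUT REG_NEWLINE, so '\n' inside a record
   is an ordinary character. A final record without a terminator is
   still examined. Limitation: because regexec() takes a NUL-terminated
   string, a record containing a NUL byte (possible when EOLBYTE is
   '\n') is truncated at that byte. Returns -1 on error. */
static long count_matching_records(const char *buf, size_t len,
                                   char eolbyte, const char *pattern)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return -1;

    long count = 0;
    size_t start = 0;
    while (start < len) {
        /* Find the end of the current record, or the end of BUF. */
        const char *end = memchr(buf + start, eolbyte, len - start);
        size_t reclen = end ? (size_t)(end - (buf + start)) : len - start;

        /* Copy the record into its own NUL-terminated string. */
        char *record = malloc(reclen + 1);
        if (record == NULL) {
            regfree(&re);
            return -1;
        }
        memcpy(record, buf + start, reclen);
        record[reclen] = '\0';

        if (regexec(&re, record, 0, NULL, 0) == 0)
            count++;
        free(record);

        start += reclen + 1;  /* skip the record and its terminator */
    }

    regfree(&re);
    return count;
}
```

For the 6-byte input produced by printf '12\n34\0', count_matching_records(buf, 6, '\0', "^[1-4]*$") returns 0, the expected result for grep -z. Copying each record keeps the sketch simple at the cost of one allocation per record.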
The bug is also present with the PCRE engine:
$ printf '12\n34\0' | LC_ALL=en_US.utf-8 grep -z -P '^[1234]*$' | wc -c
6
$ printf '12\n34\0' | LC_ALL=en_US.utf-8 grep -z -P '^[1-4]*$' | wc -c
6
Ulya