bug#37634: Non-charset characters are not recognized.
From: sur3
Subject: bug#37634: Non-charset characters are not recognized.
Date: Sat, 5 Oct 2019 16:26:28 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1
sed doesn't match bytes that are not valid in the current character set
with . (dot), e.g. with LC_CTYPE=en_US.UTF-8:
# printf "ABCD\n" | sed 's/B.*C//'
Output: AD
# printf "AB\x8eCD\n" | sed 's/B.*C//'
Output: AB�CD
I also tried something like [^E]* instead of .*, but that does not work
either. I think sed should recognize that \x8e is neither a C nor a
newline, even though it's not in the character set.
With
# printf "AB\x8eCD\n" | LC_CTYPE=C sed 's/B.*C//'
Output: AD
it works, but that's a bit non-intuitive, because normally one wants to
keep the UTF-8 charset and still have sed function correctly. Or is there
another regex similar to . that can recognize such characters?
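
For reference, the byte-wise workaround described above can be sketched as a
small shell helper. This is only a sketch under some assumptions: the function
name strip_b_to_c is made up for illustration, LC_ALL is used instead of
LC_CTYPE because LC_ALL takes precedence over the other locale variables if it
happens to be set in the environment, and the byte is written in octal (\216
is 0x8e) because not every printf accepts \x escapes.

```shell
# Hypothetical helper: run the substitution from the report in the C locale,
# so that sed treats its input as single bytes and . can match any of them
# (other than newline).
strip_b_to_c() {
    # LC_ALL applies only to this one sed invocation; the rest of the shell
    # session keeps its UTF-8 locale.
    LC_ALL=C sed 's/B.*C//'
}

# \216 is the octal spelling of the byte 0x8e used in the report.
result=$(printf 'AB\216CD\n' | strip_b_to_c)
printf '%s\n' "$result"
```

The trade-off is that in the C locale sed also treats valid multibyte UTF-8
characters as separate bytes, so this only fits patterns, like the one here,
that do not depend on matching whole non-ASCII characters.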