bug-grep

bug#23234: unexpected results with charset handling in GNU grep 2.23


From: Paul Eggert
Subject: bug#23234: unexpected results with charset handling in GNU grep 2.23
Date: Wed, 6 Apr 2016 18:28:33 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.7.1

On 04/06/2016 02:04 PM, Eric Blake wrote:
> POSIX ... says that LC_ALL=C is _required_ to treat all 256 byte values as valid characters

Although that was the intent of POSIX, it's not what the current standard says, and it's not what many popular platforms do. Problematic platforms include Fedora 23, where mbrtowc reports an encoding error in the C locale when given a byte outside the range 0-127. This affects many programs other than 'grep'.
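
To check how a given platform behaves, a small probe along the following lines may help. This is only a sketch: the helper name decode_byte_in_c_locale is made up for illustration and is not part of grep or gnulib. It switches to the C locale, hands mbrtowc a single byte with a fresh conversion state, and returns mbrtowc's result unchanged.

```c
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical probe, not taken from grep: decode a single byte in the
   C locale and return whatever mbrtowc reports.  For a nonzero byte the
   result is 1 when the byte is accepted as a character, or (size_t) -1
   when the C library reports an encoding error.  */
size_t
decode_byte_in_c_locale (unsigned char byte)
{
  char buf[1];
  wchar_t wc;
  mbstate_t state;

  /* Force the C locale regardless of the environment.  */
  setlocale (LC_ALL, "C");

  /* Start from the initial conversion state on every call.  */
  memset (&state, 0, sizeof state);

  buf[0] = (char) byte;
  return mbrtowc (&wc, buf, 1, &state);
}
```

Calling decode_byte_in_c_locale (0x80) on a platform with the behavior described above should yield (size_t) -1, whereas a platform that treats every byte as a character in the C locale should yield 1. An ASCII byte such as 'A' yields 1 either way.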

This bug in the standard is intended to be fixed in a future version of POSIX (see <http://austingroupbugs.net/view.php?id=663#c2738>). I suppose glibc and eventually Fedora will be fixed to conform to the new standard in due course.

Perhaps grep should work around this problem on systems like Fedora 23 where the underlying C library does not conform to the next version of POSIX. It sounds like a new gnulib module or two might do the trick. This should fix the problems that Björn mentions.
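
As a rough illustration of what such a workaround could look like, here is a minimal sketch of a wrapper around mbrtowc. It is not gnulib's code; the name c_locale_mbrtowc and the choice to map a rejected byte to the wide character with the same numeric value are assumptions made for this example. It is meant to be called only while the program is in the C locale, with non-null s and ps.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical wrapper, not gnulib's actual implementation.  Intended
   for use in the C locale only.  S and PS must be non-null.  If the
   underlying mbrtowc rejects the first byte with an encoding error,
   treat that byte as a one-byte character instead of failing.  */
size_t
c_locale_mbrtowc (wchar_t *pwc, const char *s, size_t n, mbstate_t *ps)
{
  size_t ret = mbrtowc (pwc, s, n, ps);

  if (ret == (size_t) -1)
    {
      /* The conversion state is unspecified after an error, so put it
         back into the initial state before continuing.  */
      memset (ps, 0, sizeof *ps);

      /* Assumption for this sketch: map the byte to the wide character
         with the same unsigned value.  */
      if (pwc)
        *pwc = (unsigned char) *s;
      return 1;
    }

  return ret;
}
```

With a wrapper like this, a caller scanning a buffer in the C locale always advances by one byte per character, whether or not the C library accepts bytes outside 0-127.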

In the meantime, grep -a is the way to go. Yes, it's not portable to non-GNU grep, but there is no portable solution given the above-mentioned POSIX problems, so a GNU-grep-only workaround is all one can reasonably ask for.
