bug-grep

bug#22838: New 'Binary file' detection considered harmful


From: Eric Blake
Subject: bug#22838: New 'Binary file' detection considered harmful
Date: Mon, 29 Feb 2016 16:55:14 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0

On 02/29/2016 04:35 PM, Paul Eggert wrote:

> I suggest using -a. LC_ALL=C won't work the way that you want on
> platforms where the C locale is UTF-8, or is pure ASCII. For example, on
> Fedora 23 or RHEL 7 with grep 2.23 we have:
> 
> $ printf '\200\n' | LC_ALL=C grep .
> Binary file (standard input) matches
> 
> This is because the C locale is pure ASCII on these platforms, i.e.,
> '\200' is not a valid character the way it is with traditional Unix.  I
> don't know why Red Hat made that change.
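
As a rough sketch of the -a suggestion quoted above: the snippet below assumes GNU grep, and both the helper name grep_text and the sample input are made up for illustration. It forces text mode, so a line containing the byte \200 is printed instead of being replaced by the "Binary file ... matches" notice, whichever way the platform's C locale treats that byte.

```shell
# Hypothetical helper (not part of grep): always search in text mode.
# -a (--text) makes grep treat the input as text even when it contains
# bytes that are not valid characters in the current locale.
grep_text() {
    grep -a "$@"
}

# Sample input: one line with a \200 byte in the middle, one plain line.
# od -c displays the matched line without sending the raw byte to the terminal.
printf 'foo\200bar\nbaz\n' | LC_ALL=C grep_text foo | od -c
```

Dropping -a from the helper is an easy way to see how a given platform behaves: where \200 is not a valid character in the C locale, the pipeline reports a binary match rather than the line.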

I _think_ the Austin Group is leaning towards requiring the "C" locale
to always be a single-byte locale in which all 256 byte values are valid
characters, so neither strict 7-bit ASCII nor UTF-8 would be usable as
the "C" locale. For that to happen, though, POSIX would also need to
provide an easily accessible UTF-8 locale and describe how it differs
from the "C" locale under such a ruling.  It's still all conjecture what
the final results will be: even in the standards committee, deciding
where to document exactly how locale corner cases must behave and where
to leave implementations some latitude is tricky business.  Any such
change is at least 3 or 4 years away from being standardized in
Issue 8 (right now, the focus is on Technical Corrigendum 2 for Issue 7).

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
