bug-grep

bug#19242: latest grep considers text files as binary


From: Eric Blake
Subject: bug#19242: latest grep considers text files as binary
Date: Fri, 05 Dec 2014 08:34:55 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0

On 12/05/2014 02:58 AM, Thomas Wolff wrote:
> Paul Eggert wrote:
>>> the mentioned patches are apparently intended to fix issues in
>>> non-UTF-8 locales.
>> No, they're also needed for UTF-8 locales I'm afraid.  There are some
>> security issues, not only having to do with grep's internals, but also
>> for the behavior of downstream programs that may be expecting UTF-8 text.
>>
>> You can work around the problem with 'grep -a'.
> I was aware of this workaround but I claim it should not be needed
> because the files affected are in fact not binary files but text files.

No, they are binary.  The POSIX definition of a text file states that
the file may consist ONLY of characters in the current locale.  If you
have files created under different locales, such that the bytes in the
file are NOT characters in the current locale, then that file is binary
under the current locale, even though it may be text in a better locale.
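
As a rough illustration of that point, here is a minimal Python sketch
that asks whether the same bytes form valid characters under two
different encodings.  The helper name is_text_in_encoding is
hypothetical, and this is not how grep itself decides; it only
approximates the "characters in the current locale" part of the POSIX
definition (plus the rule that text files contain no NUL bytes) and
ignores other requirements such as line length limits.

```python
def is_text_in_encoding(data: bytes, encoding: str) -> bool:
    """Hypothetical helper: True if data decodes cleanly in `encoding`.

    Approximates the POSIX notion of a text file for one encoding;
    it does not reproduce grep's actual binary-file detection.
    """
    # POSIX text files contain no NUL characters.
    if b"\x00" in data:
        return False
    try:
        # Strict decoding fails on any byte sequence that is not a
        # valid character in the given encoding.
        data.decode(encoding, errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# "café" written under a Latin-1 locale: the é is the single byte 0xE9.
latin1_bytes = "café\n".encode("latin-1")

print(is_text_in_encoding(latin1_bytes, "utf-8"))    # invalid UTF-8 sequence
print(is_text_in_encoding(latin1_bytes, "latin-1"))  # valid Latin-1 text
```

The same bytes fail the check for UTF-8 and pass it for Latin-1, which
is the sense in which a file can be binary in one locale and text in
another.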

> The manual clearly says about -a: "Process a binary file as if it were
> text" but partial content in a different text encoding does not make a
> file binary.

Yes, it does, per POSIX.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
