bug-global

Misidentifying binary files as text


From: Joseph Garvin
Subject: Misidentifying binary files as text
Date: Tue, 1 Jun 2010 09:30:11 -0500

Global frequently misidentifies binary files as text, so I'm spammed with warnings like the following:

Warning: 'apps/fff/fool.pdf' ignored, because it includes blank.
Warning: 'apps/fff/bar.pdf' ignored, because it includes blank.

I've written such a detector before, and I've found that reading in the first page of a file and checking whether at least 30% of the characters fall outside the 32-127 ASCII range (the text characters) is a good litmus test for whether a file is binary. I have a Python script implementing this test that I use to wrap cat so that I don't accidentally cat binaries; I've attached it for reference. It correctly identifies as binary all the files I receive these warnings about. I'm not sure what sort of test Global currently uses.
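
For readers without the attachment, here is a rough sketch of the heuristic as described above. It is not the attached safecat.py and not Global's detection code. The names `looks_binary`, `file_looks_binary`, `PAGE_SIZE` and `THRESHOLD` are made up for illustration, the 4096-byte "first page" is an assumption (the message gives no size), and treating an empty file as text is also an assumption.

```python
PAGE_SIZE = 4096   # assumed size of the "first page"; the message gives no number
THRESHOLD = 0.30   # fraction of non-text bytes, taken from the message


def looks_binary(data: bytes, threshold: float = THRESHOLD) -> bool:
    """Return True if at least `threshold` of the bytes lie outside 32-127."""
    if not data:
        return False  # assumption: an empty file is treated as text
    # Tab, newline and carriage return are below 32, so they count as
    # non-text here, which is what the stated 32-127 range implies.
    non_text = sum(1 for b in data if not 32 <= b <= 127)
    return non_text / len(data) >= threshold


def file_looks_binary(path: str, page_size: int = PAGE_SIZE) -> bool:
    """Apply the test to the first `page_size` bytes of the file at `path`."""
    with open(path, "rb") as f:
        return looks_binary(f.read(page_size))
```

A wrapper in the spirit of the one mentioned could call `file_looks_binary(path)` on each argument and refuse to print any file for which it returns True. Ordinary source files pass easily because line breaks and tabs are normally a small fraction of their bytes.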

I've also attached one of the files I receive the warning on, UnnamedPackage.dependency. Please let me know if there's any other information that would help fix the bug, and thanks for the great tool! :)

Joe

Attachment: safecat.py
Description: Binary data

Attachment: UnnamedPackage.dependency
Description: Binary data

