Misidentifying binary files as text
From: Joseph Garvin
Subject: Misidentifying binary files as text
Date: Tue, 1 Jun 2010 09:30:11 -0500
Global frequently misidentifies binary files as text, so I'm spammed with warnings like the following:
Warning: 'apps/fff/fool.pdf' ignored, because it includes blank.
Warning: 'apps/fff/bar.pdf' ignored, because it includes blank.
I've written a detector like this before. I've found that reading in the first page of a file and checking whether 30% or more of its characters fall outside the 32-127 ASCII range (the text characters) is a good litmus test for whether the file is binary. I use a Python script that implements this test as a wrapper around cat, so that I don't accidentally cat binaries, and I've attached it for reference. It correctly identifies every file I get these warnings about as binary. I'm not sure what test Global currently uses.
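The attached script isn't reproduced in the archive, so here is a minimal sketch of the heuristic described above. It is not the original safecat.py. The function names, the 4096-byte "page" size, and the `main` wrapper are hypothetical. The sketch follows the description literally and counts every byte outside 32-127 as non-text, which means newlines and tabs count too. In ordinary text files those make up only a few percent of the bytes, so they rarely push a file over the 30% threshold.

```python
import sys

PAGE_SIZE = 4096            # assumed size of the "first page", in bytes
NON_TEXT_THRESHOLD = 0.30   # fraction of non-text bytes that marks a file as binary


def looks_binary(data, threshold=NON_TEXT_THRESHOLD):
    """Return True if at least `threshold` of the bytes fall outside 32..127."""
    if not data:
        return False  # treat empty files as text
    # Iterating over a bytes object yields ints in Python 3.
    non_text = sum(1 for b in data if b < 32 or b > 127)
    return non_text / len(data) >= threshold


def is_binary_file(path, page_size=PAGE_SIZE):
    """Read only the first page of the file and apply the heuristic to it."""
    with open(path, "rb") as f:
        return looks_binary(f.read(page_size))


def main(argv):
    """Hypothetical cat wrapper: print text files, refuse binaries with a warning."""
    status = 0
    for path in argv:
        if is_binary_file(path):
            print(f"safecat: '{path}' looks binary, not printing", file=sys.stderr)
            status = 1
            continue
        with open(path, "rb") as f:
            sys.stdout.buffer.write(f.read())
    return status
```

To use it as a command, call `sys.exit(main(sys.argv[1:]))` from the script's entry point. Reading only the first page keeps the check cheap even on very large files, but a file that starts with a long text header can still be misclassified as text.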
I've also attached UnnamedPackage.dependency, one of the files that triggers the warning. Please let me know if there's any other information that would help fix the bug, and thanks for the great tool! :)
Joe
Attachment: safecat.py (binary data)
Attachment: UnnamedPackage.dependency (binary data)