bug-global

Re: Misidentifying binary files as text


From: Shigio YAMAGUCHI
Subject: Re: Misidentifying binary files as text
Date: Thu, 03 Jun 2010 09:35:48 +0900

Hi,
> Global frequently misidentifies binary files as text, so I'm spammed with
> warnings like the following:
> 
> Warning: 'apps/fff/fool.pdf' ignored, because it includes blank.
> Warning: 'apps/fff/bar.pdf' ignored, because it includes blank.

This warning message means that the path name includes blanks.
(You must be using directories that have blanks in their names.)
Currently, GLOBAL cannot handle such files.

> I've written such a detector before and I've found reading in the first page
> of a file and checking if 30% of the characters are outside the 32-127 ASCII
> range (the text characters) as a good litmus test for whether a file is
> binary. I have a python script that I use to wrap cat to prevent me from
> accidentally catting binaries implementing this test, which I've attached
> for reference. It correctly identifies all the files I receive spam about as
> binary; I'm not sure what sort of test global currently uses.
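
For reference, the heuristic described in the quoted text could be sketched
roughly as below. This is only an illustration of that description, not the
attached script and not the test GLOBAL uses; the function names, the
4096-byte page size, and the ">=" comparison are assumptions.

```python
def looks_binary(data: bytes, threshold: float = 0.30) -> bool:
    """Return True if at least `threshold` of the bytes are outside 32-127.

    The 32-127 range is taken literally from the description, so tabs and
    newlines also count as non-text here.
    """
    if not data:
        # An empty sample gives no evidence; treat it as text.
        return False
    outside = sum(1 for b in data if b < 32 or b > 127)
    return outside / len(data) >= threshold


def file_looks_binary(path: str, page_size: int = 4096) -> bool:
    """Apply the heuristic to the first `page_size` bytes of a file."""
    with open(path, "rb") as f:
        return looks_binary(f.read(page_size))
```

Because line breaks fall outside the 32-127 range, a text file made of very
short lines could be misjudged; a real detector would probably exempt common
whitespace characters.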

Recently (in version 5.8.2), we changed the testing method. Though I thought
that the new method would work well, there seem to be problems with it.
Yet another new method might cause further problems.

To avoid repeating the same mistake, I would like to proceed as follows:
1. Revive the old testing method, which was used until version 5.8.1.
2. Make the new (current) method effective only when the environment
   variable GTAGSTESTBINARY is set.
3. Shift to a new method only after enough testing in the community.
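
The switch in step 2 could look roughly like the sketch below. It is written
in Python only for illustration and is not GLOBAL code: select_binary_test
and the two test functions passed to it are hypothetical names. It also
assumes that "set" means the variable is present with any value, which is
not decided here.

```python
import os
from typing import Callable, Mapping

BinaryTest = Callable[[bytes], bool]


def select_binary_test(
    old_test: BinaryTest,
    new_test: BinaryTest,
    environ: Mapping[str, str] = os.environ,
) -> BinaryTest:
    """Pick the new test only when GTAGSTESTBINARY is present in `environ`.

    Assumption: presence alone enables the new method, even with an
    empty value. Otherwise the old (pre-5.8.2) test stays the default.
    """
    if "GTAGSTESTBINARY" in environ:
        return new_test
    return old_test
```

With this arrangement, users who do nothing keep the old behavior, and
testers opt in by exporting the variable before running the tool.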

What do you think?
--
Shigio YAMAGUCHI <address@hidden>
PGP fingerprint: D1CB 0B89 B346 4AB6 5663  C4B6 3CA5 BBB3 57BE DDA3


