bug-grep



From: Bjoern Voigt
Subject: bug#23763: Bug report: Grep stops, if a text file contains a null character after 32768 bytes
Date: Tue, 14 Jun 2016 22:10:27 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:43.0) Gecko/20100101 Firefox/43.0 SeaMonkey/2.40

Paul Eggert wrote:
> Bjoern Voigt wrote:
>> This is clearly a bug in my eyes.
>
> The behavior conforms to grep's spec, so it's not a bug in that sense.
> I don't offhand see a behavior change that wouldn't cause worse
> problems elsewhere. Unless you were thinking of adding an option?
The current manual page patched with
"0001-doc-remove-obsolete-MS-DOS-mention-2.patch" says:

--binary-files=TYPE
  If the first few bytes of a file indicate that the file
  contains binary data, assume that the file is of type TYPE.  By
  default, TYPE is binary, and grep normally outputs either a
  one-line message saying that a binary file matches, or no
  message if there is no match.  If TYPE is without-match, grep
  assumes that a binary file does not match; this is equivalent
  to the -I option.  If TYPE is text, grep processes a binary
  file as if it were text; this is equivalent to the -a option.
  When processing binary data, grep may treat non-text bytes as
  line terminators; for example, the pattern '.'
  (period) might not match a null byte, as the null byte might be
  treated as a line terminator.  Warning: grep
  --binary-files=text might output binary garbage, which can have
  nasty side effects if the output is a terminal and if the
  terminal driver interprets some of it as commands.

The behavior for my test case, where a file starts with more than 32 KB
of text data and continues with text data containing at least one
embedded NUL character (which makes it binary data), is undocumented.
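A file of this shape can be generated with a short Python script, which makes the case easy to reproduce. This is a sketch: the 40 000-byte default, the filler line and the function name `make_test_file` are arbitrary choices for illustration; the only requirement taken from the report is that the NUL byte lies beyond the first 32768 bytes.

```python
from pathlib import Path


def make_test_file(path: Path, min_text_bytes: int = 40_000) -> int:
    """Write plain text past the 32768-byte mark, then a line containing a NUL.

    Returns the byte offset of the NUL character.
    """
    line = b"plain text line\n"                      # 16 bytes of ordinary text
    text = line * (min_text_bytes // len(line) + 1)  # a little more than min_text_bytes
    tail = b"match\x00here\n"                        # the embedded NUL makes this binary data
    path.write_bytes(text + tail)
    return len(text) + tail.index(b"\x00")
```

The resulting file can then be searched once with `grep match FILE` and once with `grep -a match FILE` to compare the two outputs.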

Consequently, what I am probably looking for is a new option,
"--binary-files=auto", which should also become the default at some point.

For files it should work as follows:

--binary-files=auto
If the first few bytes of a file indicate that the file
contains binary data, assume that the file is of type binary.
Otherwise assume that the file is of type text.
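The proposed rule amounts to making the decision once, from the leading bytes only. A minimal Python sketch, assuming that "the first few bytes" means the first 32768 bytes (the figure from the subject line) and that only a NUL byte counts as binary (GNU grep's real heuristic may consider other bytes as well); `auto_file_type` and `PROBE_SIZE` are hypothetical names, not grep internals:

```python
PROBE_SIZE = 32_768  # assumption: only this many leading bytes are inspected


def looks_binary(chunk: bytes) -> bool:
    # Simplified heuristic: only a NUL byte marks the data as binary here.
    return b"\x00" in chunk


def auto_file_type(data: bytes, probe_size: int = PROBE_SIZE) -> str:
    """Hypothetical --binary-files=auto: classify the whole file from its start."""
    return "binary" if looks_binary(data[:probe_size]) else "text"
```

Under this rule a NUL byte after the first 32768 bytes no longer changes how the file is treated: the whole file stays "text".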

Since the behavior of --binary-files=binary for my test case is
undocumented, and since the output is more or less useless apart from
the fact that some non-printable characters are suppressed on the
terminal, another option would be to change the --binary-files=binary
mode in both the code and the manual page.

For files as input, this is easy to implement. However, I haven't
checked how --binary-files should work with standard input. There, the
decision between binary and text should be made before the first match
is printed.
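For standard input the same idea still works if the first block is read and classified before any line is emitted. The Python sketch below is an assumption-laden illustration, not grep's implementation: it uses a 32768-byte probe, treats only NUL as binary, reads the probe with a single `read()` call (a pipe may return fewer bytes), and prints a placeholder message in binary mode. All names are hypothetical.

```python
import re
from typing import BinaryIO, Iterator

PROBE_SIZE = 32_768  # assumption: size of the leading block used for the decision


def _lines(head: bytes, stream: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield lines from `head`, then from the rest of `stream`."""
    buf = head
    while True:
        *complete, buf = buf.split(b"\n")  # last piece may be an incomplete line
        yield from complete
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf += chunk
    if buf:
        yield buf  # final line without a trailing newline


def search_stream(stream: BinaryIO, pattern: bytes,
                  probe_size: int = PROBE_SIZE) -> Iterator[bytes]:
    """Decide binary vs. text from the first block, before any match is emitted."""
    regex = re.compile(pattern)
    head = stream.read(probe_size)  # simplification: a pipe may return fewer bytes
    if b"\x00" in head:
        # Binary input: report only whether something matched.
        if any(regex.search(line) for line in _lines(head, stream)):
            yield b"binary input matches"  # hypothetical placeholder message
        return
    # Text input: NUL bytes found later are treated as ordinary characters.
    for line in _lines(head, stream):
        if regex.search(line):
            yield line
```

With a real pipe one would pass `sys.stdin.buffer` as `stream`; because no line is yielded until `head` has been classified, the decision is always made before the first match is printed.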

My MySQL mysqldump problem can be solved with --text or
--binary-files=text, so I am no longer looking for a quick solution.
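For scripts that search such dumps, the workaround is just the extra option on the grep command line. A small Python wrapper, shown as a sketch: `grep_as_text` and the file name `dump.sql` are hypothetical, and a grep supporting `--binary-files=text` (equivalent to `-a`/`--text`) is assumed to be on `PATH`.

```python
import subprocess


def grep_as_text(pattern: str, path: str) -> bytes:
    """Run grep with --binary-files=text (same as -a/--text); return its stdout."""
    result = subprocess.run(
        ["grep", "--binary-files=text", "--", pattern, path],
        stdout=subprocess.PIPE,
        check=False,  # grep exits with status 1 when nothing matches
    )
    return result.stdout
```

For example, `grep_as_text("INSERT", "dump.sql")` returns the matching lines as raw bytes even if the dump contains NUL characters.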

Regards,
Björn
