bug#23763: Bug report: Grep stops, if a text file contains a null charac
From: Bjoern Voigt
Subject: bug#23763: Bug report: Grep stops, if a text file contains a null character after 32768 bytes
Date: Mon, 13 Jun 2016 22:52:38 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:43.0) Gecko/20100101 Firefox/43.0 SeaMonkey/2.40
Eric Blake wrote:
> POSIX allows this behavior, in that it says that grep's behavior is
> undefined on non-text files (which you have by virtue of your NUL
> byte). Since this is documented behavior of GNU grep when -a is not
> used, I'm closing this as not a bug. But feel free to add further
> comments to this thread.
If I start grep with the "-a" option or "--binary-files=text", the bug does
not show up.
"grep --binary-files=binary", which is the default, shows the bug.
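To compare the two modes, a test file is needed in which the only NUL byte sits past the first 32768 bytes. The following is a minimal sketch for producing one. The pattern "foo", the line contents, and the helper names are assumptions made for illustration; only the file name testfile.txt and the 32768-byte boundary come from this report.

```python
import os
import tempfile

BUFFER_SIZE = 32768  # the 32 KiB boundary mentioned in the report


def build_test_data(pattern: bytes = b"foo") -> bytes:
    """Return text lines with a single NUL byte placed past BUFFER_SIZE."""
    line = pattern + b" some ordinary text\n"
    # Enough whole lines to cross the 32 KiB boundary.
    head = line * (BUFFER_SIZE // len(line) + 1)
    # One NUL byte inside an otherwise normal line, then more matching text.
    tail = pattern + b" before NUL \x00 after NUL\n" + line * 10
    return head + tail


def write_test_file(directory: str, name: str = "testfile.txt") -> str:
    """Write the test data to directory/name and return the path."""
    path = os.path.join(directory, name)
    with open(path, "wb") as f:
        f.write(build_test_data())
    return path


if __name__ == "__main__":
    print(write_test_file(tempfile.mkdtemp()))
```

Running "grep foo" on the resulting file with and without "-a" should show the difference described above, although the exact output depends on the grep version installed.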
I am fairly sure that the auto-guessing code is incorrect or
limited when a null character is found after 32KB. The manual page says
this about the auto-guessing code:
-U, --binary
    Treat the file(s) as binary. By default, under MS-DOS and
    MS-Windows, grep guesses the file type by looking at the contents
    of the first 32KB read from the file. If grep decides the file
    is a text file, it strips the CR characters from the original
    file contents (to make regular expressions with ^ and $ work
    correctly). Specifying -U overrules this guesswork, causing all
    files to be read and passed to the matching mechanism verbatim;
    if the file is a text file with CR/LF pairs at the end of each
    line, this will cause some regular expressions to fail. This
    option has no effect on platforms other than MS-DOS and
    MS-Windows.
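To illustrate the suspicion, here is a small model of a grep that looks for NUL bytes only in the buffer it has just read. This is not GNU grep's actual code; the 32768-byte buffer size, the function name, and the sample data are assumptions. It only shows how a buffer-at-a-time check can print text lines first and a binary notice afterwards.

```python
BUFFER_SIZE = 32768  # assumed read size, matching the 32 KiB in the report


def grep_buffered(data: bytes, pattern: bytes, name: str) -> list:
    """Model a grep that checks each buffer for NUL only when it reads it."""
    output = []
    pending = b""  # incomplete last line carried over from the previous buffer
    for start in range(0, len(data), BUFFER_SIZE):
        chunk = data[start:start + BUFFER_SIZE]
        if b"\x00" in chunk:
            # Binary data detected late: stop printing lines and
            # fall back to the one-line summary.
            if pattern in pending + chunk:
                output.append(f"Binary file {name} matches")
            return output
        *lines, pending = (pending + chunk).split(b"\n")
        for line in lines:
            if pattern in line:
                output.append(line.decode("ascii", "replace"))
    if pattern in pending:
        output.append(pending.decode("ascii", "replace"))
    return output


if __name__ == "__main__":
    # 36000 bytes of plain text, then one line containing a NUL byte.
    sample = b"foo line\n" * 4000 + b"foo \x00 tail\n"
    result = grep_buffered(sample, b"foo", "testfile.txt")
    print(len(result), result[-1])
```

With the sample data, the model emits ordinary matching lines from the first buffer and then ends with the "Binary file testfile.txt matches" notice, which is the mixed output this report complains about.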
I see these problems:
1. The binary mode is implemented inconsistently. It would be acceptable
if grep produced either no output (no match, exit code >0) or exactly one
output line ("Binary file testfile.txt matches", exit code 0). It is not
acceptable that grep writes some matching text lines, later writes
"Binary file testfile.txt matches", and exits with code 0.
2. Linux users, or more precisely users on platforms other than MS-DOS
and MS-Windows, will overlook the auto-guessing section in the manual
page because of the notes "By default, under MS-DOS and MS-Windows,
grep guesses the file type by looking at the contents of the first
32KB read from the file." and "This option has no effect on platforms
other than MS-DOS and MS-Windows."
3. The auto-guessing mechanism is not documented anywhere else in the
documentation.
4. The auto-guessing limitations are documented to some extent in the
manual page, but not in the BUGS section.
5. The exit code should not be 0 if grep finds an error in the input
from which it cannot recover.
6. The error message "Binary file testfile.txt matches" must not be
written to standard output if matching text lines have been written
before it.
7. POSIX defines only minimal guarantees for grep. GNU grep can, and
should, do better.
8. Other implementations (like the tested FreeBSD version) do not show
the bug. BusyBox also works correctly.
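The all-or-nothing behavior asked for in item 1 can be sketched as follows. This is only an illustration of the proposal, not a patch for GNU grep; the function name and return shape are assumptions. The input is classified as text or binary before any line is emitted, so the output is either matching text lines or a single summary line, never both.

```python
def grep_consistent(data: bytes, pattern: bytes, name: str):
    """Return (output_lines, exit_code) without mixing text and binary output."""
    if b"\x00" in data:
        # Binary input: at most one summary line, never preceded by text lines.
        if pattern in data:
            return [f"Binary file {name} matches"], 0
        return [], 1
    lines = [
        line.decode("ascii", "replace")
        for line in data.split(b"\n")
        if pattern in line
    ]
    # Exit code 0 on at least one match, 1 otherwise, as grep does.
    return lines, (0 if lines else 1)


if __name__ == "__main__":
    late_nul = b"foo line\n" * 4000 + b"foo \x00 tail\n"
    print(grep_consistent(late_nul, b"foo", "testfile.txt"))
```

The cost of this approach is that the whole input has to be inspected before the first line is printed, which means buffering or a second pass and is awkward for input arriving on a pipe.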