--- Begin Message ---
Subject: Bug report: Grep stops, if a text file contains a null character after 32768 bytes
Date: Mon, 13 Jun 2016 21:45:30 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:43.0) Gecko/20100101 Firefox/43.0 SeaMonkey/2.40
Grep shows a bug if it processes a text file with at least one embedded
0 (ASCII NUL) character after byte 32768. Grep stops with the error
message "Binary file testfile.txt matches" and exit code 0. The error
message is written to standard output. Every line after the 0 character is
silently omitted from the output.
Environment:
- grep-2.25
- no patches, no "configure" options
- openSUSE Tumbleweed 20160611 x86_64; glibc 2.23; libpcre 8.38
I first saw this bug when I tried to filter out a line from the output of
the MySQL backup utility "mysqldump". Because grep stopped at the 0
character, the backups were incomplete.
# mysqldump --all-databases | grep -v '^-- Dump completed on'
[... around 240 lines of SQL output ...]
LOCK TABLES `PartTable` WRITE;
/*!40000 ALTER TABLE `PartTable` DISABLE KEYS */;
Binary file (standard input) matches
mysqldump: Got errno 32 on write
I found that the mysqldump output contains NUL (0) characters in table PartTable.
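One way to confirm that a dump contains such bytes before piping it through grep is sketched below, assuming POSIX `tr` and `wc` are available. The helper name `count_nul` and the file `sample.dump` are hypothetical, not part of the original report.

```shell
#!/bin/bash
# count_nul (hypothetical helper): print how many NUL bytes a file contains.
# tr -cd '\000' deletes every byte except NUL, so wc -c counts only the NULs.
count_nul() {
  LC_ALL=C tr -cd '\000' < "$1" | wc -c
}

# Example input: three lines, the middle one consisting of a single NUL byte
printf 'ok\n\0\nok\n' > sample.dump
count_nul sample.dump
```

On a saved dump, a non-zero count means grep will classify the data as binary unless told otherwise. Some `wc` implementations pad the number with leading spaces, so compare it numerically rather than as a string.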
I wrote the following test script, which shows the bug without a
dependency on MySQL:
--------------------------------------------------------
#!/bin/bash
testfile="testfile.txt"
# write a text file large enough (16384 lines is
# the minimum number for this test case)
for ((i = 1; i <= 16384; i++)); do echo "A"; done > "$testfile"
# write a zero byte
echo -e '\0' >> "$testfile"
# write an end line
echo -e 'A ... the end' >> "$testfile"
# verify the file contents
ls -l "$testfile"
tail -n 10 "$testfile"
# use 'grep' to find all lines with the string "A"
grep "A" "$testfile"
# the last line is missing, the output ends with
# "Binary file testfile.txt matches"
# check the exit code
echo "Exit code of grep:" $?
--------------------------------------------------------
The last line "A ... the end" is missing from grep's output, and the exit
code is 0:
# ./null-bug-testcase.txt
[...]
A
A
A
Binary file testfile.txt matches
Exit code of grep: 0
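The arithmetic behind the 16384-line minimum: each "A\n" line is 2 bytes, so 16384 lines fill exactly 32768 bytes and the NUL byte lands at 0-based offset 32768, just past the first 32 KiB of the file. The sketch below checks that offset with POSIX `od` and `awk`; the helper name `first_nul_offset` is hypothetical.

```shell
#!/bin/bash
# first_nul_offset (hypothetical helper): print the 0-based byte offset of the
# first NUL byte in a file, or nothing if the file contains no NUL.
# od -v disables od's '*' compression of repeated lines, which would otherwise
# hide most of the identical "A\n" lines and break the count.
first_nul_offset() {
  od -An -v -tu1 "$1" |
    awk '{ for (i = 1; i <= NF; i++) { if ($i == 0) { print n + 0; exit } n++ } }'
}

# Rebuild the report's test file: 16384 lines of "A\n" (2 bytes each)
testfile="testfile.txt"
for ((i = 1; i <= 16384; i++)); do echo "A"; done > "$testfile"
printf '\0\n' >> "$testfile"
printf 'A ... the end\n' >> "$testfile"

first_nul_offset "$testfile"   # expected: 32768, i.e. byte number 32769
```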
I also found this bug in older grep versions (e.g. Ubuntu 14.04; grep 2.16).
FreeBSD's version of grep (tested with 2.5.1-FreeBSD under FreeBSD
10.3-RELEASE-p4) does not show the bug:
# ./null-bug-testcase.txt
[...]
A
A
A
A ... the end
Exit code of grep: 0
Regards,
Björn
--- End Message ---
--- Begin Message ---
Subject: Re: bug#23763: Bug report: Grep stops, if a text file contains a null character after 32768 bytes
Date: Mon, 13 Jun 2016 14:01:28 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0
tag 23763 notabug
thanks
On 06/13/2016 01:45 PM, Bjoern Voigt wrote:
> Grep shows a bug, if it processes a text file with at least one embedded
> 0 (ASCII zero) character after byte 32768.
Thanks for the report. However, this is not a bug in grep, but
documented behavior. By definition, a text file CANNOT contain NUL
bytes; any file with NUL characters is a binary file. You can still
make grep process it as a text file, but only with the '-a' flag.
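A minimal illustration of the '-a' workaround, assuming GNU grep (where '-a' is the short form of '--binary-files=text'). The file name `small.txt` is hypothetical, a shortened stand-in for the reporter's testfile.txt.

```shell
#!/bin/bash
# Three lines: "A", a line holding a single NUL byte, and "A ... the end"
printf 'A\n\0\nA ... the end\n' > small.txt

# With -a (--binary-files=text) grep treats the NUL as an ordinary byte,
# so both matching lines are printed, including the one after the NUL.
grep -a "A" small.txt
```

In the original pipeline the flag goes on the grep side, e.g. `mysqldump --all-databases | grep -a -v '^-- Dump completed on'`. With '-a', grep passes the NUL bytes through unchanged, which is what a backup needs.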
> Grep stops with the error
> message "Binary file testfile.txt matches" and exit code 0. The error
> message is written to standard output. Any line after the 0 character is
> silently ignored in output.
POSIX allows this behavior: it leaves grep's behavior undefined on
non-text files, and your file is one by virtue of its NUL byte.
Since this is documented behavior of GNU grep when -a is not used, I'm
closing this as not a bug. But feel free to add further comments to this
thread.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
--- End Message ---