--- Begin Message ---
Subject: checking for a binary file is not deterministic
Date: Fri, 25 Sep 2015 11:11:06 +0200
Hi,
When I pipe a certain diff into grep-2.21, grep sometimes decides
that the input is a binary file and sometimes treats it as text. The
latter behaviour is expected and desired. I think grep should never
consider standard input to be binary.
For lack of a simple recipe, here is the actual use case:
wget http://http.debian.net/debian/pool/main/g/gtkorphan/gtkorphan_0.4.4.orig.tar.gz
tar -xf gtkorphan_0.4.4.orig.tar.gz
cd gtkorphan-0.4.4/
mkdir fresh
# the rsync command does not work for this location:
for lang in pt_BR bg zh_CN hr cs da nl eo fi fr de hu id it lv pl ru sr sv vi; do \
  wget http://translationproject.org/PO-files/$lang/gtkorphan-0.4.3.$lang.po -O fresh/$lang.po; \
done
diff -ur po fresh | /usr/local/bin/grep "Only in" | grep "fi"
That last command sometimes outputs:
Only in fresh: fi.po
Only in po: Makefile.in.in
and sometimes:
Binary file (standard input) matches
(If you can't get the second output, try hitting Enter a few times
and then running the command again, and again, and again. If you
still can't get both outputs, try using the en_US.utf8 locale.)
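The "hit Enter and retry" advice points at timing. On a pipe, a single read() returns only the bytes the writer has already produced, even when the reader asks for more. This Python sketch assumes POSIX pipes. The helper name first_read and the 32 KiB request size are illustrative only and are not taken from grep. It shows that one read() sees exactly what had been written when the read happened, and nothing more:

```python
import os

def first_read(chunks_written: list[bytes], bufsize: int = 32768) -> bytes:
    """Write chunks into a fresh pipe, then do a single read() of up to bufsize bytes.

    Hypothetical helper for illustration. Keep the chunks small: if they exceed
    the pipe's capacity, os.write blocks because nothing is reading yet.
    """
    if not chunks_written:
        # With the write end still open, reading an empty pipe would block forever.
        raise ValueError("write at least one chunk before reading")
    r, w = os.pipe()
    try:
        for chunk in chunks_written:
            os.write(w, chunk)
        # Returns whatever is buffered in the pipe right now, up to bufsize bytes.
        return os.read(r, bufsize)
    finally:
        os.close(r)
        os.close(w)

# Only one line has been written when the reader asks for 32 KiB:
print(first_read([b"Only in fresh: fi.po\n"]))  # b'Only in fresh: fi.po\n'
```

With a real writer such as diff running concurrently, how much has been written at the moment of the first read depends on scheduling. That would explain why repeated runs can differ.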
What seems to be happening is that sometimes grep looks
far enough ahead to see the diff between po/fr.po and fresh/fr.po
(which contains some ISO-8859-1 codes), and sometimes it
does not. When I delete fresh/bg.po and fresh/de.po, grep
always sees those codes and always considers the input
to be binary.
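The hypothesis can be illustrated with a simplified model. Suppose a tool decides "binary or text" only from the bytes it has examined so far. Then the same input can be classified either way, depending on whether the ISO-8859-1 line falls inside the examined prefix. The rule used here treats a NUL byte or an invalid UTF-8 sequence as binary. That rule, the helper looks_binary, and the sample data are all assumptions for illustration, not grep's actual code:

```python
import codecs

def looks_binary(buf: bytes) -> bool:
    """Simplified heuristic (an assumption, not grep's actual rule):
    a NUL byte or an invalid UTF-8 sequence marks the buffer as binary."""
    if b"\0" in buf:
        return True
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False: a multibyte character cut off at the end of buf is not an error.
        decoder.decode(buf, final=False)
    except UnicodeDecodeError:
        return True
    return False

# Hypothetical stand-in for the diff: plain ASCII lines, then a line
# from a .po file encoded in ISO-8859-1 (0xE7 for the c-cedilla).
ascii_part = b"Only in po: Makefile.in.in\n" * 100
latin1_part = 'msgstr "Fran\u00e7ais"\n'.encode("latin-1")
diff_output = ascii_part + latin1_part

# Same input, different verdicts, depending only on how much was examined.
print(looks_binary(diff_output[:len(ascii_part)]))  # False
print(looks_binary(diff_output))                    # True
```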
I can of course use -a to force grep to treat standard input
as text, but still... I think determining whether a file
is text or binary should be deterministic: it should always
yield the same result when the input is the same.
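One way a tool could make the verdict deterministic is to judge the whole stream and carry decoder state across reads, so that chunk boundaries cannot matter. This is a sketch of that idea only, not a description of how grep works or should work. It uses the same kind of simplified rule (NUL or invalid UTF-8 means binary, an assumption), and stream_is_binary is a hypothetical name:

```python
import codecs
from typing import Iterable

def stream_is_binary(chunks: Iterable[bytes]) -> bool:
    """Classify a whole byte stream (NUL or invalid UTF-8 means binary).

    A single incremental decoder is shared across chunks, so the answer
    does not depend on where the stream was split into reads."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        for chunk in chunks:
            if b"\0" in chunk:  # a NUL is one byte, so splitting cannot hide it
                return True
            decoder.decode(chunk, final=False)
        decoder.decode(b"", final=True)  # a truncated sequence at EOF is an error
    except UnicodeDecodeError:
        return True
    return False

data = b"Only in po: Makefile.in.in\n" + 'msgstr "Fran\u00e7ais"\n'.encode("latin-1")
# Every possible two-way split gives the same verdict as reading it in one go.
verdicts = {stream_is_binary([data[:i], data[i:]]) for i in range(len(data) + 1)}
print(verdicts)  # {True}
```

In this sketch, the price of determinism is that the verdict is known only after the whole input has been read. That sits awkwardly with a tool that streams matching lines as it goes. Meanwhile, -a sidesteps the question entirely.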
$ /usr/local/bin/grep --version | head -1
/usr/local/bin/grep (GNU grep) 2.21
$ grep --version | head -1
grep (GNU grep) 2.21
$ diff --version | head -1
diff (GNU diffutils) 2.8.1
$ locale
LANG=eo.utf8
LANGUAGE=en
LC_CTYPE="eo.utf8"
LC_NUMERIC="eo.utf8"
LC_TIME="eo.utf8"
LC_COLLATE="eo.utf8"
LC_MONETARY="eo.utf8"
LC_MESSAGES="eo.utf8"
LC_PAPER="eo.utf8"
LC_NAME="eo.utf8"
LC_ADDRESS="eo.utf8"
LC_TELEPHONE="eo.utf8"
LC_MEASUREMENT="eo.utf8"
LC_IDENTIFICATION="eo.utf8"
LC_ALL=
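The locale matters because the ISO-8859-1 byte for "ç" (0xE7) is only an encoding error when the character set is UTF-8, as with eo.utf8 here. In ISO-8859-1 every byte value maps to a character. This minimal Python illustration uses Python codec names as a stand-in for the C library's locale handling (an assumption, since grep does not use Python codecs). has_encoding_error is a hypothetical helper:

```python
def has_encoding_error(buf: bytes, charset: str) -> bool:
    """Return True if buf is not valid text in the given charset (Python codec name)."""
    try:
        buf.decode(charset)
    except UnicodeDecodeError:
        return True
    return False

line = 'msgstr "Fran\u00e7ais"\n'.encode("latin-1")
print(has_encoding_error(line, "utf-8"))    # True: 0xE7 is not followed by a continuation byte
print(has_encoding_error(line, "latin-1"))  # False: every byte is a valid ISO-8859-1 character
```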
Benno
--- End Message ---