bug-grep

bug#29668: grep: Fatal problem with (big) file


From: Norihiro Tanaka
Subject: bug#29668: grep: Fatal problem with (big) file
Date: Sat, 16 Dec 2017 09:25:59 +0900

On Wed, 13 Dec 2017 16:03:57 -0800
Paul Eggert <address@hidden> wrote:

> On 12/13/2017 03:25 PM, Norihiro Tanaka wrote:
> > I don't think that's the problem.  The user passes the output of grep to wc -l,
> > so the `Binary file ... matches' line is also counted by `wc' as one line.
> 
> The intent of 'grep PATTERN | wc -l' is to count the number of matches, like 
> 'grep -c PATTERN' would. But it doesn't work that way here. E.g., on Fedora 
> 27 with LANG=en_US.UTF-8:
> 
> $ grep -c Volvo Tieliikenne5.0.csv
> 266175
> $ grep Volvo Tieliikenne5.0.csv | wc -l
> 241264
> $ grep Volvo Tieliikenne5.0.csv | tail -n 1
> Binary file Tieliikenne5.0.csv matches
> 
> If the "Binary file ... matches" line were sent to stderr instead of to
> stdout, the problem would be more obvious to the user:
> 
> $ grep -c Volvo Tieliikenne5.0.csv
> 266175
> $ grep Volvo Tieliikenne5.0.csv | wc -l
> Binary file Tieliikenne5.0.csv matches
> 241264
> $ grep Volvo Tieliikenne5.0.csv | tail -n 1
> Binary file Tieliikenne5.0.csv matches
> T;2017-09-29;75;01;;;19550000;;;;;1;1570;;3000;2595;1670;;01;2200;20.6;4;false;false;Volvo;;;;;01;;01;977;;;841;;5092946
> 
> I believe that in the past I've thought that the "Binary file" message should 
> be sent to stdout, but these examples are a reasonably compelling reason to 
> send them to stderr instead.

In addition, the following problem can occur.

$ printf 'Binary file a.txt matches\n' >a.txt
$ env LC_ALL=en_US.utf8 grep B a.txt
Binary file a.txt matches

$ printf '\xFFB\n' >a.txt
$ env LC_ALL=en_US.utf8 grep B a.txt
Binary file a.txt matches

Both commands produce the same output.  However, in the former case grep
is displaying the contents of the matched line, whereas in the latter it is
reporting a binary file.  If the "Binary file" message is sent to stdout, a
user cannot tell whether a.txt is a text file or a binary file without
opening it.
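
The effect of the output stream can be sketched without depending on any
particular grep version.  In the fragment below, notice_to_stdout and
notice_to_stderr are hypothetical stand-ins, not grep code: each prints one
matched line plus the notice, and only the stream used for the notice
differs.

```shell
#!/bin/sh
# Hypothetical stand-ins for grep, not real grep code: each prints one
# matched line, then the "Binary file" notice on the named stream.
notice_to_stdout() {
  printf 'Volvo\n'
  printf 'Binary file a.txt matches\n'
}

notice_to_stderr() {
  printf 'Volvo\n'
  printf 'Binary file a.txt matches\n' >&2
}

# With the notice on stdout, the pipeline cannot tell it from a match.
count_stdout=$(notice_to_stdout | wc -l | tr -d '[:space:]')

# With the notice on stderr, only real matches reach wc; the notice is
# discarded here so that it does not clutter the terminal.
count_stderr=$(notice_to_stderr 2>/dev/null | wc -l | tr -d '[:space:]')

echo "notice on stdout: $count_stdout lines counted"
echo "notice on stderr: $count_stderr lines counted"
```

In this sketch the stdout variant counts the notice as if it were a match,
while the stderr variant counts only the matched line.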





