bug-gnu-utils

Re: grep: 'binary files' where matches are text


From: Ed Avis
Subject: Re: grep: 'binary files' where matches are text
Date: Sat, 21 Jun 2003 20:01:17 +0100 (BST)

Ed Avis wrote:

>My feeling is that grep's job is to look for and print matches in
>all files specified, and it should try to stick to that.  To avoid
>unpleasantness on the user's terminal grep can think twice before
>printing binary garbage, but this necessary evil should interfere as
>little as possible with the job of printing matches.  So better to
>print as much as possible, and only give the 'binary match found'
>message when it is really necessary.

>>But I don't know when I get to it.  Are you willing to donate a patch?

Here's the promised patch.  It adds a new value for the existing
option, --binary-files=try, which tries to find and print matches even
in binary files.  If the matched text itself is binary, grep prints a
one-line 'Binary match in file FILE' message instead of the matched
text (at most once per file), but it keeps looking through the rest of
the file for text matches.
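
To make the intended behaviour concrete, here is a minimal standalone C
sketch of the 'try' policy applied to one file held in memory.  It is
not code from the patch: grep_try() and line_matches() are hypothetical
names, the naive substring search stands in for grep's real matcher, and
only the NUL-byte test is modelled (the patch's looks_binary() looks for
'\200' instead when -z makes NUL the line terminator).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same test as the patch's looks_binary() in the default (newline-
   terminated) case: any NUL byte marks the data as binary. */
static int
looks_binary (char const *start, size_t length)
{
  return memchr (start, '\0', length) != NULL;
}

/* Hypothetical stand-in for grep's matcher: a naive substring search
   that, unlike strstr(), does not stop at NUL bytes. */
static int
line_matches (char const *line, size_t len, char const *pat)
{
  size_t plen = strlen (pat);
  for (size_t i = 0; plen <= len && i <= len - plen; i++)
    if (memcmp (line + i, pat, plen) == 0)
      return 1;
  return 0;
}

/* Sketch of --binary-files=try for one file held in BUF.  Text lines
   that match are appended to OUT; the first matching line containing
   binary data adds one "Binary match in file" message and later ones
   add nothing, but scanning always continues to the end.  Output that
   does not fit in OUT is dropped.  Returns the number of matching
   lines. */
static int
grep_try (char const *filename, char const *buf, size_t len,
          char const *pat, char *out, size_t outsize)
{
  int reported_binary_match = 0;   /* reset per file, as in the patch */
  int nlines = 0;
  size_t used = 0;
  char const *p = buf;
  char const *lim = buf + len;

  assert (out != NULL && outsize > 0);
  out[0] = '\0';
  while (p < lim)
    {
      char const *nl = memchr (p, '\n', lim - p);
      size_t linelen = (nl ? nl : lim) - p;   /* excludes the newline */
      int n = 0;

      if (line_matches (p, linelen, pat))
        {
          nlines++;
          if (!looks_binary (p, linelen))
            n = snprintf (out + used, outsize - used, "%.*s\n",
                          (int) linelen, p);
          else if (!reported_binary_match++)
            n = snprintf (out + used, outsize - used,
                          "Binary match in file %s\n", filename);
          if (n > 0 && (size_t) n < outsize - used)
            used += n;
          else
            out[used] = '\0';   /* undo a truncated write */
        }
      p = nl ? nl + 1 : lim;
    }
  return nlines;
}
```

With two text matches and two binary matches in the same file, the
sketch reports both text lines but only a single 'Binary match in file'
line, while still counting all four matches.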

Printing a match is also suppressed when its context contains binary
data; I didn't think it was worth doing something more complex, such
as printing the match itself without its context.  So if you ask for a
context grep, you get either the match with its context or the
'Binary match in file' message.
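
A rough model of the rule just described, as a hypothetical standalone C
sketch rather than code from the patch (struct line and
context_block_is_text() are made-up names): the match on line I is
printed together with CONTEXT lines on each side only if none of those
lines contains a NUL byte; otherwise the whole block is replaced by the
one-line message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One input line: its bytes and length, newline excluded. */
struct line
{
  char const *text;
  size_t len;
};

/* Returns 1 if the block made of match line I plus CONTEXT lines on
   each side contains no NUL byte, so the match may be printed with
   its context; returns 0 if any line in the block looks binary, in
   which case the whole block is suppressed and only the one-line
   binary-match message would be printed. */
static int
context_block_is_text (struct line const *lines, size_t nlines,
                       size_t i, size_t context)
{
  assert (i < nlines);
  size_t first = i > context ? i - context : 0;
  /* nlines - 1 - i is the number of lines after the match. */
  size_t last = context < nlines - 1 - i ? i + context : nlines - 1;

  for (size_t k = first; k <= last; k++)
    if (memchr (lines[k].text, '\0', lines[k].len) != NULL)
      return 0;
  return 1;
}
```

A binary line that sits outside the requested context does not affect
the match; asking for one more line of context can bring it into the
block and suppress the output.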

diff -ru grep-2.5.1/doc/grep.1 grep-2.5.1-new/doc/grep.1
--- grep-2.5.1/doc/grep.1       2002-01-22 13:20:04.000000000 +0000
+++ grep-2.5.1-new/doc/grep.1   2003-06-21 19:55:55.000000000 +0100
@@ -93,9 +93,7 @@
 .TP
 .BI \-\^\-binary-files= TYPE
 If the first few bytes of a file indicate that the file contains binary
-data, assume that the file is of type
-.IR TYPE .
-By default,
+data, this option controls how to handle it.  By default,
 .I TYPE
 is
 .BR binary ,
@@ -120,6 +118,15 @@
 processes a binary file as if it were text; this is equivalent to the
 .B \-a
 option.
+If
+.I TYPE
+is
+.BR try ,
+then
+.B grep
+will try processing the file as text and any textual matches will be
+reported, but if there are matches containing binary data this will be
+reported as a single one-line message.
 .I Warning:
 .B "grep \-\^\-binary-files=text"
 might output binary garbage,
diff -ru grep-2.5.1/doc/grep.texi grep-2.5.1-new/doc/grep.texi
--- grep-2.5.1/doc/grep.texi    2002-01-22 13:20:04.000000000 +0000
+++ grep-2.5.1-new/doc/grep.texi        2003-06-21 19:43:52.000000000 +0100
@@ -292,14 +292,18 @@
 @opindex --binary-files
 @cindex binary files
 If the first few bytes of a file indicate that the file contains binary
-data, assume that the file is of type @var{type}.  By default,
-@var{type} is @samp{binary}, and @command{grep} normally outputs either
-a one-line message saying that a binary file matches, or no message if
-there is no match.  If @var{type} is @samp{without-match},
-@command{grep} assumes that a binary file does not match;
-this is equivalent to the @samp{-I} option.  If @var{type}
-is @samp{text}, @command{grep} processes a binary file as if it were
-text; this is equivalent to the @samp{-a} option.
+data, this option controls how to handle it.  By default, @var{type} is
+@samp{binary}, and @command{grep} normally outputs either a one-line
+message saying that a binary file matches, or no message if there is no
+match.  If @var{type} is @samp{without-match}, @command{grep} assumes
+that a binary file does not match; this is equivalent to the @samp{-I}
+option.  If @var{type} is @samp{text}, @command{grep} processes a binary
+file as if it were text; this is equivalent to the @samp{-a} option.  If
+@var{type} is @samp{try}, then @command{grep} will try processing the
+file as text and any textual matches will be reported, but if there are
+matches containing binary data this will be reported as a single
+one-line message.
+
 @emph{Warning:} @samp{--binary-files=text} might output binary garbage,
 which can have nasty side effects if the output is a terminal and if the
 terminal driver interprets some of it as commands.
@@ -1139,10 +1143,12 @@
 If @command{grep} listed all matching ``lines'' from a binary file, it
 would probably generate output that is not useful, and it might even
 muck up your display.  So @sc{gnu} @command{grep} suppresses output from
-files that appear to be binary files.  To force @sc{gnu} @command{grep}
-to output lines even from files that appear to be binary, use the
-@samp{-a} or @samp{--binary-files=text} option.  To eliminate the
-``Binary file matches'' messages, use the @samp{-I} or
+files that appear to be binary files.  To make @sc{gnu} @command{grep}
+output any text matches it finds even if other parts of the file are
+binary, use the @samp{--binary-files=try} option.  To force @sc{gnu}
+@command{grep} to output all matches even from files that appear to be
+binary, use the @samp{-a} or @samp{--binary-files=text} option.  To
+eliminate the ``Binary file matches'' messages, use the @samp{-I} or
 @samp{--binary-files=without-match} option.
 
 @item
diff -ru grep-2.5.1/src/grep.c grep-2.5.1-new/src/grep.c
--- grep-2.5.1/src/grep.c       2002-03-26 15:54:12.000000000 +0000
+++ grep-2.5.1-new/src/grep.c   2003-06-21 19:53:57.000000000 +0100
@@ -158,6 +158,13 @@
 static char const *filename;
 static int errseen;
 
+/* Whether the current file looks like binary data (not plain text). */
+static int file_is_binary;
+
+/* Whether (in the case of --binary-files=try) we have already
+   reported a binary match in the current file. */
+static int reported_binary_match;
+
 /* How to handle directories.  */
 static enum
   {
@@ -431,7 +438,8 @@
 {
   BINARY_BINARY_FILES,
   TEXT_BINARY_FILES,
-  WITHOUT_MATCH_BINARY_FILES
+  WITHOUT_MATCH_BINARY_FILES,
+  TRY_BINARY_FILES
 } binary_files;                /* How to handle binary files.  */
 
 static int filename_mask;      /* If zero, output nulls after filenames.  */
@@ -631,6 +639,14 @@
     }
 }
 
+/* Return whether a section of memory 'looks like' binary (as opposed
+   to text) data. */
+static int
+looks_binary (char const *start, size_t length)
+{
+  return memchr (start, eolbyte ? '\0' : '\200', length) != 0;
+}
+
 /* Print the lines between BEG and LIM.  Deal with context crap.
    If NLINESP is non-null, store a count of lines between BEG and LIM.  */
 static void
@@ -640,13 +656,23 @@
   char const *bp, *p;
   char eol = eolbyte;
   int i, n;
+  int quiet = out_quiet;
 
   if (!out_quiet && pending > 0)
     prpending (beg);
 
   p = beg;
 
-  if (!out_quiet)
+  if (file_is_binary
+      && binary_files != TEXT_BINARY_FILES
+      && looks_binary(beg, lim - beg))
+    {
+      if (! quiet && ! reported_binary_match++)
+       printf (_("Binary match in file %s\n"), filename);
+      quiet = 1;
+    }
+
+  if (! quiet)
     {
       /* Deal with leading context crap. */
 
@@ -678,7 +704,7 @@
        {
          char const *nl = memchr (p, eol, lim - p);
          nl++;
-         if (!out_quiet)
+         if (! quiet)
            prline (p, nl, ':');
          p = nl;
        }
@@ -688,10 +714,10 @@
       after_last_match = bufoffset - (buflim - p);
     }
   else
-    if (!out_quiet)
+    if (! quiet)
       prline (beg, lim, ':');
 
-  pending = out_quiet ? 0 : out_after;
+  pending = quiet ? 0 : out_after;
   used = 1;
 }
 
@@ -760,6 +786,8 @@
   char *beg;
   char *lim;
   char eol = eolbyte;
+  int old_done_on_match = done_on_match; /* boolean */
+  int old_out_quiet = out_quiet;         /* boolean */
 
   if (!reset (fd, file, stats))
     return 0;
@@ -792,13 +820,30 @@
       return 0;
     }
 
-  not_text = (((binary_files == BINARY_BINARY_FILES && !out_quiet)
-              || binary_files == WITHOUT_MATCH_BINARY_FILES)
-             && memchr (bufbeg, eol ? '\0' : '\200', buflim - bufbeg));
-  if (not_text && binary_files == WITHOUT_MATCH_BINARY_FILES)
-    return 0;
-  done_on_match += not_text;
-  out_quiet += not_text;
+  file_is_binary = looks_binary(bufbeg, buflim - bufbeg);
+  if (file_is_binary)
+    {
+      if (binary_files == WITHOUT_MATCH_BINARY_FILES)
+       /* Assume the file does not match. */
+       return 0;
+      else if (binary_files == BINARY_BINARY_FILES)
+       {
+         /* Don't print matches but maybe do print whether any matched. */
+         if (out_quiet)
+           /* Assume we need to count matches. */
+           ;
+         else
+           {
+             /* Turn off printing individual matches; once we've seen
+                one we know we must print the 'file matched' message. */
+             done_on_match = 1;
+             out_quiet = 1;
+           }
+       }
+      /* Otherwise, process the file normally; what to do with the
+        matches found will also be controlled by the binary_files
+        option. */
+    }
 
   for (;;)
     {
@@ -874,9 +919,13 @@
     }
 
  finish_grep:
-  done_on_match -= not_text;
-  out_quiet -= not_text;
-  if ((not_text & ~out_quiet) && nlines != 0)
+  done_on_match = old_done_on_match;
+  out_quiet = old_out_quiet;
+
+  if (binary_files == BINARY_BINARY_FILES
+      && file_is_binary
+      && ! out_quiet
+      && nlines != 0)
     printf (_("Binary file %s matches\n"), filename);
   return nlines;
 }
@@ -945,6 +994,8 @@
     SET_BINARY (desc);
 #endif
 
+  reported_binary_match = 0;
+
   count = grep (desc, file, stats);
   if (count < 0)
     status = count + 2;
@@ -1090,8 +1141,8 @@
       --label=LABEL         print LABEL as filename for standard input\n\
   -o, --only-matching       show only the part of a line matching PATTERN\n\
   -q, --quiet, --silent     suppress all normal output\n\
-      --binary-files=TYPE   assume that binary files are TYPE\n\
-                            TYPE is 'binary', 'text', or 'without-match'\n\
+      --binary-files=TYPE   handle binary files as TYPE\n\
+                            one of 'binary', 'text', 'without-match', 'try'\n\
   -a, --text                equivalent to --binary-files=text\n\
   -I                        equivalent to --binary-files=without-match\n\
   -d, --directories=ACTION  how to handle directories\n\
@@ -1563,6 +1614,8 @@
          binary_files = TEXT_BINARY_FILES;
        else if (strcmp (optarg, "without-match") == 0)
          binary_files = WITHOUT_MATCH_BINARY_FILES;
+       else if (strcmp (optarg, "try") == 0)
+         binary_files = TRY_BINARY_FILES;
        else
          error (2, 0, _("unknown binary-files type"));
        break;


-- 
Ed Avis <address@hidden>





