[bug-grep] Re: grep: very large file with no newline causes trouble


From: Jim Meyering
Subject: [bug-grep] Re: grep: very large file with no newline causes trouble
Date: Sat, 20 Nov 2004 13:32:26 +0100

Hello,

I sent this patch to bug-grep a year and a half ago,
and it has been applied in Debian's grep for over a year:
  http://bugs.debian.org/185208
Is there something about it that you don't like?

Here's some motivation:

grep doesn't deal well with very large files containing no line terminator.
I ran grep in a directory where I thought it'd find matches and complete
in a fraction of a second.  I was surprised to see it appear to hang and
finally exit with only this error: `grep: memory exhausted'.  Not only did
it fail to tell me which file caused the problem, but it stopped immediately
rather than continuing on with the remaining files.  The failure was due
to the presence of a file I'd created like this:

  $ dd bs=1 seek=1T of=big < /dev/null

Test the patched version like this:

  $ dd bs=1 seek=1G of=big < /dev/null
  $ (ulimit -v 10000; echo a|./grep a big -)
  grep: big: Cannot allocate memory
  (standard input):a
  [Exit 2]

Before the patch, it would stop with the ENOMEM error and fail
to print the match from the second `file'.

It's still *possible* for the patched code to exit
immediately upon a different memory allocation failure:
once the presumably-large `buffer' is freed,
if the allocation (via xmalloc) of the relatively small (32k)
replacement buffer fails, you lose.  IMHO, that's very unlikely
to happen, and not worth worrying about.
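
To make the recovery path concrete, here is a minimal standalone sketch of
the same idea, not the grep code itself.  The names `struct buf',
`grow_or_reset' and `FALLBACK_SIZE' are made up for illustration, and
abort() merely stands in for xmalloc's fatal exit.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical small default size, standing in for the 32k buffer.  */
enum { FALLBACK_SIZE = 32 * 1024 };

/* Hypothetical buffer descriptor; grep itself uses file-scope variables.  */
struct buf
{
  char *data;
  size_t alloc;
};

/* Grow B to hold at least WANT bytes.  Return 1 on success.
   On failure, free the old storage, fall back to a small buffer,
   and return 0 with errno preserved from the failed malloc.  */
static int
grow_or_reset (struct buf *b, size_t want)
{
  char *newdata;

  assert (b != NULL);
  if (want <= b->alloc)
    return 1;

  newdata = malloc (want);      /* plain malloc: failure is recoverable */
  if (newdata == NULL)
    {
      int saved_errno = errno;  /* free and malloc may clobber errno */
      free (b->data);
      b->data = malloc (FALLBACK_SIZE);
      if (b->data == NULL)
        abort ();               /* the "you lose" case */
      b->alloc = FALLBACK_SIZE;
      errno = saved_errno;
      return 0;
    }

  /* Success: carry the old contents over, then release the old storage.  */
  if (b->data != NULL)
    memcpy (newdata, b->data, b->alloc);
  free (b->data);
  b->data = newdata;
  b->alloc = want;
  return 1;
}
```

A caller that sees 0 can report errno against the current file and move on
to the next one, since the buffer is back in a usable state.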

Here's the patch.
The rearrangement in main is an improvement, but not strictly necessary.
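
In isolation, the rearrangement amounts to something like the sketch below.
It is only an illustration: `search_all', `search_fn' and `stdin_only' are
hypothetical names, with the callback standing in for grepfile.  It relies
on argv[argc] being NULL, so &argv[first] is already a NULL-terminated list.

```c
#include <stddef.h>
#include <string.h>

/* Default list used when no file operands are given.  */
static char *stdin_list[] = { "-", NULL };

/* Hypothetical stand-in for grepfile: 0 means "matched", 1 means "no match".
   A NULL file name means standard input.  */
typedef int (*search_fn) (char const *file);

/* Toy callback for demonstration: only standard input "matches".  */
static int
stdin_only (char const *file)
{
  return file == NULL ? 0 : 1;
}

/* Search every operand from argv[first] on, or standard input if there
   are none, through a single call site.  Return 0 if any search matched.  */
static int
search_all (int argc, char **argv, int first, search_fn search)
{
  char **list = (first == argc ? stdin_list : &argv[first]);
  int status = 1;
  char *file;

  while ((file = *list++) != NULL)
    status &= search (strcmp (file, "-") == 0 ? NULL : file);

  return status;
}
```

With no operands the loop runs once on "-", so the separate stdin-only
branch is no longer needed.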

2003-04-03  Jim Meyering  <address@hidden>

        When grep runs out of memory, don't abort the entire command,
        but rather just the affected command line argument(s).
        * src/grep.c (stdin_argv): New global.
        (fillbuf): Use malloc, not xmalloc, and handle malloc failure.
        (main): Rearrange main loop so that there is only one grepfile call.

Index: src/grep.c
===================================================================
RCS file: /cvsroot/grep/grep/src/grep.c,v
retrieving revision 1.82
diff -u -p -r1.82 grep.c
--- src/grep.c  19 Nov 2004 13:51:29 -0000      1.82
+++ src/grep.c  20 Nov 2004 10:55:03 -0000
@@ -1,5 +1,5 @@
 /* grep.c - main driver file for grep.
-   Copyright 1992, 1997-1999, 2000 Free Software Foundation, Inc.
+   Copyright 1992, 1997-1999, 2000, 2004 Free Software Foundation, Inc.
 
    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
@@ -82,6 +82,12 @@ static struct exclude *included_patterns
 static char const short_options[] =
 "0123456789A:B:C:D:EFGHIPUVX:abcd:e:f:hiKLlm:noqRrsuvwxyZz";
 
+/* Default for `file_list' if no files are given on the command line. */
+static char *stdin_argv[] =
+{
+  "-", NULL
+};
+
 /* Non-boolean long options that have no corresponding short equivalents.  */
 enum
 {
@@ -348,7 +354,16 @@ fillbuf (size_t save, struct stats const
         for byte sentinels fore and aft.  */
       newalloc = newsize + pagesize + 1;
 
-      newbuf = bufalloc < newalloc ? xmalloc (bufalloc = newalloc) : buffer;
+      newbuf = bufalloc < newalloc ? malloc (bufalloc = newalloc) : buffer;
+      if (newbuf == NULL)
+       {
+         int saved_errno = errno;
+         free (buffer);
+         bufalloc = ALIGN_TO (INITIAL_BUFSIZE, pagesize) + pagesize + 1;
+         buffer = xmalloc (bufalloc);
+         errno = saved_errno;
+         return 0;
+       }
       readbuf = ALIGN_TO (newbuf + 1 + save, pagesize);
       bufbeg = readbuf - save;
       memmove (bufbeg, buffer + saved_offset, save);
@@ -1321,6 +1336,7 @@ main (int argc, char **argv)
   FILE *fp;
   extern char *optarg;
   extern int optind;
+  char **file_list;
 
   initialize_main (&argc, &argv);
   program_name = argv[0];
@@ -1745,29 +1761,29 @@ warranty; not even for MERCHANTABILITY o
   if (max_count == 0)
     exit (1);
 
-  if (optind < argc)
+  file_list = (optind == argc ? stdin_argv : &argv[optind]);
+
+  status = 1;
+  while (1)
     {
-       status = 1;
-       do
+      char *file = *file_list++;
+
+      if (file == NULL)
+       break;
+
+      if ((included_patterns || excluded_patterns)
+         && !isdir (file))
        {
-         char *file = argv[optind];
-         if ((included_patterns || excluded_patterns)
-             && !isdir (file))
-           {
-             if (included_patterns &&
-                 ! excluded_filename (included_patterns, file, 0))
-               continue;
-             if (excluded_patterns &&
-                 excluded_filename (excluded_patterns, file, 0))
-               continue;
-           }
-         status &= grepfile (strcmp (file, "-") == 0 ? (char *) NULL : file,
-                             &stats_base);
+         if (included_patterns &&
+             ! excluded_filename (included_patterns, file, 0))
+           continue;
+         if (excluded_patterns &&
+             excluded_filename (excluded_patterns, file, 0))
+           continue;
        }
-       while ( ++optind < argc);
+      status &= grepfile (strcmp (file, "-") == 0
+                         ? (char *) NULL : file, &stats_base);
     }
-  else
-    status = grepfile ((char *) NULL, &stats_base);
 
   /* We register via atexit() to test stdout.  */
   exit (errseen ? 2 : status);
