bug-coreutils

Re: [PATCH] ls - fails in a directory with >6M files


From: Jim Meyering
Subject: Re: [PATCH] ls - fails in a directory with >6M files
Date: Wed, 30 Jul 2008 15:40:46 +0200

Kamil Dudka <address@hidden> wrote:
> as solution to rhbz #441807 (/bin/ls fails in a directory with >6M files)

Thanks for the pointer and patch.

Normally I prefer to credit the bug reporter and mention
any bug-tracking number, but this one is private.
Does it need to remain that way?

> I propose tiny patch for ls - work with constant amount of memory when not
> sorting and using one_per_line format, no matter how many files are in the
> directory.
>
> Tested with directory of about 10000 files.
>
> Git version:
> $ ulimit -v 102400
> $ ls -U1 | wc -l
> ls: memory exhausted
> 0
>
> Patched version:
> $ ulimit -v 102400
> $ ls -U1 | wc -l
> 102402

Thanks for the test.  However, it doesn't demonstrate the bug for me,
since ls succeeded in both cases.  With 100k files, both still passed.
At 1 million files (created via "seq 1000000|xargs touch"), however,
I did see the same behavior.

If you can find a way to exercise this portably without consuming
inordinate resources, it'd be great to add a test case.

> From 92a29217298044c1b9298d557ac6fd683effbd41 Mon Sep 17 00:00:00 2001
> From: Kamil Dudka <address@hidden>
> Date: Wed, 30 Jul 2008 12:40:49 +0200
> Subject: [PATCH] ls: now works with constant amount of memory when not 
> sorting and using
> one_per_line format, no matter how many files are in the directory
> * ls.c (write_out_current_files): New function for immediately write out
> of files in the table.
> ls.c (print_dir): Write out files immediately if possible.

Please mention each file name just once (after the "*").

> * NEWS: Mention the change.

I've moved the entry into the "improvements" section.

...
> @@ -2402,6 +2403,8 @@ print_dir (char const *name, char const *realname, bool command_line_arg)
>  #endif
>             total_blocks += gobble_file (next->d_name, type, D_INO (next),
>                                          false, name);
> +              if (format == one_per_line && sort_type == sort_none)
> +                write_out_current_files ();

Indentation is not consistent.

>           }
>       }
>        else if (errno != 0)
> @@ -3239,6 +3242,19 @@ sort_files (void)
>                       [directories_first]);
>  }
>
> +/* Sort files in the table and write out immediately.  */
> +
> +static void
> +write_out_current_files (void)
> +{
> +  if (cwd_n_used)

Is this test necessary?

> +    {
> +      sort_files ();

It is not obvious why we're calling sort_files
from a context with sort_type == sort_none, so I added
a comment.

> +      print_current_files ();
> +      clear_files ();

Here's the adjusted patch.
I'll push it Friday.

From b1097752bbb3e9459a7798bf34db7ad35baf94c9 Mon Sep 17 00:00:00 2001
From: Kamil Dudka <address@hidden>
Date: Wed, 30 Jul 2008 14:31:50 +0200
Subject: [PATCH] ls -U1 now uses constant memory

When printing one name per line and not sorting, ls now uses
constant memory per directory, no matter how many files are in
the directory.
* ls.c (print_dir): Print each file name immediately, when possible.
* NEWS: Mention the improvement.
---
 NEWS     |    3 +++
 src/ls.c |   14 ++++++++++++++
 2 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/NEWS b/NEWS
index 5796dfa..4637eba 100644
--- a/NEWS
+++ b/NEWS
@@ -51,6 +51,9 @@ GNU coreutils NEWS                                    -*- outline -*-

   join has significantly better performance due to better memory management

+  ls now uses constant memory when not sorting and using one_per_line format,
+  no matter how many files are in a given directory
+
   od now aligns fields across lines when printing multiple -t
   specifiers, and no longer prints fields that resulted entirely from
   padding the input out to the least common multiple width.
diff --git a/src/ls.c b/src/ls.c
index 4b69f7d..a661c06 100644
--- a/src/ls.c
+++ b/src/ls.c
@@ -2402,6 +2402,20 @@ print_dir (char const *name, char const *realname, bool command_line_arg)
 #endif
              total_blocks += gobble_file (next->d_name, type, D_INO (next),
                                           false, name);
+
+             /* In this narrow case, print out each name right away, so
+                ls uses constant memory while processing the entries of
+                this directory.  Useful when there are many (millions)
+                of entries in a directory.  */
+             if (format == one_per_line && sort_type == sort_none)
+               {
+                 /* We must call sort_files in spite of
+                    "sort_type == sort_none" for its initialization
+                    of the sorted_file vector.  */
+                 sort_files ();
+                 print_current_files ();
+                 clear_files ();
+               }
            }
        }
       else if (errno != 0)
--
1.6.0.rc1.2.gc4577
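
For readers who want to see the idea in isolation, here is a minimal sketch of the "print each name right away" pattern from the patch above. It is not the ls.c code: struct table, table_add, table_flush and list_entries are hypothetical stand-ins, only loosely analogous to gobble_file, print_current_files and clear_files, and the directory is modelled as an in-memory array of names. The function returns the peak number of buffered names so the difference between the two modes is visible.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for ls's internal file table;
   the real structures in ls.c are more involved.  */
struct table
{
  char **names;      /* pending names, not yet printed */
  size_t n_used;     /* number of pending names */
  size_t n_alloc;    /* allocated slots in NAMES */
  size_t peak_used;  /* high-water mark of N_USED, for illustration */
};

/* Append a copy of NAME to T (loosely analogous to gobble_file).  */
static void
table_add (struct table *t, char const *name)
{
  if (t->n_used == t->n_alloc)
    {
      size_t n = t->n_alloc ? 2 * t->n_alloc : 16;
      char **p = realloc (t->names, n * sizeof *p);
      if (!p)
        abort ();
      t->names = p;
      t->n_alloc = n;
    }

  size_t len = strlen (name) + 1;
  char *copy = malloc (len);
  if (!copy)
    abort ();
  memcpy (copy, name, len);

  t->names[t->n_used++] = copy;
  if (t->peak_used < t->n_used)
    t->peak_used = t->n_used;
}

/* Print every pending name, one per line, then empty the table
   (loosely analogous to print_current_files followed by clear_files).  */
static void
table_flush (struct table *t, FILE *out)
{
  for (size_t i = 0; i < t->n_used; i++)
    {
      fprintf (out, "%s\n", t->names[i]);
      free (t->names[i]);
    }
  t->n_used = 0;
}

/* Write the N_ENTRIES names in ENTRIES to OUT in their given order.
   If STREAM is true (the unsorted, one-per-line case), flush after
   every entry so the table never holds more than one name.
   Otherwise buffer everything and flush once at the end.
   Return the peak number of names buffered at any one time.  */
static size_t
list_entries (char const *const *entries, size_t n_entries,
              bool stream, FILE *out)
{
  struct table t = { NULL, 0, 0, 0 };

  for (size_t i = 0; i < n_entries; i++)
    {
      table_add (&t, entries[i]);
      if (stream)
        table_flush (&t, out);
    }

  table_flush (&t, out);  /* no-op when streaming */
  free (t.names);
  return t.peak_used;
}
```

Calling list_entries with stream set to true keeps the peak at one buffered name regardless of the number of entries, while stream set to false lets the table grow to hold every entry before anything is printed. That growth is what exhausted memory in the "ls -U1" runs quoted earlier.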



