Re: [PATCH] ls - fails in a directory with >6M files
From: Jim Meyering
Subject: Re: [PATCH] ls - fails in a directory with >6M files
Date: Wed, 30 Jul 2008 15:40:46 +0200
Kamil Dudka <address@hidden> wrote:
> as solution to rhbz #441807 (/bin/ls fails in a directory with >6M files)
Thanks for the pointer and patch.
Normally I prefer to credit the bug reporter and mention
any bug-tracking number, but this one is private.
Does it need to remain that way?
> I propose tiny patch for ls - work with constant amount of memory when not
> sorting and using one_per_line format, no matter how many files are in the
> directory.
>
> Tested with directory of about 10000 files.
>
> Git version:
> $ ulimit -v 102400
> $ ls -U1 | wc -l
> ls: memory exhausted
> 0
>
> Patched version:
> $ ulimit -v 102400
> $ ls -U1 | wc -l
> 102402
Thanks for the test. However, it doesn't demonstrate the bug for me:
ls succeeded both with and without the patch. Increasing the count to
100k, both still passed. Only at 1 million files (created via
"seq 1000000|xargs touch") did I see the failure you describe.
If you can find a way to exercise this portably without consuming
inordinate resources, it'd be great to add a test case.
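As a starting point only, here is a rough sketch of what such a check
might look like. It is not a proposed test for the tree: the file count N,
the temporary-directory layout, and the 100 MiB cap (taken from the
"ulimit -v 102400" above) are all assumptions. With the small default
count it passes both with and without the patch, so it only exercises
the bug when N is raised to around a million.

```shell
#!/bin/sh
# Sketch only: N, the memory cap, and the temp-dir layout are assumptions,
# not part of the patch under review.
N=${N:-1000}                 # raise to 1000000 to reproduce the failure
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT    # clean up the scratch directory on exit

# Populate the directory with N empty files named 1..N.
(cd "$dir" && seq "$N" | xargs touch) || exit 1

# Run ls under a virtual-memory cap. The subshell keeps the limit from
# leaking into the rest of the script; wc runs outside the cap.
count=$( (ulimit -v 102400 && cd "$dir" && ls -U1) | wc -l | tr -d ' ')

echo "listed $count of $N entries"
```

If ls exhausts memory under the cap, the count falls short of N (0 in
the unpatched run quoted above), so comparing the two values is enough
to tell the cases apart.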
> From 92a29217298044c1b9298d557ac6fd683effbd41 Mon Sep 17 00:00:00 2001
> From: Kamil Dudka <address@hidden>
> Date: Wed, 30 Jul 2008 12:40:49 +0200
> Subject: [PATCH] ls: now works with constant amount of memory when not
> sorting and using
> one_per_line format, no matter how many files are in the directory
> * ls.c (write_out_current_files): New function for immediately write out
> of files in the table.
> ls.c (print_dir): Write out files immediately if possible.
Please mention each file name just once (after the "*").
> * NEWS: Mention the change.
I've moved the entry into the "improvements" section.
...
> @@ -2402,6 +2403,8 @@ print_dir (char const *name, char const *realname, bool command_line_arg)
> #endif
> total_blocks += gobble_file (next->d_name, type, D_INO (next),
> false, name);
> + if (format == one_per_line && sort_type == sort_none)
> + write_out_current_files ();
Indentation is not consistent.
> }
> }
> else if (errno != 0)
> @@ -3239,6 +3242,19 @@ sort_files (void)
> [directories_first]);
> }
>
> +/* Sort files in the table and write out immediately. */
> +
> +static void
> +write_out_current_files (void)
> +{
> + if (cwd_n_used)
Is this test necessary?
> + {
> + sort_files ();
It is not obvious why we're calling sort_files
from a context with sort_type == sort_none, so I added
a comment.
> + print_current_files ();
> + clear_files ();
Here's the adjusted patch.
I'll push it Friday.
From b1097752bbb3e9459a7798bf34db7ad35baf94c9 Mon Sep 17 00:00:00 2001
From: Kamil Dudka <address@hidden>
Date: Wed, 30 Jul 2008 14:31:50 +0200
Subject: [PATCH] ls -U1 now uses constant memory
When printing one name per line and not sorting, ls now uses
constant memory per directory, no matter how many files are in
the directory.
* ls.c (print_dir): Print each file name immediately, when possible.
* NEWS: Mention the improvement.
---
NEWS | 3 +++
src/ls.c | 14 ++++++++++++++
2 files changed, 17 insertions(+), 0 deletions(-)
diff --git a/NEWS b/NEWS
index 5796dfa..4637eba 100644
--- a/NEWS
+++ b/NEWS
@@ -51,6 +51,9 @@ GNU coreutils NEWS -*- outline -*-
join has significantly better performance due to better memory management
+ ls now uses constant memory when not sorting and using one_per_line format,
+ no matter how many files are in a given directory
+
od now aligns fields across lines when printing multiple -t
specifiers, and no longer prints fields that resulted entirely from
padding the input out to the least common multiple width.
diff --git a/src/ls.c b/src/ls.c
index 4b69f7d..a661c06 100644
--- a/src/ls.c
+++ b/src/ls.c
@@ -2402,6 +2402,20 @@ print_dir (char const *name, char const *realname, bool command_line_arg)
#endif
total_blocks += gobble_file (next->d_name, type, D_INO (next),
false, name);
+
+ /* In this narrow case, print out each name right away, so
+ ls uses constant memory while processing the entries of
+ this directory. Useful when there are many (millions)
+ of entries in a directory. */
+ if (format == one_per_line && sort_type == sort_none)
+ {
+ /* We must call sort_files in spite of
+ "sort_type == sort_none" for its initialization
+ of the sorted_file vector. */
+ sort_files ();
+ print_current_files ();
+ clear_files ();
+ }
}
}
else if (errno != 0)
--
1.6.0.rc1.2.gc4577