From: Jim Meyering
Subject: bug#6557: du sometimes miscounts directories, and files whose link count equals 1
Date: Sat, 03 Jul 2010 10:36:00 +0200
Jim Meyering wrote:
> Paul Eggert wrote:
>> (I found this bug by code inspection while doing the du performance
>> improvement reported in:
>> http://lists.gnu.org/archive/html/bug-coreutils/2010-07/msg00014.html
>> )
>>
>> Unless -l is given, du is not supposed to count the same file more
>> than once. It optimizes this test by not bothering to put a file into
>> the hash table if its link count is 1, or if it is a directory. But
>> this optimization is not correct if -L is given (because the same
>> link-count-1 file, or directory, can be seen via symbolic links) or if
>> two or more arguments are given (because the same such file can be
>> seen under multiple arguments). The optimization should be suppressed
>> if -L is given, or if multiple arguments are given.
>>
>> Here is a patch, with a couple of test cases for it. This patch
>> assumes the du performance fix, but I can prepare an independent
>> patch if you like.
>
> Thanks!
> Actually, that patch applies just fine, as-is.
> However, it induces this new "make check" test failure:
...
> This is the additional patch we'd need to make the
> failing test accept your new output. You're welcome to merge
> it into yours.
Actually I did that.
Here's the adjusted patch, for review.
Note the "du: " prefix on the one-line log summary -- that's
the part that goes into the Subject below. Plus, I shortened it.
Also, I added a log line for the tests/du/files0-from change.
(BTW, the following is the output from "git format-patch --stdout -1".
It's easy to apply that by saving it in a FILE, then running "git am FILE")
From efe53cc72b599979ea292754ecfe8abf7c839d22 Mon Sep 17 00:00:00 2001
From: Paul Eggert <address@hidden>
Date: Fri, 2 Jul 2010 23:41:08 -0700
Subject: [PATCH] du: don't miscount duplicate directories or link-count-1 files
* NEWS: Mention this.
* src/du.c (hash_all): New static var.
(process_file): Use it.
(main): Set it.
* tests/du/hard-link: Add a couple of test cases to help make
sure this bug stays squashed.
* tests/du/files0-from: Adjust existing tests to reflect
change in semantics with duplicate arguments.
---
NEWS | 5 +++++
src/du.c | 15 +++++++++++++--
tests/du/files0-from | 8 ++++----
tests/du/hard-link | 44 ++++++++++++++++++++++++++++++--------------
4 files changed, 52 insertions(+), 20 deletions(-)
diff --git a/NEWS b/NEWS
index 3a24925..b02a223 100644
--- a/NEWS
+++ b/NEWS
@@ -38,6 +38,11 @@ GNU coreutils NEWS -*- outline -*-
Also errors are no longer suppressed for unsupported file types, and
relative sizes are restricted to supported file types.
+** Bug fixes
+
+ du no longer multiply counts a file that is a directory or whose
+ link count is 1, even if the file is reached multiple times by
+ following symlinks or via multiple arguments.
* Noteworthy changes in release 8.5 (2010-04-23) [stable]
diff --git a/src/du.c b/src/du.c
index a90568e..4d6e03a 100644
--- a/src/du.c
+++ b/src/du.c
@@ -132,6 +132,9 @@ static bool apparent_size = false;
/* If true, count each hard link of files with multiple links. */
static bool opt_count_all = false;
+/* If true, hash all files to look for hard links. */
+static bool hash_all;
+
/* If true, output the NUL byte instead of a newline at the end of each line.
*/
static bool opt_nul_terminate_output = false;
@@ -518,8 +521,7 @@ process_file (FTS *fts, FTSENT *ent)
via a hard link, then don't let it contribute to the sums. */
if (skip
|| (!opt_count_all
- && ! S_ISDIR (sb->st_mode)
- && 1 < sb->st_nlink
+ && (hash_all || (! S_ISDIR (sb->st_mode) && 1 < sb->st_nlink))
&& ! hash_ins (sb->st_ino, sb->st_dev)))
{
/* Note that we must not simply return here.
@@ -937,11 +939,20 @@ main (int argc, char **argv)
quote (files_from));
ai = argv_iter_init_stream (stdin);
+
+ /* It's not easy here to count the arguments, so assume the
+ worst. */
+ hash_all = true;
}
else
{
char **files = (optind < argc ? argv + optind : cwd_only);
ai = argv_iter_init_argv (files);
+
+ /* Hash all dev,ino pairs if there are multiple arguments, or if
+ following non-command-line symlinks, because in either case a
+ file with just one hard link might be seen more than once. */
+ hash_all = (optind + 1 < argc || symlink_deref_bits == FTS_LOGICAL);
}
if (!ai)
diff --git a/tests/du/files0-from b/tests/du/files0-from
index 620246d..860fc6a 100755
--- a/tests/du/files0-from
+++ b/tests/du/files0-from
@@ -70,15 +70,15 @@ my @Tests =
{IN=>{f=>"g\0"}}, {AUX=>{g=>''}},
{OUT=>"0\tg\n"}, {OUT_SUBST=>'s/^\d+/0/'} ],
- # two file names, no final NUL
+ # two identical file names, no final NUL
['2', '--files0-from=-', '<',
{IN=>{f=>"g\0g"}}, {AUX=>{g=>''}},
- {OUT=>"0\tg\n0\tg\n"}, {OUT_SUBST=>'s/^\d+/0/'} ],
+ {OUT=>"0\tg\n"}, {OUT_SUBST=>'s/^\d+/0/'} ],
- # two file names, with final NUL
+ # two identical file names, with final NUL
['2a', '--files0-from=-', '<',
{IN=>{f=>"g\0g\0"}}, {AUX=>{g=>''}},
- {OUT=>"0\tg\n0\tg\n"}, {OUT_SUBST=>'s/^\d+/0/'} ],
+ {OUT=>"0\tg\n"}, {OUT_SUBST=>'s/^\d+/0/'} ],
# Ensure that $prog processes FILEs following a zero-length name.
['zero-len', '--files0-from=-', '<',
diff --git a/tests/du/hard-link b/tests/du/hard-link
index 7e4f51a..e22320b 100755
--- a/tests/du/hard-link
+++ b/tests/du/hard-link
@@ -26,24 +26,40 @@ fi
. $srcdir/test-lib.sh
mkdir -p dir/sub
-( cd dir && { echo non-empty > f1; ln f1 f2; echo non-empty > sub/F; } )
-
-
-# Note that for this first test, we transform f1 or f2
-# (whichever name we find first) to f_. That is necessary because,
-# depending on the type of file system, du could encounter either of those
-# two hard-linked files first, thus listing that one and not the other.
-du -a --exclude=sub dir \
- | sed 's/^[0-9][0-9]* //' | sed 's/f[12]/f_/' > out || fail=1
-echo === >> out
-du -a --exclude=sub --count-links dir \
- | sed 's/^[0-9][0-9]* //' | sort -r >> out || fail=1
+( cd dir &&
+ { echo non-empty > f1
+ ln f1 f2
+ ln -s f1 f3
+ echo non-empty > sub/F; } )
+
+du -a -L --exclude=sub --count-links dir \
+ | sed 's/^[0-9][0-9]* //' | sort -r > out || fail=1
+
+# For these tests, transform f1 or f2 or f3 (whichever name is found
+# first) to f_. That is necessary because, depending on the type of
+# file system, du could encounter any of those linked files first,
+# thus listing that one and not the others.
+for args in '-L' 'dir' '-L dir'
+do
+ echo === >> out
+ du -a --exclude=sub $args dir \
+ | sed 's/^[0-9][0-9]* //' | sed 's/f[123]/f_/' >> out || fail=1
+done
+
cat <<\EOF > exp
+dir/f3
+dir/f2
+dir/f1
+dir
+===
dir/f_
dir
===
-dir/f2
-dir/f1
+dir/f_
+dir/f_
+dir
+===
+dir/f_
dir
EOF
--
1.7.2.rc1.192.g262ff