bug-coreutils

bug#6557: du sometimes miscounts directories, and files whose link count equals 1


From: Paul Eggert
Subject: bug#6557: du sometimes miscounts directories, and files whose link count equals 1
Date: Fri, 02 Jul 2010 23:41:08 -0700
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.9) Gecko/20100423 Thunderbird/3.0.4

(I found this bug by code inspection while doing the du performance
improvement reported in:
http://lists.gnu.org/archive/html/bug-coreutils/2010-07/msg00014.html
)

Unless -l is given, du is not supposed to count the same file more
than once.  It optimizes this test by not bothering to put a file into
the hash table if its link count is 1, or if it is a directory.  But
this optimization is not correct if -L is given (because the same
link-count-1 file, or directory, can be seen via symbolic links) or if
two or more arguments are given (because the same such file can be
seen under multiple arguments).  The optimization should be suppressed
if -L is given, or if multiple arguments are given.

Here is a patch, with a couple of test cases for it.  This patch
assumes the du performance fix, but I can prepare an independent
patch if you like.

-----

Don't miscount directories or link-count-1 files seen multiple times.
* NEWS: Mention this.
* src/du.c (hash_all): New static var.
(process_file): Use it.
(main): Set it.
* tests/du/hard-link: Add a couple of test cases to help make
sure this bug stays squashed.
diff --git a/NEWS b/NEWS
index 2493ef8..82190d9 100644
--- a/NEWS
+++ b/NEWS
@@ -42,6 +42,11 @@ GNU coreutils NEWS                                    -*- outline -*-
   Also errors are no longer suppressed for unsupported file types, and
   relative sizes are restricted to supported file types.
 
+** Bug fixes
+
+  du no longer multiply counts a file that is a directory or whose
+  link count is 1, even if the file is reached multiple times by
+  following symlinks or via multiple arguments.
 
 * Noteworthy changes in release 8.5 (2010-04-23) [stable]
 
diff --git a/src/du.c b/src/du.c
index bc24861..739be73 100644
--- a/src/du.c
+++ b/src/du.c
@@ -121,6 +121,9 @@ static bool apparent_size = false;
 /* If true, count each hard link of files with multiple links.  */
 static bool opt_count_all = false;
 
+/* If true, hash all files to look for hard links.  */
+static bool hash_all;
+
 /* If true, output the NUL byte instead of a newline at the end of each line. */
 static bool opt_nul_terminate_output = false;
 
@@ -457,8 +460,7 @@ process_file (FTS *fts, FTSENT *ent)
      via a hard link, then don't let it contribute to the sums.  */
   if (skip
       || (!opt_count_all
-          && ! S_ISDIR (sb->st_mode)
-          && 1 < sb->st_nlink
+          && (hash_all || (! S_ISDIR (sb->st_mode) && 1 < sb->st_nlink))
           && ! hash_ins (sb->st_ino, sb->st_dev)))
     {
       /* Note that we must not simply return here.
@@ -876,11 +878,20 @@ main (int argc, char **argv)
                quote (files_from));
 
       ai = argv_iter_init_stream (stdin);
+
+      /* It's not easy here to count the arguments, so assume the
+         worst.  */
+      hash_all = true;
     }
   else
     {
       char **files = (optind < argc ? argv + optind : cwd_only);
       ai = argv_iter_init_argv (files);
+
+      /* Hash all dev,ino pairs if there are multiple arguments, or if
+         following non-command-line symlinks, because in either case a
+         file with just one hard link might be seen more than once.  */
+      hash_all = (optind + 1 < argc || symlink_deref_bits == FTS_LOGICAL);
     }
 
   if (!ai)
diff --git a/tests/du/hard-link b/tests/du/hard-link
index 7e4f51a..e22320b 100755
--- a/tests/du/hard-link
+++ b/tests/du/hard-link
@@ -26,24 +26,40 @@ fi
 . $srcdir/test-lib.sh
 
 mkdir -p dir/sub
-( cd dir && { echo non-empty > f1; ln f1 f2; echo non-empty > sub/F; } )
-
-
-# Note that for this first test, we transform f1 or f2
-# (whichever name we find first) to f_.  That is necessary because,
-# depending on the type of file system, du could encounter either of those
-# two hard-linked files first, thus listing that one and not the other.
-du -a --exclude=sub dir \
-  | sed 's/^[0-9][0-9]*        //' | sed 's/f[12]/f_/' > out || fail=1
-echo === >> out
-du -a --exclude=sub --count-links dir \
-  | sed 's/^[0-9][0-9]*        //' | sort -r >> out || fail=1
+( cd dir &&
+  { echo non-empty > f1
+    ln f1 f2
+    ln -s f1 f3
+    echo non-empty > sub/F; } )
+
+du -a -L --exclude=sub --count-links dir \
+  | sed 's/^[0-9][0-9]*        //' | sort -r > out || fail=1
+
+# For these tests, transform f1 or f2 or f3 (whichever name is found
+# first) to f_.  That is necessary because, depending on the type of
+# file system, du could encounter any of those linked files first,
+# thus listing that one and not the others.
+for args in '-L' 'dir' '-L dir'
+do
+  echo === >> out
+  du -a --exclude=sub $args dir \
+    | sed 's/^[0-9][0-9]*      //' | sed 's/f[123]/f_/' >> out || fail=1
+done
+
 cat <<\EOF > exp
+dir/f3
+dir/f2
+dir/f1
+dir
+===
 dir/f_
 dir
 ===
-dir/f2
-dir/f1
+dir/f_
+dir/f_
+dir
+===
+dir/f_
 dir
 EOF
 




