bug-coreutils

From: Jim Meyering
Subject: bug#6557: du sometimes miscounts directories, and files whose link count equals 1
Date: Sat, 03 Jul 2010 10:18:18 +0200

Paul Eggert wrote:
> (I found this bug by code inspection while doing the du performance
> improvement reported in:
> http://lists.gnu.org/archive/html/bug-coreutils/2010-07/msg00014.html
> )
>
> Unless -l is given, du is not supposed to count the same file more
> than once.  It optimizes this test by not bothering to put a file into
> the hash table if its link count is 1, or if it is a directory.  But
> this optimization is not correct if -L is given (because the same
> link-count-1 file, or directory, can be seen via symbolic links) or if
> two or more arguments are given (because the same such file can be
> seen under multiple arguments).  The optimization should be suppressed
> if -L is given, or if multiple arguments are given.
>
> Here is a patch, with a couple of test cases for it.  This patch
> assumes the du performance fix, but I can prepare an independent
> patch if you like.
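
For anyone following along, the miscount described above can be modeled
in a few lines of Python.  This is only a sketch, not du's actual C code:
the function name count_bytes and the "optimize" switch are made up for
illustration, a plain set stands in for du's hash table, and st_size is
used in place of disk blocks so the numbers are predictable.  It also
assumes a file system that reports a link count of 1 for a freshly
created regular file.

```python
import os
import tempfile

def count_bytes(args, optimize):
    """Sum file sizes over args, trying not to count a file twice.

    optimize=True mimics the shortcut described above: a file whose
    link count is 1 is never recorded in the 'seen' set.
    """
    seen = set()                      # stands in for du's hash table
    total = 0
    for path in args:
        st = os.lstat(path)
        key = (st.st_dev, st.st_ino)  # identity of the file
        if optimize and st.st_nlink == 1:
            total += st.st_size       # shortcut: assume it cannot recur
            continue
        if key in seen:
            continue                  # already counted once
        seen.add(key)
        total += st.st_size
    return total

with tempfile.TemporaryDirectory() as d:
    g = os.path.join(d, "g")
    with open(g, "w") as f:
        f.write("x" * 100)
    # The same link-count-1 file is named twice on the "command line".
    shortcut = count_bytes([g, g], optimize=True)
    always_hash = count_bytes([g, g], optimize=False)
    print(shortcut, always_hash)
```

With the shortcut enabled, the second "g" is counted again because the
first one was never remembered; recording every file avoids that.  The
actual patch is narrower than this on/off switch: it suppresses the
shortcut only when -L or multiple arguments are given.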

Thanks!
Actually, that patch applies just fine, as-is.
However, it induces this new "make check" test failure:

    FAIL: du/files0-from (exit: 1)
    ==============================

    du (GNU coreutils) 8.5.75-569b2
    Copyright (C) 2010 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.

    Written by Torbjorn Granlund, David MacKenzie, Paul Eggert,
    and Jim Meyering.
    f-extra-arg...
    missing...
    minus-in-stdin...
    empty...
    empty-nonreg...
    nul-1...
    nul-2...
    1...
    1a...
    2...
    files0-from: test 2: stdout mismatch, comparing 2.O (actual) and 2.1 (expected)
    *** 2.O Sat Jul  3 09:28:08 2010
    --- 2.1 Sat Jul  3 09:28:08 2010
    ***************
    *** 1 ****
    --- 1,2 ----
      0     g
    + 0     g
    2a...
    files0-from: test 2a: stdout mismatch, comparing 2a.O (actual) and 2a.1 (expected)
    *** 2a.O        Sat Jul  3 09:28:08 2010
    --- 2a.1        Sat Jul  3 09:28:08 2010
    ***************
    *** 1 ****
    --- 1,2 ----
      0     g
    + 0     g
    zero-len...



That's because with the unpatched "du", a command like this, with
a duplicate argument, prints two lines, while the patched version
prints only one:

    $ seq 100 > g; du g g
    4       g
    4       g

    $ seq 100 > g; ./du g g
    4       g

Note that the vendor versions of "du" from at least Solaris 10,
OpenBSD, NetBSD and FreeBSD print both lines.
I prefer the new semantics, especially when using --total:

    $ seq 100 > g; du --total g g
    4       g
    4       g
    8       total

    $ seq 100 > g; ./du --total g g
    4       g
    4       total

You can get some of the old semantics by using -l:

    $ seq 100 > g; ./du -l --total g g
    4       g
    4       g
    8       total
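
To make the three transcripts above easier to compare, here is a tiny
Python model of the output rules.  It is a sketch under stated
assumptions, not how du is implemented: du_lines, sizes and inode_of are
hypothetical names, the (device, inode) pair is invented, and
count_links stands in for -l.

```python
def du_lines(args, sizes, inode_of, count_links=False):
    """Return the lines a deduplicating du-like tool would print.

    args:        names as given on the command line
    sizes:       name -> size in blocks
    inode_of:    name -> hypothetical (dev, ino) identity
    count_links: models -l, i.e. count every occurrence
    """
    seen = set()
    lines = []
    total = 0
    for name in args:
        ident = inode_of[name]
        if not count_links:
            if ident in seen:
                continue              # already counted: no line, no size
            seen.add(ident)
        lines.append(f"{sizes[name]}\t{name}")
        total += sizes[name]
    lines.append(f"{total}\ttotal")   # models --total
    return lines

sizes = {"g": 4}
inode_of = {"g": (1, 42)}             # made-up identity for the example
new_default = du_lines(["g", "g"], sizes, inode_of)
with_l = du_lines(["g", "g"], sizes, inode_of, count_links=True)
print(new_default)
print(with_l)
```

In this model the default case yields one "g" line and a total of 4,
while count_links=True yields two "g" lines and a total of 8, matching
the patched ./du transcripts.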

What do you think of breaking with that tradition?  POSIX does appear
to say that for each "FILE" argument du must print a line, but it also
mentions how with linked files, the space must be counted only once.
You can definitely consider listing the same file twice as being
analogous to a file being hard-linked.

An alternative might be to do this,

    $ seq 100 > g; du --total g g
    4       g
    0       g
    4       total

but this is too prone to misinterpretation both by people and by code
that parses du output.  So I'm inclined to go with your approach.

-------------------------------------
This is the additional patch we'd need to make the failing
test accept your new output.  You're welcome to merge
it into yours.

diff --git a/tests/du/files0-from b/tests/du/files0-from
index 620246d..860fc6a 100755
--- a/tests/du/files0-from
+++ b/tests/du/files0-from
@@ -70,15 +70,15 @@ my @Tests =
     {IN=>{f=>"g\0"}}, {AUX=>{g=>''}},
     {OUT=>"0\tg\n"}, {OUT_SUBST=>'s/^\d+/0/'} ],

-   # two file names, no final NUL
+   # two identical file names, no final NUL
    ['2', '--files0-from=-', '<',
     {IN=>{f=>"g\0g"}}, {AUX=>{g=>''}},
-    {OUT=>"0\tg\n0\tg\n"}, {OUT_SUBST=>'s/^\d+/0/'} ],
+    {OUT=>"0\tg\n"}, {OUT_SUBST=>'s/^\d+/0/'} ],

-   # two file names, with final NUL
+   # two identical file names, with final NUL
    ['2a', '--files0-from=-', '<',
     {IN=>{f=>"g\0g\0"}}, {AUX=>{g=>''}},
-    {OUT=>"0\tg\n0\tg\n"}, {OUT_SUBST=>'s/^\d+/0/'} ],
+    {OUT=>"0\tg\n"}, {OUT_SUBST=>'s/^\d+/0/'} ],

    # Ensure that $prog processes FILEs following a zero-length name.
    ['zero-len', '--files0-from=-', '<',




