coreutils

Re: [PATCH] Support for --size in du


From: Likai Liu
Subject: Re: [PATCH] Support for --size in du
Date: Thu, 17 Jan 2013 09:46:02 -0500

Regarding block size versus apparent size: like Pádraig, I think it's okay to let --size simply filter on whichever size the user has selected. It can be combined with --apparent-size.

I think using special characters in --size to denote max and min sizes is confusing. Why not have separate --max-size and --min-size options? That way you can filter by a range, and it's obvious what the flags mean. It's also more consistent in style with the --max-depth flag.

If you absolutely want just one option, what about --size=[minsize]-[maxsize]? E.g., --size=4K- limits output to entries larger than 4K, --size=-8K limits it to entries smaller than 8K, and --size=4K-8K limits it to entries between 4K and 8K.
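
To make the single-argument form concrete, here is a rough sketch of how such a spec could be split into its two bounds. The function name parse_size_range is made up for illustration and is not part of the patch or of du; it only separates the bounds and does not interpret suffixes like K.

```shell
#!/bin/sh
# Hypothetical helper (not part of du or the patch): split a "MIN-MAX"
# spec into its two bounds. An empty side means "no bound on that side".
parse_size_range() {
    spec=$1
    case $spec in
        *-*) ;;                                   # must contain a '-'
        *) echo "invalid range: $spec" >&2; return 1 ;;
    esac
    min=${spec%%-*}   # text before the first '-'
    max=${spec#*-}    # text after the first '-'
    printf 'min=%s max=%s\n' "${min:-none}" "${max:-none}"
}

parse_size_range 4K-     # prints: min=4K max=none
parse_size_range -8K     # prints: min=none max=8K
parse_size_range 4K-8K   # prints: min=4K max=8K
```

A spec without any '-' is rejected here; whether a bare --size=4K should instead mean a minimum, as in the current patch, is a separate design choice.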

liulk


On Thu, Jan 17, 2013 at 6:51 AM, Pádraig Brady <address@hidden> wrote:
On 01/17/2013 07:19 AM, Bernhard Voelker wrote:
> On 01/17/2013 02:46 AM, Pádraig Brady wrote:
>> On 01/17/2013 01:23 AM, Bernhard Voelker wrote:
>>> I was pretty sure that this slipped also from Padraig's list.
>>
>> Sorry for the delay in this.
>>
>> Note it's still on the list:
>> http://www.pixelbeat.org/patches/coreutils/inbox_dec_2012.html
>>
>> You can browse older news and subscribe to new updates at:
>> http://www.pixelbeat.org/patches/coreutils/
>
> Thanks for the links.
>
>>> Therefore, I took Jakob's patch and amended it with documentation
>>> and a comprehensive test. ;-)
>>
>> Wow great work on the test.
>
> Well, that test just grew and grew. It's actually a result of
> me not being 100% happy with the --size option as in some
> situations it might confuse people more than it may help:
>
> E.g. users usually tend to "think in apparent sizes" for their
> files instead of block sizes.
>
> Having a directory like this:
>
>    $ find tmp -exec ls -dog '{}' +
>    drwxr-xr-x 5      4096 Jan 17 07:28 tmp
>    drwxr-xr-x 2      4096 Jan 17 07:29 tmp/big_dir
>    -rw-r--r-- 1 104857600 Jan 17 07:29 tmp/big_dir/big_file
>    drwxr-xr-x 2      4096 Jan 17 07:25 tmp/empty_dir
>    drwxr-xr-x 2      4096 Jan 17 07:28 tmp/small_dir
>    -rw-r--r-- 1         6 Jan 17 07:26 tmp/small_dir/small_file
>    -rw-r--r-- 1         0 Jan 17 07:22 tmp/x0
>    -rw-r--r-- 1         1 Jan 17 07:22 tmp/x1
>    -rw-r--r-- 1        10 Jan 17 07:22 tmp/x2
>    -rw-r--r-- 1       100 Jan 17 07:22 tmp/x3
>    -rw-r--r-- 1      1000 Jan 17 07:22 tmp/x4
>    -rw-r--r-- 1     10000 Jan 17 07:22 tmp/x5
>    -rw-r--r-- 1    100000 Jan 17 07:22 tmp/x6
>    -rw-r--r-- 1   1000000 Jan 17 07:22 tmp/x7
>
> Then filter files and directories greater than or equal to 4000:
>
>    $ src/du -B1 -a --size=4000 tmp | sort -k2
>    106012672  tmp
>    104861696  tmp/big_dir
>    104857600  tmp/big_dir/big_file
>    4096       tmp/empty_dir
>    8192       tmp/small_dir
>    4096       tmp/small_dir/small_file
>    4096       tmp/x1
>    4096       tmp/x2
>    4096       tmp/x3
>    4096       tmp/x4
>    12288      tmp/x5
>    102400     tmp/x6
>    1003520    tmp/x7
>
> This included also the small files tmp/x1 while it left out
> the empty file tmp/x0 ... but yet included the empty directory
> tmp/empty_dir. This feels somehow counter-intuitive.
>
> Now let's use the "apparent size":
>    $ src/du -B1 -a --size=4000 --app tmp | sort -k2
>    105985101 tmp
>    104861696 tmp/big_dir
>    104857600 tmp/big_dir/big_file
>    4096      tmp/empty_dir
>    4102      tmp/small_dir
>    10000     tmp/x5
>    100000    tmp/x6
>    1000000   tmp/x7
>
> This is much better. Well, the empty directory still shows up
> here (which might be different on a different file system),
> but at least the small files have gone.
>
> That said, it seems that automatically applying --apparent
> when -a and --size are specified would give a more "natural"
> result.
>
> In practice, the users will probably only search for huge files
> and directories, i.e. much greater than the file system's
> block size, but even then they'd be trapped by forgetting the
> --app option when it comes to sparse files:
>
>    $ src/truncate --size=1T tmp/sparse-1T
>
>    $ src/du -h -a --size=100M tmp
>    100M    tmp/big_dir/big_file
>    101M    tmp/big_dir
>    102M    tmp
>
>    $ src/du -h -a --size=100M --app tmp
>    100M    tmp/big_dir/big_file
>    101M    tmp/big_dir
>    1.0T    tmp/sparse-1T
>    1.1T    tmp
>
> The only way out of this - probably only my - confusion would
> be to prevent the use of the -a and the --size option together.
> But this would artificially restrict the user's flexibility.
>
> Does anyone else have such a feeling, too?

I think it's fine to have --size filter what du outputs,
i.e., have it just honor -a. Your info on the subject is clear enough:


 +Please note that the @option{--size} option can be combined with the above
 +@option{--apparent-size} option, and in this case would elide entries based on
 +its apparent size.  This makes most sense for files, i.e. when the @option{-a}
 +is specified, too.

I'd remove the last sentence above actually.
The user may want to operate on the cumulative apparent size for dirs.
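
For anyone who wants to try the apparent-size behaviour without the patch applied, the filter can be approximated with existing tools. This is only a sketch: du_min_size is a made-up wrapper, the awk comparison stands in for the proposed option, and it assumes GNU du (for -B1 and --apparent-size).

```shell
#!/bin/sh
# Emulate "du -a --apparent-size --size=MIN" with existing tools.
# This is a stand-in for the proposed option, not the patch itself.
tmp=$(mktemp -d) || exit 1
head -c 10    /dev/zero > "$tmp/x2"    # 10-byte file
head -c 10000 /dev/zero > "$tmp/x5"    # 10000-byte file

# Hypothetical wrapper: keep only entries whose apparent size is >= MIN.
du_min_size() {
    min=$1; shift
    du -B1 -a --apparent-size "$@" | awk -v min="$min" '$1 + 0 >= min + 0'
}

out=$(du_min_size 4000 "$tmp")
printf '%s\n' "$out"    # lists x5 and the directory total, but not x2

rm -rf "$tmp"
```

The directory line is kept because its cumulative apparent size exceeds the threshold, which is the case the removed sentence would have discouraged.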


>> I wonder would it make sense to have consistent --size
>> handling for du and truncate. I.E. have --size='<10M'
>> specify the max size and --size='>10M' specify the min size?
>
> I personally do not like shell-special characters in optargs
> too much, as many users will forget to put it into quotes;
> --size=<10M may not be a great problem, but --size=>10M
> may destroy data.
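
The risk is easy to reproduce in a scratch directory. In this sketch echo stands in for any command, and the file name 10M is only an example; nothing here depends on the patch.

```shell
#!/bin/sh
# Show what the shell does with an unquoted '>' inside an option argument.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

printf 'precious data\n' > 10M     # a pre-existing file that happens to be named 10M

echo --size=>10M                   # unquoted: stdout is redirected to ./10M
unquoted=$(cat 10M)                # the old contents have been overwritten

echo '--size=>10M' > /dev/null     # quoted: the command sees one intact argument

printf 'contents of 10M after the unquoted call: %s\n' "$unquoted"
cd / && rm -rf "$dir"
```

The unquoted '<' form is less harmful: the shell tries to read from a file named 10M and usually fails with an error if it does not exist.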

Yes I agree. Maybe we should enforce the '+',
but then again maybe not since it means '>' in `find`,
rather than '>='. For comparison as it stands:

find -size +1233  ≍ du -B512 -a --size '1234'
find -size +1233c ≍ du -a --size '1234'
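
The off-by-one in that comparison comes from find's '+N' being a strict "greater than". A quick check with a file of known size, assuming only a POSIX find and a head that supports -c:

```shell
#!/bin/sh
# Check that find's "+N" is a strict "greater than".
dir=$(mktemp -d) || exit 1
head -c 1234 /dev/zero > "$dir/f"                 # exactly 1234 bytes

strict_hit=$(find "$dir" -type f -size +1233c)    # size > 1233: matches
strict_miss=$(find "$dir" -type f -size +1234c)   # size > 1234: no match

printf 'find -size +1233c: %s\n' "${strict_hit:-no match}"
printf 'find -size +1234c: %s\n' "${strict_miss:-no match}"

rm -rf "$dir"
```

So for whole-number sizes, "more than 1233" and "at least 1234" select the same entries, which is why the two thresholds differ by one.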


> I was rather thinking that to make it more consistent with
> "find tmp -size +10M", or even to teach find a new -csize
> (cumulative size) option ... as finding big directories was
> the original problem. On the other side, 'find' doesn't offer
> the flexibility to filter based on the block size, i.e. it
> would always include huge sparse files although these do
> not fill up the file system.
>
> Maybe the current implementation is still the better way ...

+1

thanks,
Pádraig.

