bug-coreutils

From: Jim Meyering
Subject: Re: rm -rf: avoid ext3/4 O(N^2) performance hit, further librarify remove.c
Date: Fri, 26 Sep 2008 10:16:20 +0200

Eric Blake <address@hidden> wrote:

> Jim Meyering <jim <at> meyering.net> writes:
>
>> i.e., this change makes rm process dir entry names in sorted
>> inode order, when that's sensible.
>
> What about 'ls' - should that also be changed to stat in inode order
> when -f is not enabled?  (An interesting thought, since it means ls
> would then be sorting twice per directory in order to run faster than
> with a single sort!)

IMHO, it's not worth worrying about ls -R, given its current state.
First, it forms full relative file names (so it's quadratic in
that sense, too, for deep trees).  Second, with the fts fix,
find will work fine, so I'm not inclined to bother with ls at all.
It's not as critical as the others.  Perhaps someone will convert it
to use fts one of these days.

>>  ** New features
>>
>> +  chgrp, chmod, chown, chcon, du, rm: now all display linear performance,
>
> Is this really true, given that sorting is at best log-linear (and with
> qsort, possibly quadratic)?  True, log-linear is noticeably better than
> quadratic, but

Yes.
The N^2 term counts disk seeks, so its constant factor is huge compared
with the cost of the in-memory sort.
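
To make the idea concrete, here's a minimal standalone sketch -- not the
actual fts/remove.c code; struct ino_name and ino_compare are invented
names, and error handling is deliberately thin:

#include <dirent.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

struct ino_name { ino_t ino; char *name; };

static int
ino_compare (void const *va, void const *vb)
{
  ino_t a = ((struct ino_name const *) va)->ino;
  ino_t b = ((struct ino_name const *) vb)->ino;
  return a < b ? -1 : a > b;
}

int
main (int argc, char **argv)
{
  char const *dir = argc > 1 ? argv[1] : ".";
  DIR *dirp = opendir (dir);
  if (dirp == NULL)
    { perror (dir); return EXIT_FAILURE; }

  /* Phase 1: collect <d_ino, d_name> pairs without touching the inodes.  */
  size_t n = 0, n_alloc = 0;
  struct ino_name *e = NULL;
  struct dirent *dp;
  while ((dp = readdir (dirp)))
    {
      if (strcmp (dp->d_name, ".") == 0 || strcmp (dp->d_name, "..") == 0)
        continue;
      if (n == n_alloc)
        e = realloc (e, (n_alloc = 2 * n_alloc + 64) * sizeof *e);
      e[n].ino = dp->d_ino;
      e[n].name = strdup (dp->d_name);
      n++;
    }

  /* Phase 2: sort by inode.  The sort costs O(N log N) comparisons, but
     each one is cheap; the expensive part is the N stat calls below,
     whose seeks the sorted order makes largely sequential.  */
  qsort (e, n, sizeof *e, ino_compare);

  /* Phase 3: visit each entry in inode order.  A real chmod/chown/rm
     would do its work here; this sketch just stats and prints.  */
  for (size_t i = 0; i < n; i++)
    {
      struct stat st;
      if (fstatat (dirfd (dirp), e[i].name, &st, AT_SYMLINK_NOFOLLOW) == 0)
        printf ("%ju\t%s\n", (uintmax_t) st.st_ino, e[i].name);
      free (e[i].name);
    }

  free (e);
  closedir (dirp);
  return EXIT_SUCCESS;
}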

> have you proven that the linear time spent stat'ing the sorted list outweighs
> the log-linear cost of sorting enough to discard the log-linear term?  I guess
> that explains the FIXME comment about detecting tmpfs to avoid the sort.

No proof here, but I've done my share of benchmarks.
These changes are most definitely an improvement.
For example, running "/bin/chmod -R 700 d" on a directory d with 300000
just-created entries (each an empty file with a name in 1..300k) on an
ext3 file system (a Fedora 9 system with a fast disk and 4GB of RAM)
takes about 160 seconds.

With chmod built using the modified fts, that same command handles the
300k-entry directory in under 4 seconds, and can process a 1.5M-entry
directory in just 40 seconds.  Extrapolating (I don't want to destroy
any disks), pre-patch chmod on a directory like that would take over
an hour, and in the 6M-entry range it'd take a day.
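
(For the record, the arithmetic behind those estimates, assuming the
quadratic trend visible in the first table below: 165.88 seconds at 300k
entries scales by (1.5M/300k)^2 = 25 to roughly 4100 seconds, i.e. about
70 minutes, and by (6M/300k)^2 = 400 to roughly 66000 seconds, i.e. about
18 hours.)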

The following numbers are not statistically valid (single trial), but
they were measured on an otherwise idle system, and the overall trends
are accurate and reproducible.

Using pre-patch fts:
                  seconds
      N   create   chmod -R 700 dir
  10000     1.16     0.49
  50000     5.84     2.70
 100000    11.95     5.20
 200000    23.93    46.49
 300000    36.76   165.88
 400000    50.39   271.03
 500000    63.98   471.05

Using patched fts:
                  seconds
      N   create   chmod -R 700 dir
  10000     0.40     0.05
  50000     1.94     0.26
 100000     4.35     0.54
 200000     8.19     2.29
 300000    15.12     3.28
 400000    20.16     6.09
 500000    26.02     8.57
 600000    32.29    10.24
 700000    37.35    14.33
 800000    40.09    15.79
 900000    50.48    17.95
1000000    60.94    22.39
1100000    59.77    16.16
1200000    76.31    27.02
1300000    74.06    31.01
1400000    92.17    35.30
1500000   101.09    40.68

>> +  INODE_SORT_DIR_ENTRIES_THRESHOLD = 10000
>
> And you might want to play with lowering this threshold once you convert
> to a different sort algorithm, in case the time of qsort was affecting
> the timings you used when selecting this threshold.

I doubt it, because the time to sort such a small set is
truly negligible when compared to the slightest bit of I/O.
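
(For scale: qsort'ing 10000 small records costs a fraction of a
millisecond of CPU time, while a single disk seek is on the order of
10 ms.)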

>> +/* Return an approximation of the maximum number of dirent entries
>> +   in a directory with stat data *ST.  */
>> +static size_t
>> +dirent_count (struct stat const *st)
>> +{
>> +  return st->st_size / 16;
>
> Hmm. On cygwin, directories show st_size of 0 (it's too time-consuming to
> compute a real value).  Is this approximation going to be valid on every file
> system where sorting benefits?  Or should we add a special case for a st_size
> of zero to request sorting, on the worst-case assumption that the directory is
> big even though the OS won't tell us how big?

My goal was to have minimal impact on Linux-based systems using the
likes of ext3 and ext4, which do provide a usable st_size for
directories.

If you'd like to make it play nicely on cygwin as well, that'd be
great.  But I'd hesitate to do all that work (allocate space, read all
entries, and sort them) unconditionally, just to avoid some unusual
worst-case scenario.
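
If someone does want cygwin to opt in, the special case you describe
would be tiny -- a hypothetical variant (dirent_count_estimate is an
invented name; the patch keeps the plain st_size / 16 heuristic):

#include <stddef.h>
#include <stdint.h>
#include <sys/stat.h>

/* Approximate the number of entries in a directory with stat data *ST.
   When the file system reports st_size == 0 for directories (as cygwin
   does), pessimistically report "huge" so the caller opts in to sorting.  */
static size_t
dirent_count_estimate (struct stat const *st)
{
  if (st->st_size == 0)
    return SIZE_MAX;          /* size unknown: assume the worst case */
  return st->st_size / 16;    /* same approximation as in the patch */
}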

>> +static void
>> +preprocess_dir (DIR **dirp, struct rm_options const *x)
>> +{
>> +#if HAVE_STRUCT_DIRENT_D_TYPE
>
> Phooey - cygwin currently lacks d_type.  So I guess the bulk of this
> patch won't benefit cygwin anyway, whether or not NTFS is also a file
> system where sorting would help (although I don't know if cygwin's
> emulation of inodes atop NTFS will still result in something where
> sorted order will help or hinder disk seek times).
>
> And shouldn't this condition also check for the presence of d_ino,
> since otherwise the D_INO below makes every inode 0 and the qsort
> unstable?

I thought of that, but since all of the file systems I'm considering
(xfs, reiserfs, ext3, ext4dev, tmpfs, zfs, btrfs) *do* have
dirent.d_ino, I didn't spend any time on it.
But as you clearly understand, cygwin+NTFS might not even
benefit from inode-based sorting, in which case this is moot.

If this is a real, measurable problem on a system
lacking d_ino support, I'm sure someone who cares about
such file systems will investigate and report on it.
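
If anyone does pick that up, one cheap hardening would be to break inode
ties by name in the comparator, so that even an all-zero d_ino column
yields a deterministic order.  A rough sketch (not in the patch; the
names are invented):

#include <string.h>
#include <sys/types.h>

/* Sort key: inode first, then name, so entries whose inode numbers
   compare equal (e.g. every D_INO being 0) still end up in a stable,
   deterministic order.  */
struct name_ino { ino_t ino; char const *name; };

static int
ino_then_name_compare (void const *va, void const *vb)
{
  struct name_ino const *a = va;
  struct name_ino const *b = vb;
  if (a->ino != b->ino)
    return a->ino < b->ino ? -1 : 1;
  return strcmp (a->name, b->name);
}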

>> +      /* - the directory is smaller than some threshold.
>> +     Estimate the number of inodes with a heuristic.
>> +         There's no significant benefit to sorting if there are
>> +     too few inodes.  This  */
>
> Dangling comment.

Good catch.  Removed.

>> +  /* Read all entries, storing <d_ino, d_name> for each non-dir one.
>> +     Maintain a parallel list of pointers into the primary buffer.  */
>> +  while (1)
>> +    {
>> +      struct dirent const *dp;
>> +      dp = readdir_ignoring_dot_and_dotdot (*dirp);
>> +      /* no need to distinguish EOF from failure */
>> +      if (dp == NULL)
>> +        break;
>
> Would it be worth using scandir to do the argument collection and
> sort, rather than writing your own loop?  There's already the proposal
> of porting scandir to gnulib.

I suspect that using scandir would be far less efficient than using an
obstack with million-entry directories.  Consider whether you'd rather
call both malloc and free a million times, or an obstack-growing function.
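
To make the comparison concrete, here is a rough sketch of the
obstack-based collection (not the patch itself; collect_names is an
invented helper).  Every name lands in one growing buffer, so allocation
happens per chunk rather than per entry, and a single obstack_free
releases all the copies at once:

#include <dirent.h>
#include <obstack.h>
#include <stdlib.h>
#include <string.h>

#define obstack_chunk_alloc malloc
#define obstack_chunk_free free

/* Copy every name in DIRP into *OBS and return a malloc'd array of
   pointers to those copies, setting *N to the count.  The caller frees
   the pointer array with free and all the name copies at once with
   obstack_free (obs, NULL).  */
static char **
collect_names (DIR *dirp, struct obstack *obs, size_t *n)
{
  size_t count = 0, n_alloc = 64;
  char **names = malloc (n_alloc * sizeof *names);
  struct dirent *dp;

  obstack_init (obs);
  while ((dp = readdir (dirp)))
    {
      if (strcmp (dp->d_name, ".") == 0 || strcmp (dp->d_name, "..") == 0)
        continue;
      if (count == n_alloc)
        names = realloc (names, (n_alloc *= 2) * sizeof *names);
      /* obstack_copy0 appends the name plus a trailing NUL to the
         obstack; no per-name malloc, no per-name free.  */
      names[count++] = obstack_copy0 (obs, dp->d_name, strlen (dp->d_name));
    }

  *n = count;
  return names;
}

A caller would qsort the names array (by D_INO where available), do its
per-entry work, then call obstack_free (obs, NULL) and free (names).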

>> +  /* Iterate through those pointers, unlinking each name.  */
>> +  size_t i;
>> +  for (i = 0; i < n; i++)
>
> Since you've already gone for C99, should you compress these two lines?
>
> for (size_t i = 0;...

I've been sorely tempted, but opted not to, since some compilers allow
dcl-after-stmt, but not that style of for-loop-scoped declaration.
But now, as I write this, I'm thinking I shouldn't pander to them,
either.  So I've done as you suggest.  Thanks.

>> +  /* Ensure each of these is NULL, in case init/allocation
>> +     fails and we end up calling ds_free on all three while only
>> +     one or two has been initialized.  */
>> +  for (i = 0; i < 3; i++)
>> +    o[i]->chunk = NULL;
>
> Does the obstack API allow you to safely reference this field without going
> through an accessor macro/function?

As far as I can see, the API provides no way to initialize that member
other than through a call to the memory allocator.  But by then it's too
late.  Actually, my first version used memset to zero out each struct,
but it seemed wasteful to zero out all 88 bytes when all that mattered
was that one pointer.

Doing the right thing probably means adding an initializer function
that does not allocate, but merely renders the buffer safe to call
obstack_free on.
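
Something like this, perhaps -- a sketch only (obstack_init_empty is an
invented name), and it pokes at the chunk member directly, just as the
code above does, since the public API offers no non-allocating
initializer:

#include <obstack.h>
#include <stddef.h>

/* Render *O safe to pass to obstack_free (o, NULL) without allocating
   anything: with a NULL chunk pointer, the chunk-walking loop inside
   _obstack_free has nothing to do, so freeing a never-initialized
   obstack becomes a harmless no-op.  */
static void
obstack_init_empty (struct obstack *o)
{
  o->chunk = NULL;
}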
