bug-coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#29921: O(n^2) performance of rm -r


From: Niklas Hambüchen
Subject: bug#29921: O(n^2) performance of rm -r
Date: Mon, 1 Jan 2018 18:13:21 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:58.0) Gecko/20100101 Thunderbird/58.0

On 01/01/2018 17.40, Pádraig Brady wrote:
> The user and sys components of the above are largely linear,
> so the increasing delay is waiting on the device.

My understanding of commit 24412edeaf556a was that "waiting for the
device" is what's claimed to be nonlinear with the patch -- it
explicitly mentions that it fixes behaviour that's only present on
seeking devices, so that would be real time, not user/sys, right?

Also, the test added at the end of the patch measures real time, not
user/sys.

So I'm relatively sure that what Jim Meyering was measuring/fixing when
he wrote the patch and commit message was real time.
But it would be nice if he could confirm that.

> To confirm I did a quick check on a ramdisk (/dev/shm)
> to minimize the latency effects, and that showed largely
> linear behaviour in the real column also.

In the patch I linked, it says specifically

  "RAM-backed file systems (tmpfs) are not affected,
  since there is no seek penalty"

So I am not sure what checking on a ramdisk gains us here, unless you
think that this statement in the patch was incorrect.

> Note if you want to blow away a whole tree quite often,
> it may be more efficient to recreate the file system
> on a separate (loopback) device.

While a nice idea, this is not very practical advice, as when an
application goes crazy and writes millions of files onto the disk, I
can't just wipe the entire file system.

A O(n^2) wall-time `rm -r` is a big problem.

Of course it may not be coreutils's fault, but since coreutils was
clearly aware of the issue in the past and claimed it was fixed, I think
we should investigate to be really sure this isn't a regression anywhere
in the ecosystem.

> I presume that's coming from increased latency effects
> as the cached operations are sync'd more to the device
> with increasing numbers of operations.
> There are many Linux kernel knobs for adjusting
> caching behaviour in the io scheduler.

I don't follow this reasoning.
Should not any form of caching eventually follow linear behaviour?

Also, if it were as you said, shouldn't the `touch` also eventually
become quadratic?
It seems not so, creating the dir entries seems perfectly linear.





reply via email to

[Prev in Thread] Current Thread [Next in Thread]