bug-findutils

Re: Faster way to prune directory?


From: Bernhard Voelker
Subject: Re: Faster way to prune directory?
Date: Thu, 16 Apr 2015 08:26:04 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0

On 04/16/2015 06:04 AM, Peng Yu wrote:
> Hi, the following code shows that -prune when used with -exec can be
> very slow. Is there somehow a way to speed this up?
> 
> ~$ cat main.sh
> #!/usr/bin/env bash
> 
> tmpdir=$(mktemp -d)
> 
> function mkalotdir {
> local n=$1
> local i
> local j
> local k
> for i in $(seq -w "$n")
> do
>   for j in $(seq -w "$n")
>   do
>     for k in $(seq -w "$n")
>     do
>       echo "$tmpdir/$i/$j/$k"
>     done
>   done
> done | xargs mkdir -p
> }
> 
> function myfind {
> find "$tmpdir" > /dev/null
> }
> 
> function myfindprune {
> find "$tmpdir" -exec $(type -P test) -e {}/.findignore ';' -prune -o \
>   -print > /dev/null
> }
> 
> mkalotdir 10
> echo myfind
> time myfind
> echo myfindprune
> time myfindprune
> 
> ~$ ./main.sh
> myfind
> 
> real 0m0.018s
> user 0m0.005s
> sys 0m0.011s
> myfindprune
> 
> real 0m5.354s
> user 0m1.145s
> sys 0m1.539s

Well, half a second for creating and running /usr/bin/test 1111 times
doesn't seem like much.  At least I can second your timing results.

The time is not lost in find itself but in executing the test(1)
program so many times.  To get an idea, prefix the above command line
with strace, i.e., run "strace -f -v find ...".

You'll see that you are "comparing apples with pears"; my home country
doesn't grow oranges, so that's how the saying goes over here.
;-)

To get a somewhat better result, you could avoid the overhead in test(1)
for NLS (native language support) setup etc. by rolling your own,
minimalist(!) test program:

  $ cat /tmp/mystat.c
  #include <sys/stat.h>
  /* Exit 0 ("true" to find) if stat(2) succeeds on argv[1], else 1. */
  int main (int argc, char **argv) {
    struct stat sb;
    return -1 == stat (argv[1], &sb);
  }

  $ gcc -Wall -O3 -o /tmp/mystat /tmp/mystat.c
  $ strip /tmp/mystat

  $ time find . -type d -exec /tmp/mystat '{}'/.findignore \; -prune -o -print >/dev/null

  real    0m0.340s
  user    0m0.014s
  sys     0m0.064s

Given what the system has to process compared to a bare "find .",
this is IMO quite good, isn't it?

Have a nice day,
Berny


