Re: Is there a memory leak in find ?
From: Bernhard Voelker
Subject: Re: Is there a memory leak in find ?
Date: Fri, 11 Apr 2014 08:30:18 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.3.0
On 04/11/2014 06:28 AM, Paul E Condon wrote:
> I've done more testing using several different trees as input data. In
> all cases the script terminates well before it has completely walked
> the tree, always when the output file has grown to about 2.8GB. The
> last line is always a complete well formed output line with no sign of
> syserr stuff. The project that I'm working on is my own variation of
> using rsync to do frequent snapshot backups along the lines discussed
> by Mike Rubel in
>
> http://www.mikerubel.org/computers/rsync_snapshots/
>
> In my (growing) backup tree(s) there are many files with hundreds of
> hard links. Find does nothing more than report the numerical value of
> the hard link count, so I don't think the hard links are the source of
> the problem. But they are a reason that the problem needs to be
> solved. But first ... you must be able to duplicate the problem.
I'm also using my own rsync-based snapshotting, with 24x hourly +
7x daily + 4x weekly + 12x monthly + 1x additional "LATEST" snapshots.
That makes, for example, 48 hard links for my ~/.profile, and every
snapshot has about 87000 files for 67G of data:
$ ls -ldog home/daily.0/berny/.profile
-rw-r--r-- 48 1149 Feb 15 2013 home/daily.0/berny/.profile
$ find home/daily.0 | wc -l
87919
$ du -shx home/daily.0
67G home/daily.0
Which version of 'find' are you using?
I don't see this memory leak, neither with the latest 'find' from
the git repository nor with the one on my openSUSE-13.1 system (4.5.12).
This is what /usr/bin/time reports:
User time (seconds): 18.55
System time (seconds): 24.71
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:43.58
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 11040
Average resident set size (kbytes): 0
Looking at the git history: if your find version is <= 4.5.10,
then I'd guess you hit the bug fixed by this commit:
http://git.sv.gnu.org/cgit/findutils.git/commit/?id=e17d58833b
find: fix two time-formatting leaks (bug #37356)
* find/print.c (do_time_format): Call xmalloc for static "buf" only
the first time.
When reallocating buf, be sure to update its buf_size.
Also free "altbuf".
Reported by Nemo Maelstrom Thorx in http://bugs.debian.org/687358
via Andreas Metzler in http://savannah.gnu.org/bugs/?37356
If this is not the case, then please tell us more about how to
reproduce the OOM.
Meanwhile, if this problem really originates from the massive
hard-linking, then you could probably work around it by calling
find(1) once for every single snapshot, and prefixing the last column
with the snapshot name ("hourly.0/" etc.) yourself.
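Such a per-snapshot run could look roughly like the sketch below. It
assumes GNU find, a backup root containing one directory per snapshot,
and snapshot names without '%' or backslash characters. The function
name list_snapshots and the -printf format (link count, size, path) are
only illustrative; they are not taken from your script.

```shell
# Sketch only: list_snapshots and the output format are hypothetical.
# Usage: list_snapshots /path/to/backup/root
list_snapshots() {
    root=$1
    for snap in "$root"/*/; do
        # Skip the unexpanded pattern if the root has no subdirectories.
        [ -d "$snap" ] || continue
        name=$(basename "$snap")
        # One find(1) process per snapshot. %P prints the path relative to
        # the starting point, so the snapshot name is prepended by hand.
        # Assumes $name contains no '%' or '\', which -printf would interpret.
        ( cd "$snap" && find . -mindepth 1 -printf "%n %s $name/%P\n" )
    done
}
```

Calling it as "list_snapshots /path/to/backup/root > listing.txt" starts a
fresh find process for each snapshot, so memory held by one run is
released before the next snapshot is walked.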
Have a nice day,
Berny