coreutils



From: Rasmus Borup Hansen
Subject: Re: My experience with using cp to copy a lot of files (432 millions, 39 TB)
Date: Thu, 21 Aug 2014 14:13:33 +0200

On 21 Aug 2014, at 11:31, Pádraig Brady wrote:

> The amount of files rather than the amount of data is pertinent here.
> So 17G/432M is about 40 bytes per entry which is about right.
> 
> cheers,
> Pádraig.
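Pádraig's estimate is a plain division of peak memory use by the number of lines "cp -v" printed. Here is a quick check, assuming "17G" means 17 × 10^9 bytes. If it means 17 × 2^30 bytes, the result is about 42 instead.

```python
def bytes_per_item(memory_bytes: int, count: int) -> float:
    """Average memory per item, e.g. per directory entry handled by cp."""
    return memory_bytes / count


# 17 GB of memory for about 432 million entries, as quoted above.
# Assumption: "G" is taken as 10**9 bytes.
per_entry = bytes_per_item(17 * 10**9, 432 * 10**6)
print(f"{per_entry:.1f} bytes per entry")  # prints "39.4 bytes per entry"
```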

I don't have the exact file system available anymore, but I do have the output 
of "ls -laR", so I made a small Perl script that counted the number of plain 
files and symlinks (lines starting with "-" or "l") and computed the number of 
inodes by considering the link count of each file. It also computed the average 
length of the full paths for plain files and symlinks (157 bytes). It appears 
that I had 27,067,739 inodes corresponding to 365,721,810 directory entries for 
plain files/symlinks. The 432M I mentioned in my original post also included 
directories (67,087,195) as it was the number of lines in the output from "cp 
-v". A memory usage of 17 GB corresponds to more than 600 bytes per inode if 
you're only counting inodes for plain files or symbolic links. I haven't looked 
at the code since my first post, but if inodes for directories are also stored 
in the hash table, we end up with around 180 bytes per inode, which sounds 
reasonable. I don't know if hard links to directories are supported by cp, but 
if not, then not storing the directories' inodes in the hash table could save a 
lot of memory in my case, provided they're not needed for something else that 
I don't know about.

Also, thanks for the feedback.

Best,

Rasmus

-----------------------------------------------------------------
Hansen, Rasmus Borup              Intomics - from data to biology
System Administrator              Diplomvej 377
Scientific Programmer             DK-2800 Kgs. Lyngby
                                  Denmark
E: address@hidden               W: http://www.intomics.com/
P: +45 5167 7972                  P: +45 8880 7979


