coreutils

From: Bernhard Voelker
Subject: Re: My experience with using cp to copy a lot of files (432 millions, 39 TB)
Date: Thu, 21 Aug 2014 09:10:23 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.7.0

On 08/11/2014 03:55 PM, Rasmus Borup Hansen wrote:
> Trusting that resizing the hash table would eventually finish, the cp
> command was allowed to continue, and after a while it started copying
> again. It stopped again and resized the hash table a couple of times,
> each taking more and more time. Finally, after 10 days of copying and
> hash table resizing, the new file system used as many blocks and inodes
> as the old one according to df, but to my surprise the cp command didn't
> exit. Looking at the source again, I found that cp disassembles its hash
> table data structures nicely after copying (the forget_all call). Since
> the virtual size of the cp process was now more than 17 GB and the
> server only had 10 GB of RAM, it did a lot of swapping.

Thinking about this case again, I find this very surprising:

a) that cp(1) uses 17 GB of memory when copying 39 TB of data.
That means roughly 2300 bytes per file:

  $ bc <<<'39 * 1024 / 17'
  2349

... although the hashed structure only has these members:

  struct Src_to_dest
  {
    ino_t st_ino;
    dev_t st_dev;
    char *name;
  };

I think either the file names were rather long (on average!),
or there is something wrong in the code.

b) that cp(1) resizes the hash table that often.
This is because it uses the default Hash_tuning (hash.c):

  /* [...] The growth threshold defaults to 0.8, and the growth factor
     defaults to 1.414, meaning that the table will have doubled its size
     every second time 80% of the buckets get used.  */
  #define DEFAULT_GROWTH_THRESHOLD 0.8f
  #define DEFAULT_GROWTH_FACTOR 1.414f

It has been like this since hashing was introduced, and
I wonder whether cp(1) could use better values here.

Have a nice day,
Berny


