bug-coreutils

bug#60416: Be more liberal using copy_file_range() with NFS mounted filesystems


From: Pádraig Brady
Subject: bug#60416: Be more liberal using copy_file_range() with NFS mounted filesystems
Date: Fri, 30 Dec 2022 15:33:12 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.0

On 29/12/2022 16:04, Braiam wrote:
> When using a nfs export, cp seems to not try hard enough using
> copy_file_range(). This was the conclusion we arrived in this forum
> thread[1] at Truenas forums. It was found a way to force cp to use it,
> but it should not be necessary, since there are supposed to be fallbacks.
>
> I'm unsure if we found something that triggered the undesired behavior.
>
> [1]: https://www.truenas.com/community/threads/nfs-does-not-support-server-side-copy-with-zfs.103309/post-712071


Currently we don't use copy_file_range() with sparse files,
as per the Linux man page "copy_file_range() may expand any
holes existing in the requested range".
I confirmed that copy_file_range() definitely expands holes
on ext4 on Linux 5.7 at least.
Also the FreeBSD man page says "this system call attempts
to maintain holes in the output file for the byte range being copied.
However, this does not always work well."

Now maybe we should give precedence to server-side copy for remote file systems,
though that would be optimizing runtime and network usage
while potentially pessimizing storage usage
if holes were expanded in the destination.

Fundamentally, nothing prevents copy_file_range() from
maintaining holes from source to destination,
which suggests we could give precedence to copy_file_range()
on remote file systems.

Also perhaps we can improve the heuristic somehow,
again perhaps only for remote file systems.
For example, we could treat a file on a remote system as sparse
only if more than half of the file appears to be holes.
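
To make the two thresholds concrete, here is a minimal C sketch of the
classification. It is not the coreutils code: the helper names and the
STAT_BLOCK_SIZE constant are hypothetical, and it assumes st_blocks is
counted in 512-byte units, as on Linux (POSIX leaves the unit unspecified).

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Assumption: st_blocks counts 512-byte units (true on Linux). */
enum { STAT_BLOCK_SIZE = 512 };

/* Sparse: fewer blocks are allocated than the apparent size needs. */
static bool is_sparse(off_t size, blkcnt_t blocks)
{
  assert(size >= 0 && blocks >= 0);
  return blocks < size / STAT_BLOCK_SIZE;
}

/* Very sparse: fewer blocks are allocated than half the size needs. */
static bool is_very_sparse(off_t size, blkcnt_t blocks)
{
  assert(size >= 0 && blocks >= 0);
  return blocks < (size / 2) / STAT_BLOCK_SIZE;
}

/* Convenience wrapper for a struct stat filled in by stat() or fstat(). */
static bool stat_is_very_sparse(const struct stat *st)
{
  return is_very_sparse(st->st_size, st->st_blocks);
}
```

With these definitions a 1 MiB file with 1500 allocated blocks is sparse
but not very sparse, so under the proposed rule it would still be eligible
for copy_file_range() on a remote file system.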

For completeness, one might be wondering why we're using
copy_file_range() at all with --sparse=never, as that syscall
doesn't guarantee sparseness.  However in this case we
only use copy_file_range() with the portion of the file
considered non-sparse (and manually write the zeros)¹.
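
As a rough illustration of that approach (not the actual cp
implementation), the sketch below walks the source with
SEEK_DATA/SEEK_HOLE, passes only the data extents to copy_file_range(),
and writes explicit zeros for the holes. The function names are
hypothetical, and it assumes Linux with glibc 2.27 or later and that
SIZE is the source's current st_size.

```c
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Write explicit zero bytes to FD over the range [off, end). */
static int write_zeros(int fd, off_t off, off_t end)
{
  static const char zeros[65536];
  while (off < end) {
    size_t chunk = (end - off) < (off_t) sizeof zeros
                   ? (size_t) (end - off) : sizeof zeros;
    ssize_t n = pwrite(fd, zeros, chunk, off);
    if (n < 0) {
      if (errno == EINTR)
        continue;
      return -1;
    }
    off += n;
  }
  return 0;
}

/* Copy [off, end) from SRC_FD to the same offsets in DST_FD. */
static int copy_extent(int src_fd, int dst_fd, off_t off, off_t end)
{
  off_t in = off, out = off;
  while (in < end) {
    ssize_t n = copy_file_range(src_fd, &in, dst_fd, &out,
                                (size_t) (end - in), 0);
    if (n < 0) {
      if (errno == EINTR)
        continue;
      return -1;
    }
    if (n == 0)
      return -1;                /* source shorter than expected */
  }
  return 0;
}

/* Copy SIZE bytes; data via copy_file_range(), holes as written zeros.
   Returns 0 on success, -1 with errno set on failure.  */
static int copy_data_extents(int src_fd, int dst_fd, off_t size)
{
  off_t pos = 0;
  assert(size >= 0);
  while (pos < size) {
    off_t data = lseek(src_fd, pos, SEEK_DATA);
    if (data < 0) {
      if (errno != ENXIO)
        return -1;
      data = size;              /* only a trailing hole remains */
    }
    if (data > size)
      data = size;
    if (data > pos && write_zeros(dst_fd, pos, data) < 0)
      return -1;
    if (data >= size)
      break;
    off_t hole = lseek(src_fd, data, SEEK_HOLE);
    if (hole < 0)
      return -1;
    if (hole > size)
      hole = size;
    if (copy_extent(src_fd, dst_fd, data, hole) < 0)
      return -1;
    pos = hole;
  }
  return 0;
}
```

A real caller would also need a read/write fallback for when
copy_file_range() fails with errors such as EXDEV or ENOSYS; that is
omitted here to keep the sketch short.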

So in summary, pseudocode could be:

sparse_file = blocks < size/blocksize;
very_sparse_file = blocks < (size/2)/blocksize;

if (   (!possible_hole_in_range || sparse_mode == auto)
    && reflinks_allowed
    && (
         (remote_file && !very_sparse_file)
      || (!remote_file && !sparse_file)
       )
   )
     copy_file_range(...);


Note `stat <your files>; df -T <your files>` would
give us some concrete heuristics for your case at least.
Note it would be useful to get such stats from files
that were completely copied by copy_file_range() on your systems
(i.e. not by cp), to see if holes were expanded for your case.

cheers,
Pádraig

¹ Actually I now notice that where SEEK_HOLE is _not_ available
and copy_file_range() is, then `cp --sparse=never` would not be honored,
as copy_file_range() would expand the holes (confirmed on ext4 at least).
Now I don't know of any practical systems having copy_file_range()
and not having SEEK_HOLE, but I might add the constraint to the code,
at least for clarity.





