coreutils

Re: sparse file processing improvements


From: Pádraig Brady
Subject: Re: sparse file processing improvements
Date: Tue, 07 Oct 2014 19:56:27 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2

On 10/07/2014 02:50 AM, Bernhard Voelker wrote:
> On 10/06/2014 04:40 PM, Pádraig Brady wrote:
>> I've attached 3 patches to improve various sparse file handling.
>> 1. supports detecting holes < internal buffer size (currently 128KiB)
>> 2. employs the punch hole scheme demonstrated above
>> 3. reads sparse files efficiently even if not writing to a regular file
>>
>> thanks,
>> Pádraig.
> 
> Hi Padraig,
> 
> cool work, thanks!
> 
> At a first glance, the changes in copy.c seem to be correct.
> Without having a deeper look yet, the new inner loop to detect
> holes in the just-read buffer seems to do the lseek() twice if
> a hole would span adjacent, outer-loop read()s.
> The ls(1) output of the result in the new test seems to second
> my guess:
> 
>    + ls -lsh sparse.in sparse.out sparse.out2
>   512K -rw-r--r-- 1 voelkerb users 512K Oct  6 20:03 sparse.in
>   260K -rw-r--r-- 1 voelkerb users 512K Oct  6 20:03 sparse.out
>   256K -rw-r--r-- 1 voelkerb users 512K Oct  6 20:03 sparse.out2
> 
> This means the copying could be even more effective in the first
> run, right?

Yes, the patch would do at least 1 lseek per outer read loop.
I think the extra 4K above is just a side effect on ext4 of doing the trailing lseek
rather than just leaving it to truncate, or maybe just alignment slop.
But you're right, both could be improved by hoisting the hole control
vars above the read loop, doing only 1 lseek for each hole,
and only a truncate for the last hole. The fiemap based copy
also avoids the last lseek, so this would be more consistent as well.

The incremental diff is simple enough:

diff --git a/src/copy.c b/src/copy.c
index 12af6db..d32a131 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -165,6 +165,8 @@ sparse_copy (int src_fd, int dest_fd, char *buf, size_t buf_size,
 {
   *last_write_made_hole = false;
   *total_n_read = 0;
+  bool make_hole = false;
+  off_t psize = 0;

   while (max_n_read)
     {
@@ -182,10 +184,8 @@ sparse_copy (int src_fd, int dest_fd, char *buf, size_t buf_size,
       *total_n_read += n_read;

       /* Loop over the input buffer in chunks of hole_size.  */
-      bool make_hole = false;
       size_t csize = make_holes ? hole_size : buf_size;
       char *cbuf = buf;
-      size_t psize = 0;
       char *pbuf = buf;

       while (n_read)
@@ -207,7 +207,7 @@ sparse_copy (int src_fd, int dest_fd, char *buf, size_t buf_size,
             }

           bool transition = (make_hole != prev_hole) && psize;
-          bool last_chunk = (n_read == csize) || ! csize;
+          bool last_chunk = (n_read == csize && ! make_hole) || ! csize;

           if (transition || last_chunk)
             {
@@ -231,16 +231,29 @@ sparse_copy (int src_fd, int dest_fd, char *buf, size_t buf_size,
                     }
                 }

-              pbuf += psize;
+              pbuf = cbuf;
               psize = csize;

-              if (transition && last_chunk)
-                csize = 0;
+              if (last_chunk)
+                {
+                  if (transition)
+                    csize = 0; /* Loop again to deal with last chunk.  */
+                  else
+                    psize = 0; /* Reset for next read loop.  */
+                }
               else if (! csize)
-                n_read = 0;
+                n_read = 0;    /* Finished processing buffer.  */
             }
           else  /* Coalesce writes/seeks.  */
-            psize += csize;
+            {
+              if (psize <= OFF_T_MAX - csize)
+                psize += csize;
+              else
+                {
+                  error (0, 0, _("overflow reading %s"), quote (src_name));
+                  return false;
+                }
+            }

           n_read -= csize;
           cbuf += csize;
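
For context, here's a minimal standalone sketch of the behaviour the diff
is aiming for (this is not the copy.c code, and it only detects holes at
whole-read granularity rather than per $hole_size chunk): NUL runs are
coalesced so that each hole costs at most one lseek on the output, and a
hole at end of file is produced with ftruncate rather than a final seek.
buffer_is_nul() is a hypothetical stand-in for the real NUL detection.

#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: true if the N bytes at BUF are all NUL.  */
static bool
buffer_is_nul (char const *buf, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (buf[i])
      return false;
  return true;
}

/* Copy SRC_FD to DEST_FD, turning runs of all-NUL reads into holes.
   Returns false on read/write/seek/truncate failure.  */
static bool
copy_making_holes (int src_fd, int dest_fd, char *buf, size_t buf_size)
{
  off_t pending_hole = 0;   /* hole bytes seen but not yet materialized */
  ssize_t n_read;

  while ((n_read = read (src_fd, buf, buf_size)) > 0)
    {
      if (buffer_is_nul (buf, n_read))
        {
          /* Coalesce: a hole spanning several read()s still costs
             at most one lseek on the output.  */
          pending_hole += n_read;
          continue;
        }

      if (pending_hole)
        {
          if (lseek (dest_fd, pending_hole, SEEK_CUR) < 0)
            return false;
          pending_hole = 0;
        }

      /* A full implementation would retry short writes.  */
      if (write (dest_fd, buf, n_read) != n_read)
        return false;
    }

  if (n_read < 0)
    return false;

  /* A hole at EOF is made by extending the file, not by seeking.  */
  if (pending_hole)
    {
      off_t end = lseek (dest_fd, 0, SEEK_CUR);
      if (end < 0 || ftruncate (dest_fd, end + pending_hole) < 0)
        return false;
    }

  return true;
}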

>> diff --git a/NEWS b/NEWS
>> index 1811ae4..785773f 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -30,6 +30,9 @@ GNU coreutils NEWS                                    -*- outline -*-
>>
>>  ** Improvements
>>
>> +  cp will convert smaller runs of NULs in the input to holes,
>> +  to reduce allocation in the copy.
>> +
> 
> Aren't ginstall(1) and mv(1) also somehow affected?  If so, then
> the NEWS should mention this.

done

>> diff --git a/tests/cp/sparse.sh b/tests/cp/sparse.sh
>> index d6cc4c4..1414d35 100755
>> --- a/tests/cp/sparse.sh
>> +++ b/tests/cp/sparse.sh
>> @@ -37,4 +37,32 @@ test $(stat --printf %b copy) -le $(stat --printf %b sparse) || fail=1
>>  cp --sparse=always --reflink sparse copy && fail=1
>>  cp --sparse=never --reflink sparse copy && fail=1
>>
>> +
>> +# Ensure we handle sparse/non-sparse transitions correctly
>> +maxn=128 # how many $hole_size chunks per file
>> +hole_size=$(stat -c %o copy)
>> +dd if=/dev/zero bs=$hole_size count=$maxn of=zeros
>> +tr '\0' '\1' < zeros > ones
> 
> Test framework failures during the above commands (and below)
> should be caught.

done

>> +for n in 1 2 3 4 32 $maxn; do
>> +  parts=$(expr $maxn / $n)
>> +
>> +  rm -f sparse.in
>> +
>> +  # Generate sparse file for copying with alternating
>> +  # hole/data patterns of size n * $hole_size
>> +  for i in $(yes zeros | sed 1~2s/zeros/ones/ | head -n$parts); do
>> +    dd iflag=fullblock if=$i of=sparse.in conv=notrunc oflag=append \
>> +       bs=$hole_size count=$n status=none || framework_failure_
>> +  done
> 
> a)
> Looking at the ls(1) output above, the generated sparse.in files are
> not sparse, are they? I think it doesn't matter, and in fact the file
> to copy first does not have to be sparse. This is what the second
> copying below is for. So just s/sparse/sample/ in the comment maybe.

Right, the first file should not be sparse;
otherwise we only exercise the fiemap code for handling holes.
Comment fixed.
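
(Aside for readers of the archive: the sketch below is not what cp does
here -- the sparse-input path goes through the fiemap extent scan -- it
just illustrates why a genuinely sparse input is different: its holes are
visible to the kernel and can be enumerated, e.g. with Linux's
SEEK_DATA/SEEK_HOLE, whereas a non-sparse file full of NULs only looks
sparse to the NUL-scanning loop.)

#define _GNU_SOURCE       /* for SEEK_DATA / SEEK_HOLE on glibc */
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Print the data extents of FD as the kernel reports them.  */
static void
list_data_extents (int fd)
{
  off_t data = lseek (fd, 0, SEEK_DATA);
  while (data >= 0)
    {
      off_t hole = lseek (fd, data, SEEK_HOLE);  /* end of this extent */
      if (hole < 0)
        break;
      printf ("data: %jd..%jd\n", (intmax_t) data, (intmax_t) hole);
      data = lseek (fd, hole, SEEK_DATA);        /* next data extent */
    }
}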

> b)
> Instead of appending to the sample file with "of=sparse.in conv=notrunc"
> " oflag=append", you could just redirect the for-loop's output to that
> file.

Maybe, though there would be more data copying in that case.

> c)
> I'm not sure if the sed command "1~2s" is available everywhere
> (TBH I've seen it the first time myself).
> What about avoiding the sed(1) call completely by moving the
> alternating logic into the yes(1) call?
> 
>   s="$(printf "%s\n%s" zeros ones)"
>   for i in $(yes "$s" |  head -n$parts); do
>     ...

Good point, I need to make that portable.

>> +
>> +  cp --sparse=always sparse.in sparse.out   || fail=1 # non sparse input
>> +  cp --sparse=always sparse.out sparse.out2 || fail=1 # sparse input
>> +
>> +  cmp sparse.in sparse.out || fail=1
>> +  cmp sparse.in sparse.out2 || fail=1
>> +
>> +  ls -lsh sparse.*
> 
> Given the sparse-detection and -punching is optimal, then both
> output file should always have the same size, shouldn't they?
> Therefore, the test should check those numbers.

I'm very wary of checking the exact numbers, as they could vary
due to alignment or async reclamation on the file system.

>> +done
>> +
>>  Exit $fail
> 
> Although the test data seems to be quite good, I somehow
> have the feeling that it should be even more random.
> In the above, the blocks of NULs and 1s are always a multiple
> of $holesize ... which in turn makes it all somehow related
> to the buffer size and read size in sparse_copy().
> I think we should use varying block sizes around the block size
> for the NULs and 1s blocks to be closer to real-world data.
> WDYT?

Well, going below $hole_size would only test is_nul(),
and that's implicitly tested by the data comparisons.
Oh, I should change from \1 to 'U', since \1 is used as the
sentinel flag within the code.
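
(For anyone unfamiliar with the sentinel trick referenced here: the usual
idea is to store a non-zero byte just past the scanned data so the NUL
scan can stop without a per-byte bounds check. The sketch below shows the
general technique only; it is not the actual is_nul()/copy.c code.)

#include <stdbool.h>
#include <stddef.h>

/* Sketch of the sentinel idea: BUF must have room for one byte past N.
   The non-zero sentinel guarantees the scan terminates, so no length
   check is needed inside the loop.  */
static bool
all_nul_with_sentinel (char *buf, size_t n)
{
  buf[n] = 1;                 /* sentinel byte just past the data */
  char const *p = buf;
  while (*p == 0)
    p++;
  return (size_t) (p - buf) >= n;   /* only the sentinel was non-zero */
}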

I also noticed a small issue in copy where an erroneous
error about extending the file would be output when
reporting errors after sparse_copy fails. That's fixed
up in a separate patch. All 4 patches are attached here.

thanks!
Pádraig.

Attachment: sparse-improvements.patch
Description: Text Data

