bug-coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#6131: [PATCH]: fiemap support for efficient sparse file copy


From: Joel Becker
Subject: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Tue, 15 Jun 2010 15:10:19 -0700
User-agent: Mutt/1.5.20 (2009-06-14)

On Tue, Jun 15, 2010 at 02:28:43PM -0700, Paul Eggert wrote:
> I looked at the proposed fiemap patch for cp and see another issue
> with it, which needs to be thought through more carefully.
> 
> The proposed fiemap code tries to copy the _physical_ holes in the
> source file.  But currently cp copies the _logical_ holes: that is, it
> skips over writing (and thus attempts to create physical holes)
> whenever the source file contains a block of zeros, regardless of
> whether that block of zeros is physically stored in the file system.
> The current cp behavior is often the desired behavior (particularly
> for cp --sparse=always) and the fiemap code shouldn't alter that.
> 
> For example, if a fiemap_extent has the FIEMAP_EXTENT_UNWRITTEN flag
> set, cp should treat that as a hole, because the extent is all zeros.
> (This will greatly help performance in some cases.)  Also, if an input
> extent is read and a block of it is found to be zeros, cp should skip
> over that block when writing.

        I pretty much agree with everything you wrote above.  Fiemap is
introduced into cp(1) to optimize the existing --sparse behavior.  So we
should be holding to that behavior.
        In fact, I'll go one further.  While I haven't looked at the
actual cp code in a little bit, I do not think that fiemap is used for
the non-sparse copy.  I contend that fiemap should be used for all
copies if the capability exists.  Even if cp is doing a non-sparse copy,
there's no need for it to call read(2) on a 1GB hole in the file.  It
can write zeros just fine on its own.
        Basically, these capabilities (fiemap, SEEK_HOLE) are ways to
avoid reading zeros we can determine without reading and scanning.  We
should always take advantage of that.

> If the proposed fiemap code was intended to copy the physical holes,
> then the code wasn't successful in implementing the intent, as there's
> no guarantee that the extents of the destination are the same as the
> extents of the source.  If it is useful to have 'cp' copy the physical
> holes, this could be supported via a new cp option or options, but
> surely the default should continue to be to copy logical holes.

        The goal was not physical holes, it was logical ones as you
surmise.  I can't see any way to ensure the file has identical extents.
I don't know any filesystem that guarantees such allocations on
write(2).
        I do think there might be an interesting performance
optimization to be had, but it is independent of fiemap.  Above, you
describe the usual "cp --sparse" behavior of reading allocated data and
checking for zeroes.  If you're optimizing for space usage -- what "cp
--sparse" is defined to do -- you definitely want to skip writing blocks
that are all zero.  But it actually creates a crappier allocation and
I/O pattern for some files.
        Imagine you have a data extent that is alternating blocks of
zeroes and ones:

hexdump -C /tmp/blk
00000000  01 01 01 01 01 01 01 01  01 01 01 01 01 01 01 01 |................|
*
00001000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 |................|
*
00002000  01 01 01 01 01 01 01 01  01 01 01 01 01 01 01 01 |................|
*
00003000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 |................|
*
00004000
... (continued for 4MB in this example)
00400000

Classic "cp --sparse" will spend CPU time scanning this extent, which
can be megabytes in size.  It will then write out every other block,
which is terribly slow, and it leavs a horribly fragmented destination
file.  The result is even more disastrous if those holes ever get filled
in.
        I'm, of course, exaggerating a worst-case here.  But folks
looking at I/O performance would love detection like "here's a linear
MB, sure it has some zero blocks, but it's faster to write the entire MB
than to chop it up."

Joel

-- 

Life's Little Instruction Book #456

        "Send your loved one flowers.  Think of a reason later."

Joel Becker
Principal Software Developer
Oracle
E-mail: address@hidden
Phone: (650) 506-8127





reply via email to

[Prev in Thread] Current Thread [Next in Thread]