bug-coreutils

bug#6131: [PATCH]: fiemap support for efficient sparse file copy


From: Pádraig Brady
Subject: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Thu, 15 Jul 2010 00:51:36 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.8) Gecko/20100227 Thunderbird/3.0.3

On 14/07/10 18:45, Paul Eggert wrote:
>>> I see fiemap just as a way to efficiently detect/read holes,
>>> and should have no bearing on the destination.
> 
> Hmm, but the proposal quoted below would mean that fiemap does have a
> bearing on the destination, in the --sparse=auto case.
> I guess this is OK, but it should be documented.
> 
>>> cp --sparse=auto (this is currently what cp does by default)
>>>   recreate the original fiemap holes or resort to existing
>>>   heuristic if fiemap not available
> 
> It's not just fiemap.  It's also the Solaris interface with SEEK_HOLE
> and SEEK_DATA.  The change should involve a module that isolates these
> low-level details from copy.c.  copy.c should ask the new module for the
> locations of the holes (or the non-holes: that could be more convenient).
> On traditional hosts without fiemap or SEEK_DATA, the module should report
> that it doesn't know where the holes are; this can let copy.c resort to
> the existing heuristic of looking at the size and the disk usage and
> using the --sparse=always approach if the file "smells" like it's sparse.
> 
>>> cp --sparse=never
>>>   write all data, but use fiemap if available to efficiently read
> 
> Surely there's no need to write all the data if fallocate works.
> 
>>> cp --sparse=always
>>>   recreate original holes and perhaps add to them for
>>>   other runs of zero bytes. Without having looked at the code
>>>   I see this as a little tricky to mix with fiemap.
>>>   Now since fiemap is only an optimization we can skip it
>>>   completely for this uncommon case if too tricky
>>>   (just add a FIXME for now).
> 
> Yes, that makes sense.  --sparse=always should never invoke fallocate.
> 
>> For 'cp --sparse=never', when holes are detected in the SRC file,
>> do not lseek(2) on the DST file; instead, write zeros to the DST file.
>> Am I right?
> 
> Only if fallocate doesn't work.  If fallocate works, there's no need
> to write zeros to the destination.

What you're describing here is posix_fallocate()
which uses fallocate() if available or falls back
to an implementation that writes a single 0 byte
to each block.
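
To make that fallback concrete, here is a minimal sketch of the idea. It is
not glibc's actual code, which differs in detail. The names allocate_range
and allocate_by_writing are hypothetical, the fast path assumes Linux's
fallocate(2) with mode 0, and the slow path touches one byte per
st_blksize-sized block while keeping any byte that is already there.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE 1   /* for fallocate() on glibc */
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fallback: touch one byte in every block that overlaps
   [offset, offset + len), keeping any byte that is already there.
   Returns 0 on success, otherwise an errno value.  */
int
allocate_by_writing (int fd, off_t offset, off_t len)
{
  struct stat st;
  if (offset < 0 || len <= 0)
    return EINVAL;
  if (fstat (fd, &st) != 0)
    return errno;

  off_t blksize = st.st_blksize > 0 ? (off_t) st.st_blksize : 512;
  off_t end = offset + len;

  /* Visit OFFSET, then the start of each following block.  */
  for (off_t pos = offset; pos < end; pos += blksize - pos % blksize)
    {
      char c = 0;
      /* Inside the old file size, re-write the existing byte.  */
      if (pos < st.st_size && pread (fd, &c, 1, pos) < 0)
        return errno;
      if (pwrite (fd, &c, 1, pos) != 1)
        return errno ? errno : EIO;
    }

  /* Make sure the file is at least offset + len bytes long.  */
  if (st.st_size < end && pwrite (fd, "", 1, end - 1) != 1)
    return errno ? errno : EIO;
  return 0;
}

/* Hypothetical wrapper: try fallocate() first, and write bytes only
   when the kernel or file system cannot do the allocation itself.  */
int
allocate_range (int fd, off_t offset, off_t len)
{
#ifdef FALLOC_FL_KEEP_SIZE   /* assumption: defined whenever fallocate() is declared */
  if (fallocate (fd, 0, offset, len) == 0)
    return 0;
  if (errno != EOPNOTSUPP && errno != ENOSYS)
    return errno;
#endif
  return allocate_by_writing (fd, offset, len);
}
```

The slow path costs one write per block, which is why a working
fallocate() is preferable whenever the file system supports it.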

> 
>> 2. Performance optimization, invoke fallocate(2) if an extent
>> flag is UNWRITTEN
> 
> This doesn't sound right.  A FIEMAP_EXTENT_UNWRITTEN extent is all zeros, and
> so it should act as if it were a hole.  The goal is not to copy the exact
> fiemap structure of the source (that's impossible): the goal is to use as
> little time and space as possible.
> 
>> If you decide to do that, then please do it as a separate patch.
> 
> It's not clear to me that the fiemap stuff can be cleanly separated
> from the fallocate stuff.  To some extent they're the same issue.
> If they can easily be separated, that's better of course.

I see fiemap as optimizing reads,
posix_fallocate() as optimizing writing zeros
and fallocate() as optimizing allocation.
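
As a rough illustration of the read side, the sketch below walks a file's
data regions with lseek(2) SEEK_DATA/SEEK_HOLE (the Solaris-style interface
Paul mentions) rather than the FIEMAP ioctl. The names for_each_data_region,
region_fn and add_region_length are hypothetical and not anything in copy.c.
A return of -1 means "hole locations unknown", so a caller could fall back
to the existing size/disk-usage heuristic.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE 1   /* for SEEK_DATA and SEEK_HOLE on glibc */
#endif
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical callback type: called once per data region.  */
typedef void (*region_fn) (off_t start, off_t length, void *ctx);

/* Example callback: add each region's length to the off_t
   that CTX points to.  */
void
add_region_length (off_t start, off_t length, void *ctx)
{
  (void) start;
  *(off_t *) ctx += length;
}

/* Call FN for every data region in the first SIZE bytes of FD.
   Returns 0 on success, or -1 if the hole locations are unknown.
   Note that this moves the file offset of FD.  */
int
for_each_data_region (int fd, off_t size, region_fn fn, void *ctx)
{
#if defined SEEK_DATA && defined SEEK_HOLE
  off_t pos = 0;
  while (pos < size)
    {
      off_t data = lseek (fd, pos, SEEK_DATA);
      if (data < 0)
        {
          if (errno == ENXIO)   /* only a hole remains before EOF */
            return 0;
          return -1;            /* e.g. EINVAL: not supported here */
        }
      off_t hole = lseek (fd, data, SEEK_HOLE);
      if (hole <= data)         /* error, or no forward progress */
        return -1;
      fn (data, hole - data, ctx);
      pos = hole;
    }
  return 0;
#else
  (void) fd; (void) size; (void) fn; (void) ctx;
  return -1;                    /* traditional host: no hole information */
#endif
}
```

A caller would pass the st_size from fstat() as SIZE and read only the
regions reported to the callback. On file systems without real hole
tracking the whole file may be reported as a single data region.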

So, not having thought much about implementation details,
it seems like they could be logically separated.
I.e. we could optimize the writing of zeros and the allocation
later, once we have the fallocate and posix_fallocate
gnulib modules in place.

That said, doing both now would be better,
while these details are fresh in everyone's minds.
I don't think I'll get to resubmitting my fallocate gnulib patch,
or writing a posix_fallocate module, this week at least.

cheers,
Pádraig.




