
Re: [Bug-tar] Detection of sparse files is broken on btrfs


From: Andreas Dilger
Subject: Re: [Bug-tar] Detection of sparse files is broken on btrfs
Date: Thu, 18 Jan 2018 16:44:54 -0700

On Jan 17, 2018, at 9:49 PM, Tim Kientzle <address@hidden> wrote:
>> On Jan 17, 2018, at 1:09 PM, Andreas Dilger <address@hidden> wrote:
>> 
>>> So is there some other way to quickly identify sparse files so we can avoid 
>>> the SEEK_HOLE scan for non-sparse files?
>> 
>> Given that calling SEEK_HOLE is also going to have some cost, my suggestion
>> would be to ignore st_blocks completely for small files (size < 64KB) and
>> just read the file in this case, since the overhead of reading will probably
>> be about the same as checking if any blocks are allocated.  If no blocks are
>> allocated, then the read will not do any disk IO, and if there are blocks
>> allocated they would have needed to be read from disk anyway and SEEK_HOLE
>> would be pure overhead.
> 
> If I understand, you’re basically suggesting not bothering
> with the sparse-file storage for small files (size < 64k).

I'm not suggesting that small files should skip sparse handling completely,
just that there shouldn't be a heuristic that tries to avoid reading them
from disk, especially if that heuristic itself imposes a non-zero overhead.
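
A minimal sketch of that decision in C, under stated assumptions: the
names choose_scan_plan, SMALL_FILE_LIMIT, PLAN_READ_ALL and PLAN_SEEK_HOLE
are hypothetical and do not come from the tar sources, st_blocks is
assumed to count 512-byte units (as on Linux), and the st_blocks comparison
for larger files is only an illustration of the kind of heuristic being
discussed, not a claim about what tar does.

```c
#include <sys/types.h>

/* Hypothetical threshold taken from the "size < 64KB" suggestion. */
#define SMALL_FILE_LIMIT ((off_t)64 * 1024)

/* Hypothetical plan names: either read the whole file, or probe it
   with lseek(SEEK_HOLE) first. */
enum scan_plan { PLAN_READ_ALL, PLAN_SEEK_HOLE };

/* Decide how to scan a file given st_size and st_blocks from stat(). */
enum scan_plan choose_scan_plan(off_t size, blkcnt_t blocks)
{
    /* Small files: ignore st_blocks entirely and just read the data. */
    if (size < SMALL_FILE_LIMIT)
        return PLAN_READ_ALL;

    /* Larger files (illustrative assumption): fewer allocated bytes
       than the logical size hints at holes worth probing for.
       Assumes st_blocks is in 512-byte units. */
    if ((off_t)blocks * 512 < size)
        return PLAN_SEEK_HOLE;

    return PLAN_READ_ALL;
}
```

A caller would pass st.st_size and st.st_blocks from a prior stat() or
fstat() call. With this shape, a small file that reports zero allocated
blocks is still read rather than being classified by st_blocks.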

Once the file is read, it would still be possible to do hole detection
if --sparse is used (i.e. check for zero blocks via memcmp()).  That
should be relatively low overhead since the file is small, and the time
will still largely be dominated by IO operations.
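
The zero-block check could look roughly like the sketch below. It is an
illustration only: block_is_zero, count_zero_blocks and BLOCK_SIZE are
hypothetical names, and the 512-byte block size is an assumption chosen
to match tar's record size rather than something stated in this thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed block granularity for hole detection (hypothetical). */
#define BLOCK_SIZE 512

/* Return true if the len bytes at p are all zero, comparing against
   a static all-zero block with memcmp() one chunk at a time. */
bool block_is_zero(const unsigned char *p, size_t len)
{
    static const unsigned char zeros[BLOCK_SIZE]; /* zero-initialized */

    while (len > 0) {
        size_t n = len < BLOCK_SIZE ? len : BLOCK_SIZE;
        if (memcmp(p, zeros, n) != 0)
            return false;
        p += n;
        len -= n;
    }
    return true;
}

/* Count the all-zero blocks in a buffer that has already been read.
   A trailing partial block is counted if all of its bytes are zero. */
size_t count_zero_blocks(const unsigned char *buf, size_t len)
{
    size_t count = 0;

    for (size_t off = 0; off < len; off += BLOCK_SIZE) {
        size_t n = len - off < BLOCK_SIZE ? len - off : BLOCK_SIZE;
        if (block_is_zero(buf + off, n))
            count++;
    }
    return count;
}
```

For a file under 64KB this touches at most 128 blocks of data that is
already in memory, so the cost is small next to the read itself.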

Cheers, Andreas

> This makes a lot of sense to me:  sparse file storage
> is most important for applications (e.g., databases) that
> use large files as randomly-accessible swap space and need
> to preserve sparseness (to not blow out disk) and/or
> non-sparseness (so that overwrites don’t require
> allocations).
> 
> So skipping the SEEK_HOLE check for common
> small files seems like a good optimization.
> 
> Cheers,
> 
> Tim





