
Re: [Bug-tar] Detection of sparse files is broken on btrfs


From: Adam Borowski
Subject: Re: [Bug-tar] Detection of sparse files is broken on btrfs
Date: Mon, 8 Jan 2018 17:44:11 +0100
User-agent: NeoMutt/20170113 (1.7.2)

On Mon, Jan 08, 2018 at 04:28:36PM +0100, Joerg Schilling wrote:
> Tim Kientzle <address@hidden> wrote:
> 
> > I'm not entirely sure I understand the above.
> >
> > It sounds like someone is claiming that:
> >
> > * Archiving programs should know about the *timing* of filesystem
> >   implementations  (60s here for btrfs, something else for <new
> >   filesystem XYZ>?)
> >
> > * And specifically request the OS to fsync() files before trusting the
> >   metadata
> 
> This is exactly the reason why btrfs (in case it behaves as claimed) seems to
> be in conflict with POSIX.
> 
> POSIX requires that stat() returns cached meta data instead of probably out
> of date information from the background medium. In other words: It is not
> allowed to return different data before and after a sync() or fsync() call.

Are we talking about
http://pubs.opengroup.org/onlinepubs/9699919799/functions/stat.html and
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_stat.h.html ?

I can't see any such requirement there.

Also, "cached meta data" in the sense you want can't possibly work on any
modern filesystem, not just btrfs.  Some fields (such as timestamps) have
specified behaviour, but where nothing is explicitly required, a filesystem
is allowed not to copy implementation quirks of how things were done in 1970.

For example, I see no place that mandates that, e.g., the hard link count of
a directory be (subdir count+2) -- in fact, the standard, per my reading,
wants this value to always be 1 (unless directory hard links are supported,
which is not the case on any modern local filesystem I'm aware of).
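
To make the link count point checkable, here is a minimal sketch in Python.
The helper name dir_link_counts is mine, not part of any tool.  It only
puts the st_nlink that stat() reports next to the traditional
(subdir count+2) value; which filesystems make the two differ is something
to verify locally (btrfs, as far as I know, reports 1 for directories).

```python
import os
import tempfile


def dir_link_counts(path):
    """Return (st_nlink reported by stat(), subdirectory count + 2).

    Hypothetical helper for illustration only.
    """
    with os.scandir(path) as entries:
        subdirs = sum(1 for e in entries if e.is_dir(follow_symlinks=False))
    return os.stat(path).st_nlink, subdirs + 2


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        for name in ("a", "b", "c"):
            os.mkdir(os.path.join(top, name))
        reported, traditional = dir_link_counts(top)
        print("st_nlink reported by stat():", reported)
        print("traditional subdirs + 2:    ", traditional)
```

A program that prunes directory walks based on st_nlink would need the two
values to agree; nothing in the sketch assumes that they do.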

st_dev and st_ino, required by POSIX, are already problematic enough: some
filesystems have to synthesise them with bogus values, and they block
innovation (such as directory reflinks).  No need to add extra
requirements which aren't there.

Even filesystems that are direct descendants of sysvfs, such as ext4, don't
know where a file will be placed until sync/writeout time, because of
delalloc.  Add compression or CoW, and there's no way to know this other
than having stat() force allocation and compression, and return only after
everything but physical writeout is done.  On a networked filesystem, the
entirety of fsync would need to be done.

Here's an example:
A dense 128KB file is well-compressible and uses a 24KB extent on the
disk.  You write a 32KB piece to the middle of the file, and it compresses
to 12KB.  What is the count of used blocks?

The whole original extent is pinned; there's no meaningful way to tell how
much it takes -- calculating that would require reading the whole thing,
recompressing, and writing again.  Pretty wasteful, especially if there are
other reflinks to that extent.  Thus, is the answer 128KB, 160KB, 56KB, or
24KB+whatever value you'd get from recompressing the 96KB piece?
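
For readers checking the arithmetic, here is one plausible reading of
where the first three candidates come from, as a small Python sketch.  The
labels and the function name are my interpretation, not something the
filesystem reports; the fourth candidate is left out because its value
can't be known without actually recompressing.

```python
KB = 1024

# Figures taken from the example above.
FILE_SIZE = 128 * KB           # logical size of the original dense file
OLD_EXTENT_ON_DISK = 24 * KB   # compressed size of the original extent
WRITE_SIZE = 32 * KB           # uncompressed size of the new write


def candidate_answers():
    """Return the three computable candidates, in bytes.

    The labels are an assumed interpretation of the numbers in the message.
    """
    return {
        "logical size of the file": FILE_SIZE,
        "old extent uncompressed + new write uncompressed":
            FILE_SIZE + WRITE_SIZE,
        "old extent as stored + new write uncompressed":
            OLD_EXTENT_ON_DISK + WRITE_SIZE,
    }


if __name__ == "__main__":
    for label, size in candidate_answers().items():
        print(f"{size // KB:4d}KB  {label}")
```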

Back to the original question:
how do you know if, upon writing a 128KB file, it'll compress to 128KB
(using real extents) or to 10 bytes, storable in the metadata tree?  stat()
would have to pointlessly block and allocate -- especially wasteful if the
file is still being written to.

Thus: a valid argument may be "POSIX describes this" but not "sysvfs did
so half a century ago".

A file that doesn't have a single block allocated for it may thus return
st_blocks of 0, no matter if it's empty or not.
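
A rough sketch of the consequence for an archiver, in Python.  The two
function names are hypothetical.  The first is the kind of st_blocks
shortcut that breaks under the behaviour described above; the second asks
the kernel directly with lseek(SEEK_DATA), which is not part of POSIX as
linked above and is only available where the platform provides it (Linux
does).  This is an illustration, not what any particular tar
implementation does.

```python
import errno
import os


def looks_all_hole_by_blocks(path):
    """Heuristic only: non-empty file with no blocks reported.

    Unreliable if the filesystem reports st_blocks == 0 for files whose
    data is not (yet) stored in regular extents.
    """
    st = os.stat(path)
    return st.st_size > 0 and st.st_blocks == 0


def has_data_by_seek(path):
    """Return True if lseek(SEEK_DATA) finds any data region in the file."""
    fd = os.open(path, os.O_RDONLY)
    try:
        try:
            os.lseek(fd, 0, os.SEEK_DATA)
            return True
        except OSError as e:
            if e.errno == errno.ENXIO:   # no data at or after offset 0
                return False
            raise
    finally:
        os.close(fd)
```

If the two disagree on a file, the heuristic is the one to distrust.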


Meow!
-- 
// If you believe in so-called "intellectual property", please immediately
// cease using counterfeit alphabets.  Instead, contact the nearest temple
// of Amon, whose priests will provide you with scribal services for all
// your writing needs, for Reasonable And Non-Discriminatory prices.


