From: Dave Chinner
Subject: Re: [Qemu-devel] semantics of FIEMAP without FIEMAP_FLAG_SYNC (was Re: [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server)
Date: Thu, 21 Jul 2016 22:41:19 +1000
User-agent: Mutt/1.5.21 (2010-09-15)

On Wed, Jul 20, 2016 at 09:40:06AM -0400, Paolo Bonzini wrote:
> > > 1) is it expected that SEEK_HOLE skips unwritten extents?
> > 
> > There are multiple answers to this, all of which are correct depending
> > on current context and state:
> > 
> > 1. No - some filesystems will report clean unwritten extents as holes.
> > 
> > 2. Yes - some filesystems will report clean unwritten extents as data.
> > 
> > 3.  Maybe - if there is written data in memory over the unwritten
> > extent on disk (i.e. it hasn't been flushed to disk), it will be
> > considered a data region with non-zero data. (FIEMAP will still
> > report it as unwritten.)
> 
> Ok, I thought it would return FIEMAP_EXTENT_UNKNOWN|FIEMAP_EXTENT_DELALLOC
> in this case (not FIEMAP_EXTENT_UNWRITTEN).

No. FIEMAP only returns the known extent state at the given file
offset.  "delalloc" extents exist in memory, indicating the space
has already been accounted for over that offset, but the extent has
not been physically allocated. Like all other types of extents,
there may or may not be valid data over a delayed allocation extent. 

IOWs, fiemap only gives you a snapshot of extent state, not the
ranges of valid data in the file.
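
To make the "snapshot of extent state" point concrete, here is a
minimal sketch of a FIEMAP query. The helper names dump_extents() and
extent_state() and the 32-extent limit are illustrative assumptions,
not an existing API. It only reports what the filesystem says about
extents at the moment of the call; it says nothing about where valid
data lives.

```c
#include <linux/fiemap.h>
#include <linux/fs.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>

#define MAX_EXTENTS 32  /* arbitrary cap for this sketch */

/* Classify an extent by its FIEMAP flags (hypothetical helper). */
static const char *extent_state(uint32_t fe_flags)
{
    if (fe_flags & FIEMAP_EXTENT_DELALLOC)
        return "delalloc";   /* space reserved, not yet physically allocated */
    if (fe_flags & FIEMAP_EXTENT_UNWRITTEN)
        return "unwritten";  /* allocated on disk, reads back as zeros */
    return "allocated";
}

/*
 * Print up to MAX_EXTENTS extents of fd, starting at offset 0.
 * flags is 0 or FIEMAP_FLAG_SYNC. Returns the number of extents
 * reported, or -1 if the allocation or the ioctl fails.
 */
static int dump_extents(int fd, uint32_t flags)
{
    size_t sz = sizeof(struct fiemap) +
                MAX_EXTENTS * sizeof(struct fiemap_extent);
    struct fiemap *fm = calloc(1, sz);
    if (!fm)
        return -1;

    fm->fm_start = 0;
    fm->fm_length = FIEMAP_MAX_OFFSET;  /* map the whole file */
    fm->fm_flags = flags;
    fm->fm_extent_count = MAX_EXTENTS;

    if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
        free(fm);
        return -1;
    }

    int n = (int)fm->fm_mapped_extents;
    for (int i = 0; i < n; i++) {
        const struct fiemap_extent *e = &fm->fm_extents[i];
        printf("logical=%llu len=%llu state=%s\n",
               (unsigned long long)e->fe_logical,
               (unsigned long long)e->fe_length,
               extent_state(e->fe_flags));
    }
    free(fm);
    return n;
}
```

Calling dump_extents(fd, 0) and then dump_extents(fd, FIEMAP_FLAG_SYNC)
on a file with dirty pages may print different maps, and either result
can be stale as soon as the ioctl returns.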

> > > If not, would
> > > it be acceptable to introduce Linux-specific SEEK_ZERO/SEEK_NONZERO, which
> > > would be similar to what SEEK_HOLE/SEEK_DATA do now?
> > 
> > To solve what problem? You haven't explained what problem you are
> > trying to solve yet.
> > 
> > > 2) for FIEMAP do we really need FIEMAP_FLAG_SYNC?  And if not, for what
> > > filesystems and kernel releases is it really not needed?
> > 
> > I can't answer this question, either, because I don't know what
> > you want the fiemap information for.
> 
> The answer is the same no matter if we use both lseek and FIEMAP, so
> I'll answer just once.  We want to do two things:
> 
> 1) avoid copying zero data, to keep the copy process efficient.  For this,
> SEEK_HOLE/SEEK_DATA are enough.
> 
> 2) copy file contents while preserving the allocation state of the file's 
> extents.

Which is /very difficult/ to do safely and reliably.
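
By contrast, the first goal (not copying zeros) needs nothing more
than lseek(). A rough sketch of walking the data ranges follows;
count_data_bytes() is a hypothetical helper, and a real copier would
copy each [data, hole) range where the comment indicates. What counts
as "data" is whatever the filesystem reports at the time of each call,
as described for SEEK_HOLE earlier in the thread.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Walk the data ranges of fd using SEEK_DATA/SEEK_HOLE and return the
 * total number of bytes they cover, or -1 on error. The file position
 * of fd is moved as a side effect.
 */
static off_t count_data_bytes(int fd)
{
    off_t total = 0;
    off_t pos = 0;

    for (;;) {
        /* Next offset >= pos that the filesystem reports as data. */
        off_t data = lseek(fd, pos, SEEK_DATA);
        if (data < 0)
            return errno == ENXIO ? total : -1;  /* ENXIO: no more data */

        /* End of that data range; EOF counts as an implicit hole. */
        off_t hole = lseek(fd, data, SEEK_HOLE);
        if (hole < 0)
            return -1;

        /* A copier would transfer bytes [data, hole) here. */
        total += hole - data;
        pos = hole;
    }
}
```

On a filesystem without real hole support, the kernel's generic
fallback reports the whole file as one data range, so this still
works but skips nothing.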

We do actually do reliable, safe, exact hole and preallocation
layout duplication with xfs_fsr, but that uses kernel provided
cookies (from XFS_IOC_BULKSTAT) to detect that data in the source
file has not changed while it was being copied before executing the
final defrag operation in the kernel (XFS_IOC_SWAPEXT) that makes
the new copy of the data user visible.

i.e. the use of fiemap to duplicate the exact layout of a file
from userspace is only possible if you can /guarantee/ the source
file has not changed in any way during the copy operation at the
point in time you finalise the destination data copy.

> There can be various reasons why the user has preallocated the file
> (because they don't want an ENOSPC to happen while the VM runs; on some
> filesystems, to minimize cases where io_submit is very un-asynchronous;
> or just because someone had a reason to do a BLKZEROOUT ioctl on the
> virtual disk).  We want to preserve these while converting or otherwise
> moving the file around.

Sure, there are many reasons for using prealloc/punch/zero. The real
difference to other file operations is that they interface with low
level filesystem structure, not the data contained within the
extents. That's what makes them problematic for duplication -
userspace cannot serialise against low level filesystem structure
modifications.

Optimising file copies safely is one of the reasons the
copy_file_range() syscall has been introduced (in 4.5). While we
haven't implemented anything special in XFS yet, it will internally
use splice to do a zero-copy data transfer from source to
destination file. Optimising for exact layout copies is precisely
the sort of thing this syscall is intended for.

It's also intended to enable applications to take advantage of
hardware acceleration of data copying (e.g. server side copies to
avoid round trips as has been implemented for NFS, or storage array
offload of data copying) when such support is provided by the kernel.

IOWs, I think you should be looking to optimise file copies by using
copy_file_range() and getting filesystems to do exactly what you
need. Using FIEMAP, fallocate and moving data through userspace
won't ever be reliable without special filesystem help (that only
exists for XFS right now), nor will it enable the application to
transparently use smart storage protocols and hardware when it is
present on user systems....
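
As a rough illustration of what that looks like from the application
side, here is a minimal sketch, assuming a glibc that provides the
copy_file_range() wrapper. The helper names copy_whole_file() and
copy_fallback() are hypothetical. The call itself makes no promise
about preserving holes or preallocation; that is up to what the
filesystem implements behind it. The read/write fallback moves data
through userspace and gives no layout guarantee at all.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Plain read/write loop from the current file positions (data only). */
static int copy_fallback(int src_fd, int dst_fd)
{
    char buf[65536];

    for (;;) {
        ssize_t n = read(src_fd, buf, sizeof buf);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return 0;  /* EOF */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(dst_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
    }
}

/*
 * Copy src_fd to dst_fd from their current file positions, letting the
 * kernel and filesystem decide how to move the data.
 * Returns 0 on success, -1 on error with errno set.
 */
static int copy_whole_file(int src_fd, int dst_fd)
{
    struct stat st;
    if (fstat(src_fd, &st) < 0)
        return -1;

    off_t remaining = st.st_size;
    while (remaining > 0) {
        /* NULL offsets: use and advance each descriptor's position. */
        ssize_t n = copy_file_range(src_fd, NULL, dst_fd, NULL,
                                    (size_t)remaining, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            /* Not supported here: fall back to a userspace copy. */
            if (errno == ENOSYS || errno == EXDEV ||
                errno == EINVAL || errno == EOPNOTSUPP)
                return copy_fallback(src_fd, dst_fd);
            return -1;
        }
        if (n == 0)
            break;  /* source is shorter than fstat() reported */
        remaining -= n;
    }
    return 0;
}
```

A caller would open the source O_RDONLY and the destination O_WRONLY
(without O_APPEND), then call copy_whole_file(src_fd, dst_fd) and
check for a zero return.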

Cheers,

Dave.
-- 
Dave Chinner
address@hidden


