qemu-devel



From: Pádraig Brady
Subject: Re: [Qemu-devel] semantics of FIEMAP without FIEMAP_FLAG_SYNC (was Re: [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server)
Date: Thu, 21 Jul 2016 14:01:28 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0

On 21/07/16 13:41, Dave Chinner wrote:
> On Wed, Jul 20, 2016 at 09:40:06AM -0400, Paolo Bonzini wrote:
>>>> 1) is it expected that SEEK_HOLE skips unwritten extents?
>>>
>>> There are multiple answers to this, all of which are correct depending
>>> on current context and state:
>>>
>>> 1. No - some filesystems will report clean unwritten extents as holes.
>>>
>>> 2. Yes - some filesystems will report clean unwritten extents as data.
>>>
>>> 3. Maybe - if there is written data in memory over the unwritten
>>> extent on disk (i.e. it hasn't been flushed to disk), it will be
>>> considered a data region with non-zero data. (FIEMAP will still
>>> report it as unwritten.)
>>
>> Ok, I thought it would return FIEMAP_EXTENT_UNKNOWN|FIEMAP_EXTENT_DELALLOC
>> in this case (not FIEMAP_EXTENT_UNWRITTEN).
> 
> No. FIEMAP only returns the known extent state at the given file
> offset.  "delalloc" extents exist in memory, indicating the space
> has already been accounted for over that offset, but the extent has
> not been physically allocated. Like all other types of extents,
> there may or may not be valid data over a delayed allocation extent. 
> 
> IOWs, fiemap only gives you a snapshot of extent state, not the
> ranges of valid data in the file.
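
To make the snapshot point concrete, here is a minimal Python sketch of querying FIEMAP with and without FIEMAP_FLAG_SYNC. The constants are copied from <linux/fiemap.h> and <linux/fs.h>; the helper names parse_fiemap() and fiemap() are made up for illustration, and the ioctl raises OSError on filesystems that do not support FIEMAP. Either way, the result only describes extent state at the time of the call, not where valid data lives.

```python
import fcntl
import struct

# Constants from <linux/fs.h> and <linux/fiemap.h>.
FS_IOC_FIEMAP = 0xC020660B          # _IOWR('f', 11, struct fiemap)
FIEMAP_FLAG_SYNC = 0x0001           # sync the file before mapping
FIEMAP_EXTENT_LAST = 0x0001
FIEMAP_EXTENT_UNKNOWN = 0x0002
FIEMAP_EXTENT_DELALLOC = 0x0004
FIEMAP_EXTENT_UNWRITTEN = 0x0800

HDR = struct.Struct("=QQIIII")      # struct fiemap header, 32 bytes
EXT = struct.Struct("=QQQ2QI3I")    # struct fiemap_extent, 56 bytes


def parse_fiemap(buf):
    """Decode a FIEMAP result buffer into (logical, length, flags) tuples."""
    mapped = HDR.unpack_from(buf, 0)[3]          # fm_mapped_extents
    extents = []
    for i in range(mapped):
        fields = EXT.unpack_from(buf, HDR.size + i * EXT.size)
        extents.append((fields[0], fields[2], fields[5]))
    return extents


def fiemap(fd, sync=False, max_extents=64):
    """Return up to max_extents extents of fd (hypothetical helper)."""
    flags = FIEMAP_FLAG_SYNC if sync else 0
    buf = bytearray(HDR.size + max_extents * EXT.size)
    # fm_start=0, fm_length=~0 (whole file), fm_extent_count=max_extents
    HDR.pack_into(buf, 0, 0, 0xFFFFFFFFFFFFFFFF, flags, 0, max_extents, 0)
    fcntl.ioctl(fd, FS_IOC_FIEMAP, buf)          # fills buf in place
    return parse_fiemap(buf)
```

Comparing fiemap(fd) with fiemap(fd, sync=True) on a file that has dirty page cache may give different answers on some filesystems, since FIEMAP_FLAG_SYNC asks the kernel to sync the file before mapping extents.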
> 
>>>> If not, would
>>>> it be acceptable to introduce Linux-specific SEEK_ZERO/SEEK_NONZERO, which
>>>> would be similar to what SEEK_HOLE/SEEK_DATA do now?
>>>
>>> To solve what problem? You haven't explained what problem you are
>>> trying to solve yet.
>>>
>>>> 2) for FIEMAP do we really need FIEMAP_FLAG_SYNC?  And if not, for what
>>>> filesystems and kernel releases is it really not needed?
>>>
>>> I can't answer this question, either, because I don't know what
>>> you want the fiemap information for.
>>
>> The answer is the same no matter if we use both lseek and FIEMAP, so
>> I'll answer just once.  We want to do two things:
>>
>> 1) avoid copying zero data, to keep the copy process efficient.  For this,
>> SEEK_HOLE/SEEK_DATA are enough.
>>
>> 2) copy file contents while preserving the allocation state of the file's 
>> extents.
> 
> Which is /very difficult/ to do safely and reliably.
> 
> We do actually do reliable, safe, exact hole and preallocation
> layout duplication with xfs_fsr, but that uses kernel provided
> cookies (from XFS_IOC_BULKSTAT) to detect that data in the source
> file has not changed while it was being copied before executing the
> final defrag operation in the kernel (XFS_IOC_SWAPEXT) that makes
> the new copy of the data user visible.
> 
> i.e. the use of fiemap to duplicate the exact layout of a file
> from userspace is only possible if you can /guarantee/ the source
> file has not changed in any way during the copy operation at the
> point in time you finalise the destination data copy.
> 
>> There can be various reasons why the user has preallocated the file (because
>> they don't want an ENOSPC to happen while the VM runs; on some filesystems, to
>> minimize cases where io_submit is very un-asynchronous; or just because
>> someone had a reason to do a BLKZEROOUT ioctl on the virtual disk).  We want to
>> preserve these while converting or otherwise moving the file around.
> 
> Sure, there's many reasons for using prealloc/punch/zero. The real
> difference to other file operations is that they interface with low
> level filesystem structure, not the data contained within the
> extents. That's what makes them problematic for duplication -
> userspace cannot serialise against low level filesystem structure
> modifications.
> 
> Optimising file copies safely is one of the reasons the
> copy_file_range() syscall has been introduced (in 4.5). While we
> haven't implemented anything special in XFS yet, it will internally
> use splice to do a zero-copy data transfer from source to
> destination file. Optimising for exact layout copies is precisely
> the sort of thing this syscall is intended for.
> 
> It's also intended to enable applications to take advantage of
> hardware acceleration of data copying (e.g. server side copies to
> avoid round trips as has been implemented for NFS, or storage array
> offload of data copying) when such support is provided by the kernel.
> 
> IOWs, I think you should be looking to optimise file copies by using
> copy_file_range() and getting filesystems to do exactly what you
> need. Using FIEMAP, fallocate and moving data through userspace
> won't ever be reliable without special filesystem help (that only
> exists for XFS right now), nor will it enable the application to
> transparently use smart storage protocols and hardware when it is
> present on user systems....
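
As a rough illustration of the copy_file_range() approach, the following Python sketch copies a byte range with os.copy_file_range() (Python 3.8+, Linux 4.5+) and drops back to pread/pwrite through userspace when the call is unavailable or refused. The name copy_range() and the set of errno values treated as "fall back" are assumptions made for this example, not something taken from this thread or from cp.

```python
import errno
import os

# errno values treated as "copy_file_range() not usable here" (an assumption).
FALLBACK_ERRNOS = {errno.ENOSYS, errno.EXDEV, errno.EINVAL,
                   errno.EOPNOTSUPP, errno.EPERM}


def copy_range(src_fd, dst_fd, offset, length, chunk=1 << 20):
    """Copy [offset, offset + length) from src_fd to the same offsets in dst_fd.

    Returns the number of bytes copied, which is short if the source hits EOF.
    """
    start = offset
    end = offset + length
    use_cfr = hasattr(os, "copy_file_range")
    while offset < end:
        want = min(chunk, end - offset)
        if use_cfr:
            try:
                # Let the kernel (and possibly the filesystem) do the copy.
                n = os.copy_file_range(src_fd, dst_fd, want, offset, offset)
            except OSError as e:
                if e.errno not in FALLBACK_ERRNOS:
                    raise
                use_cfr = False      # retry this chunk via userspace
                continue
        else:
            data = os.pread(src_fd, want, offset)
            n = len(data)
            done = 0
            while done < n:          # handle short writes
                done += os.pwrite(dst_fd, data[done:], offset + done)
        if n == 0:
            break                    # EOF on the source
        offset += n
    return offset - start
```

Whether the kernel path preserves holes or preallocation is up to the filesystem; the userspace fallback here preserves neither.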

Yes, higher-level calls are useful here, and we'll consider using them in cp etc.
When I previously looked at this I noticed some implementations would
fall back to do_splice_direct(), which is essentially sendfile(),
and that expands holes, which wouldn't be a good default.
So there may be some need for control flags for copy_file_range()
to make it generally useful.
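
One way to avoid expanding holes when a copy ends up as a plain data transfer is to walk the source with SEEK_DATA/SEEK_HOLE and copy only what lseek() reports as data. A minimal Python sketch follows; data_segments() and sparse_copy() are hypothetical helper names, and it assumes the source is not modified during the copy. It preserves holes only, not preallocated (unwritten) extents, and a filesystem without real SEEK_DATA support will typically report the whole file as data.

```python
import errno
import os


def data_segments(fd):
    """Yield (start, end) for each region lseek() reports as data."""
    size = os.fstat(fd).st_size
    pos = 0
    while pos < size:
        try:
            start = os.lseek(fd, pos, os.SEEK_DATA)
        except OSError as e:
            if e.errno == errno.ENXIO:   # nothing but a hole up to EOF
                return
            raise
        end = os.lseek(fd, start, os.SEEK_HOLE)   # EOF is an implicit hole
        yield start, end
        pos = end


def sparse_copy(src_fd, dst_fd, chunk=1 << 20):
    """Copy only the data segments of src_fd into a new, empty dst_fd."""
    size = os.fstat(src_fd).st_size
    for start, end in data_segments(src_fd):
        pos = start
        while pos < end:
            data = os.pread(src_fd, min(chunk, end - pos), pos)
            if not data:                 # source shrank underneath us
                break
            done = 0
            while done < len(data):      # handle short writes
                done += os.pwrite(dst_fd, data[done:], pos + done)
            pos += len(data)
    os.ftruncate(dst_fd, size)           # recreates a trailing hole, if any
```

Skipped ranges are never written in the destination, so they stay holes there wherever the destination filesystem supports sparse files.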

thanks for the info,
Pádraig.



