qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-block] semantics of FIEMAP without FIEMAP_FLAG_SYNC (was Re: [


From: Pádraig Brady
Subject: Re: [Qemu-block] semantics of FIEMAP without FIEMAP_FLAG_SYNC (was Re: [Qemu-devel] [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server)
Date: Thu, 21 Jul 2016 13:31:21 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0

On 21/07/16 12:43, Dave Chinner wrote:
> On Wed, Jul 20, 2016 at 03:35:17PM +0200, Niels de Vos wrote:
>> On Wed, Jul 20, 2016 at 10:30:25PM +1000, Dave Chinner wrote:
>>> On Wed, Jul 20, 2016 at 05:19:37AM -0400, Paolo Bonzini wrote:
>>>> Adding ext4 and XFS guys (Lukas and Dave respectively).  As a quick recap, 
>>>> the
>>>> issue here is the semantics of FIEMAP and SEEK_HOLE/SEEK_DATA, which we 
>>>> use in
>>>> "qemu-img map".  This command prints metadata about a virtual disk 
>>>> image---which
>>>> in the case of a raw image amounts to detecting holes and unwritten 
>>>> extents.
>>>>
>>>> First, it seems like SEEK_HOLE and SEEK_DATA really should be 
>>>> "SEEK_NONZERO" and
>>>> "SEEK_ZERO", on both ext4 and XFS.    You can see that unwritten extents 
>>>> are
>>>> reported by "qemu-img map" as holes:
>>>
>>> Correctly so. seek hole/data knows nothing about the underlying
>>> filesystem storage implementation. A "hole" is defined as a region
>>> of zeros, and the filesystem is free call anything it kows for
>>> certain contains zeros as a hole.
>>>
>>> FYI, SEEK_HOLE/DATA were named after the the pre-existing Solaris
>>> seek API that uses these definitions for hole and data....
>>>
>>>>     $ dd if=/dev/urandom of=test.img bs=1M count=100
>>>>     $ fallocate -z -o 10M -l 10M test.img
>>>>     $ du -h test.img
>>>>     $ qemu-img map --output=json test.img
>>>>     [{ "start": 0, "length": 10485760, "depth": 0, "zero": false, "data": 
>>>> true, "offset": 0},
>>>>     { "start": 10485760, "length": 10485760, "depth": 0, "zero": true, 
>>>> "data": false, "offset": 10485760},
>>>>     { "start": 20971520, "length": 83886080, "depth": 0, "zero": false, 
>>>> "data": true, "offset": 20971520}]
>>>
>>>>
>>>> On the second line, zero=true data=false identifies a hole.  The right 
>>>> output
>>>> would either have zero=true data=true (unwritten extent) or just
>>>>
>>>>     [{ "start": 0, "length": 104857600, "depth": 0, "zero": false, "data": 
>>>> true, "offset": 0},
>>>>
>>>> since the zero flag is advisory (it doesn't try to detect zeroes beyond 
>>>> what the
>>>> filesystem says).
>>>
>>> Not from SEEK_HOLE/SEEK_DATA. All it conveys is "this is the start
>>> of a range of zeros" and "this is the start of a range of data". And
>>> for filesystems that don't specifically implement these seek
>>> operations, zeros are considered data. i.e. SEEK_HOLE will take you
>>> to the end of file, SEEK_DATA returns the current position....
>>>
>>> i.e. unwritten extents contain no data, so they are semantically
>>> identical to holes for the purposes of seeking and hence SEEK_DATA
>>> can skip over them.
>>>
>>>> The reason why we disabled FIEMAP was a combination of a corruption and 
>>>> performance
>>>> issue.  The data corruption bug was at 
>>>> https://bugs.launchpad.net/qemu/+bug/1368815
>>>> and it was reported on Ubuntu Trusty (kernel 3.13 based on the release 
>>>> notes at
>>>> https://wiki.ubuntu.com/TrustyTahr/ReleaseNotes).  We corrected that by 
>>>> using
>>>> FIEMAP_FLAG_SYNC, based on a similar patch to coreutils.  This turned out 
>>>> to be too
>>>> slow, so we dropped FIEMAP altogether.
>>>
>>> Yes, because FIEMAP output is only useful for diagnostic purposes as
>>> it can be stale even before the syscall returns to userspace. i.e.
>>> it can't be used in userspace for optimising file copies....
>>
>> Oh... And I was surprised to learn that "cp" does use FIEMAP and not
>> SEEK_HOLE/SEEK_DATA. You give a strong suggestion to fix that.
>>   http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/extent-scan.c#n87
> 
> My understanding was that FIEMAP was disabled almost immediately
> after the data corruption problems were understood, and it hasn't
> been turned back on since. I don't see FIEMAP calls in strace when I
> copy sparse files, even when I use --sparse=auto which is supposed
> to trigger the sparse source file checks using FIEMAP.
> 
> So, yeah, while there's FIEMAP code present, that doesn't mean it's
> actually used. And, yes, there are comments in coreutils commits in
> 2011 saying that the FIEMAP workarounds are temporary until
> SEK_HOLE/DATA are supported, which were added to the kernel in 2011
> soon after the 'cp-corrupts-data-with-fiemap' debacle came to light.
> Five years later, and the fiemap code is stil there?

Yes it's still there, and hasn't caused issues.
It's used only for sparse files:

  $ truncate -s1G t.fiemap
  $ strace -e ioctl cp t.fiemap t2.fiemap
  ioctl(3, FS_IOC_FIEMAP, 0x7ffff007c910) = 0

It's a pity fiemap is racy (on some file systems?) wrt reporting
of data not yet written out, and is thus fairly useless IMHO
without the FIEMAP_FLAG_SYNC workaround.

We _are_ planning to use SEEK_HOLE in future.

It would be nice though to have a way to distinguish zeros from holes
to efficiently/consistently maintain allocations to avoid future ENOSPC
or inefficient future allocations.

thanks,
Pádraig



reply via email to

[Prev in Thread] Current Thread [Next in Thread]