[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-block] semantics of FIEMAP without FIEMAP_FLAG_SYNC (was Re: [
From: |
Pádraig Brady |
Subject: |
Re: [Qemu-block] semantics of FIEMAP without FIEMAP_FLAG_SYNC (was Re: [Qemu-devel] [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server) |
Date: |
Thu, 21 Jul 2016 13:31:21 +0100 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0 |
On 21/07/16 12:43, Dave Chinner wrote:
> On Wed, Jul 20, 2016 at 03:35:17PM +0200, Niels de Vos wrote:
>> On Wed, Jul 20, 2016 at 10:30:25PM +1000, Dave Chinner wrote:
>>> On Wed, Jul 20, 2016 at 05:19:37AM -0400, Paolo Bonzini wrote:
>>>> Adding ext4 and XFS guys (Lukas and Dave respectively). As a quick recap,
>>>> the
>>>> issue here is the semantics of FIEMAP and SEEK_HOLE/SEEK_DATA, which we
>>>> use in
>>>> "qemu-img map". This command prints metadata about a virtual disk
>>>> image---which
>>>> in the case of a raw image amounts to detecting holes and unwritten
>>>> extents.
>>>>
>>>> First, it seems like SEEK_HOLE and SEEK_DATA really should be
>>>> "SEEK_NONZERO" and
>>>> "SEEK_ZERO", on both ext4 and XFS. You can see that unwritten extents
>>>> are
>>>> reported by "qemu-img map" as holes:
>>>
>>> Correctly so. seek hole/data knows nothing about the underlying
>>> filesystem storage implementation. A "hole" is defined as a region
>>> of zeros, and the filesystem is free call anything it kows for
>>> certain contains zeros as a hole.
>>>
>>> FYI, SEEK_HOLE/DATA were named after the the pre-existing Solaris
>>> seek API that uses these definitions for hole and data....
>>>
>>>> $ dd if=/dev/urandom of=test.img bs=1M count=100
>>>> $ fallocate -z -o 10M -l 10M test.img
>>>> $ du -h test.img
>>>> $ qemu-img map --output=json test.img
>>>> [{ "start": 0, "length": 10485760, "depth": 0, "zero": false, "data":
>>>> true, "offset": 0},
>>>> { "start": 10485760, "length": 10485760, "depth": 0, "zero": true,
>>>> "data": false, "offset": 10485760},
>>>> { "start": 20971520, "length": 83886080, "depth": 0, "zero": false,
>>>> "data": true, "offset": 20971520}]
>>>
>>>>
>>>> On the second line, zero=true data=false identifies a hole. The right
>>>> output
>>>> would either have zero=true data=true (unwritten extent) or just
>>>>
>>>> [{ "start": 0, "length": 104857600, "depth": 0, "zero": false, "data":
>>>> true, "offset": 0},
>>>>
>>>> since the zero flag is advisory (it doesn't try to detect zeroes beyond
>>>> what the
>>>> filesystem says).
>>>
>>> Not from SEEK_HOLE/SEEK_DATA. All it conveys is "this is the start
>>> of a range of zeros" and "this is the start of a range of data". And
>>> for filesystems that don't specifically implement these seek
>>> operations, zeros are considered data. i.e. SEEK_HOLE will take you
>>> to the end of file, SEEK_DATA returns the current position....
>>>
>>> i.e. unwritten extents contain no data, so they are semantically
>>> identical to holes for the purposes of seeking and hence SEEK_DATA
>>> can skip over them.
>>>
>>>> The reason why we disabled FIEMAP was a combination of a corruption and
>>>> performance
>>>> issue. The data corruption bug was at
>>>> https://bugs.launchpad.net/qemu/+bug/1368815
>>>> and it was reported on Ubuntu Trusty (kernel 3.13 based on the release
>>>> notes at
>>>> https://wiki.ubuntu.com/TrustyTahr/ReleaseNotes). We corrected that by
>>>> using
>>>> FIEMAP_FLAG_SYNC, based on a similar patch to coreutils. This turned out
>>>> to be too
>>>> slow, so we dropped FIEMAP altogether.
>>>
>>> Yes, because FIEMAP output is only useful for diagnostic purposes as
>>> it can be stale even before the syscall returns to userspace. i.e.
>>> it can't be used in userspace for optimising file copies....
>>
>> Oh... And I was surprised to learn that "cp" does use FIEMAP and not
>> SEEK_HOLE/SEEK_DATA. You give a strong suggestion to fix that.
>> http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/extent-scan.c#n87
>
> My understanding was that FIEMAP was disabled almost immediately
> after the data corruption problems were understood, and it hasn't
> been turned back on since. I don't see FIEMAP calls in strace when I
> copy sparse files, even when I use --sparse=auto which is supposed
> to trigger the sparse source file checks using FIEMAP.
>
> So, yeah, while there's FIEMAP code present, that doesn't mean it's
> actually used. And, yes, there are comments in coreutils commits in
> 2011 saying that the FIEMAP workarounds are temporary until
> SEK_HOLE/DATA are supported, which were added to the kernel in 2011
> soon after the 'cp-corrupts-data-with-fiemap' debacle came to light.
> Five years later, and the fiemap code is stil there?
Yes it's still there, and hasn't caused issues.
It's used only for sparse files:
$ truncate -s1G t.fiemap
$ strace -e ioctl cp t.fiemap t2.fiemap
ioctl(3, FS_IOC_FIEMAP, 0x7ffff007c910) = 0
It's a pity fiemap is racy (on some file systems?) wrt reporting
of data not yet written out, and is thus fairly useless IMHO
without the FIEMAP_FLAG_SYNC workaround.
We _are_ planning to use SEEK_HOLE in future.
It would be nice though to have a way to distinguish zeros from holes
to efficiently/consistently maintain allocations to avoid future ENOSPC
or inefficient future allocations.
thanks,
Pádraig
- Re: [Qemu-block] [Qemu-devel] [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server, (continued)
- Re: [Qemu-block] [Qemu-devel] [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server, Eric Blake, 2016/07/19
- Re: [Qemu-block] [Qemu-devel] [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server, Fam Zheng, 2016/07/20
- Re: [Qemu-block] [Qemu-devel] [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server, Paolo Bonzini, 2016/07/20
- Re: [Qemu-block] [Qemu-devel] [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server, Fam Zheng, 2016/07/20
- Re: [Qemu-block] [Qemu-devel] [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server, Paolo Bonzini, 2016/07/20
- Re: [Qemu-block] [Qemu-devel] [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server, Fam Zheng, 2016/07/20
- [Qemu-block] semantics of FIEMAP without FIEMAP_FLAG_SYNC (was Re: [Qemu-devel] [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server), Paolo Bonzini, 2016/07/20
- Re: [Qemu-block] semantics of FIEMAP without FIEMAP_FLAG_SYNC (was Re: [Qemu-devel] [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server), Dave Chinner, 2016/07/20
- Re: [Qemu-block] semantics of FIEMAP without FIEMAP_FLAG_SYNC (was Re: [Qemu-devel] [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server), Niels de Vos, 2016/07/20
- Re: [Qemu-block] semantics of FIEMAP without FIEMAP_FLAG_SYNC (was Re: [Qemu-devel] [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server), Dave Chinner, 2016/07/21
- Re: [Qemu-block] semantics of FIEMAP without FIEMAP_FLAG_SYNC (was Re: [Qemu-devel] [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server),
Pádraig Brady <=
- Re: [Qemu-block] semantics of FIEMAP without FIEMAP_FLAG_SYNC (was Re: [Qemu-devel] [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server), Dave Chinner, 2016/07/21
- Re: [Qemu-block] semantics of FIEMAP without FIEMAP_FLAG_SYNC (was Re: [Qemu-devel] [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server), Paolo Bonzini, 2016/07/20
- Re: [Qemu-block] semantics of FIEMAP without FIEMAP_FLAG_SYNC (was Re: [Qemu-devel] [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server), Dave Chinner, 2016/07/21
- Re: [Qemu-block] semantics of FIEMAP without FIEMAP_FLAG_SYNC (was Re: [Qemu-devel] [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server), Pádraig Brady, 2016/07/21
- Re: [Qemu-block] semantics of FIEMAP without FIEMAP_FLAG_SYNC (was Re: [Qemu-devel] [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server), Paolo Bonzini, 2016/07/21
- Re: [Qemu-block] semantics of FIEMAP without FIEMAP_FLAG_SYNC (was Re: [Qemu-devel] [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server), Dave Chinner, 2016/07/22
- Re: [Qemu-block] semantics of FIEMAP without FIEMAP_FLAG_SYNC (was Re: [Qemu-devel] [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server), Paolo Bonzini, 2016/07/22
[Qemu-block] [PATCH v5 10/14] nbd: Less allocation during NBD_OPT_LIST, Eric Blake, 2016/07/19
[Qemu-block] [PATCH v5 12/14] nbd: Improve server handling of shutdown requests, Eric Blake, 2016/07/19
[Qemu-block] [PATCH v5 07/14] nbd: Share common option-sending code in client, Eric Blake, 2016/07/19