Strange behavior of qemu-img map: zero/data status depends on fallocated
From: Nir Soffer
Subject: Strange behavior of qemu-img map: zero/data status depends on fallocated image page cache content
Date: Sun, 30 Jun 2024 17:31:58 +0300
I found a strange behavior in qemu-img map - zero/data status depends on page
cache content. It looks like a kernel issue since qemu-img map is using
SEEK_HOLE/DATA (block/file-posix.c line 3111).
Tested with latest qemu on kernel 6.9.6-100.fc39.x86_64. I see similar behavior
on xfs and ext4 filesystems.
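
For readers who want to reproduce this without qemu, here is a minimal sketch of walking a file with SEEK_DATA/SEEK_HOLE from Python. It is not qemu's implementation in block/file-posix.c; the function name map_extents is made up for illustration, and it assumes a platform where os.SEEK_DATA and os.SEEK_HOLE are available (e.g. Linux).

```python
import errno
import os


def map_extents(path):
    """Return (start, length, is_data) tuples covering the whole file.

    Hypothetical helper, not part of qemu: it only reports what
    lseek(2) says, so it inherits any page-cache dependence.
    """
    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                data_start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as e:
                if e.errno != errno.ENXIO:
                    raise
                # ENXIO: no data at or after offset, the rest is a hole.
                data_start = size
            if data_start > offset:
                extents.append((offset, data_start - offset, False))
            if data_start >= size:
                break
            # Every file has an implicit hole at its end, so this succeeds.
            hole_start = os.lseek(fd, data_start, os.SEEK_HOLE)
            extents.append((data_start, hole_start - data_start, True))
            offset = hole_start
        return extents
    finally:
        os.close(fd)
```

Calling map_extents("falloc.img") before and after the dd commands shown below should make it possible to check whether the reported extents change without qemu-img being involved.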
After creating a fully allocated image:
# qemu-img create -f raw -o preallocation=falloc falloc.img 1g
Formatting 'falloc.img', fmt=raw size=1073741824 preallocation=falloc
qemu-img map reports the image as sparse (except the first block, which we fully
allocate):
# qemu-img map --output json falloc.img
[{ "start": 0, "length": 4096, "depth": 0, "present": true,
"zero": false, "data": true, "offset": 0},
{ "start": 4096, "length": 1073737728, "depth": 0, "present":
true, "zero": true, "data": false, "offset": 4096}]
This is good for copy or read performance, since we can skip reading the areas
with data=false. On the other hand this is bad for correctness: we cannot
preserve the allocation of the entire image, because it looks like a sparse
image:
# qemu-img create -f raw sparse.img 1g
Formatting 'sparse.img', fmt=raw size=1073741824
# qemu-img map --output json sparse.img
[{ "start": 0, "length": 4096, "depth": 0, "present": true,
"zero": false, "data": true, "offset": 0},
{ "start": 4096, "length": 1073737728, "depth": 0, "present":
true, "zero": true, "data": false, "offset": 4096}]
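
Although the two map outputs are identical, the images should still differ in how much space they occupy. A small sketch, assuming only stat(2) semantics (the helper name allocation_summary is made up, and this is not something qemu-img map reports):

```python
import os


def allocation_summary(path):
    """Compare a file's apparent size with the space allocated to it."""
    st = os.stat(path)
    # st_blocks is counted in 512-byte units, independent of the
    # filesystem block size.
    return {"apparent": st.st_size, "allocated": st.st_blocks * 512}
```

On a filesystem that supports fallocate, I would expect allocation_summary("falloc.img") to show allocated close to 1 GiB and allocation_summary("sparse.img") to show far less, which is exactly the distinction the map output loses.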
But look what happens when we get some of the image into the page cache:
# dd if=falloc.img bs=1M count=512 of=/dev/null
# qemu-img map --output json falloc.img
[{ "start": 0, "length": 544210944, "depth": 0, "present": true,
"zero": false, "data": true, "offset": 0},
{ "start": 544210944, "length": 529530880, "depth": 0, "present":
true, "zero": true, "data": false, "offset": 544210944}]
Now half of the image is reported as data=true and half as data=false. If we
read the entire image all of it is reported as data=true:
# dd if=falloc.img bs=1M count=1024 of=/dev/null
# qemu-img map --output json falloc.img
[{ "start": 0, "length": 1073741824, "depth": 0, "present": true,
"zero": false, "data": true, "offset": 0}]
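
To put numbers on the output taken after reading 512 MiB, here is a short sketch that totals the data and non-data extents from the JSON. The helper summarize_map is made up for illustration; the JSON is copied from the output above.

```python
import json


def summarize_map(map_json):
    """Total the bytes reported with data=true and data=false."""
    totals = {"data": 0, "non_data": 0}
    for extent in json.loads(map_json):
        key = "data" if extent["data"] else "non_data"
        totals[key] += extent["length"]
    return totals


# Output of qemu-img map after reading the first 512 MiB with dd.
half_cached = """
[{ "start": 0, "length": 544210944, "depth": 0, "present": true,
   "zero": false, "data": true, "offset": 0},
 { "start": 544210944, "length": 529530880, "depth": 0, "present": true,
   "zero": true, "data": false, "offset": 544210944}]
"""

totals = summarize_map(half_cached)
print(totals["data"] / 2**20)            # 519.0 (MiB)
print(sum(totals.values()) == 2**30)     # True
```

The data extent is 519 MiB, 7 MiB more than dd actually read. My guess is that this is readahead, which would fit the theory that the result follows the page cache, but I have not verified it.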
If we drop caches, the image goes back to (almost) the initial state:
# sync; echo 1 > /proc/sys/vm/drop_caches
# qemu-img map --output json falloc.img
[{ "start": 0, "length": 16384, "depth": 0, "present": true,
"zero": false, "data": true, "offset": 0},
{ "start": 16384, "length": 1073725440, "depth": 0, "present":
true, "zero": true, "data": false, "offset": 16384}]
According to lseek(2), the file system can do anything, but the page cache is
not mentioned as something that may affect the result of the call:
Seeking file data and holes
Since Linux 3.1, Linux supports the following additional values for
whence:
SEEK_DATA
Adjust the file offset to the next location in the file greater
than or equal to offset containing data. If offset points to
data, then the file offset is set to offset.
SEEK_HOLE
Adjust the file offset to the next hole in the file greater than
or equal to offset. If offset points into the middle of a hole,
then the file offset is set to offset. If there is no hole past
offset, then the file offset is adjusted to the end of the file
(i.e., there is an implicit hole at the end of any file).
In both of the above cases, lseek() fails if offset points past the end
of the file.
These operations allow applications to map holes in a sparsely allocated
file. This can be useful for applications such as file backup tools,
which can save space when creating backups and preserve holes, if they
have a mechanism for discovering holes.
For the purposes of these operations, a hole is a sequence of zeros
that (normally) has not been allocated in the underlying file storage.
However, a filesystem is not obliged to report holes, so these operations
are not a guaranteed mechanism for mapping the storage space actually
allocated to a file. (Furthermore, a sequence of zeros that actually
has been written to the underlying storage may not be reported
as a hole.) In the simplest implementation, a filesystem can support
the operations by making SEEK_HOLE always return the offset of the end
of the file, and making SEEK_DATA always return offset (i.e., even if
the location referred to by offset is a hole, it can be considered to
consist of data that is a sequence of zeros).
On an xfs filesystem we can inspect the actual allocation:
$ xfs_bmap -v falloc.img
falloc.img:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
0: [0..7]: 192..199 0 (192..199) 8
1: [8..2097151]: 200..2097343 0 (200..2097343) 2097144
$ xfs_bmap -v sparse.img
sparse.img:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
0: [0..7]: 2097344..2097351 0 (2097344..2097351) 8
1: [8..2047]: 2097352..2099391 0 (2097352..2099391) 2040
2: [2048..2097151]: hole 2095104
Maybe qemu-img should use file system specific APIs like ioctl_xfs_getbmap(2)
to get more correct and consistent allocation info?
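
As a rough illustration of querying the extent map directly, here is a sketch using the generic FIEMAP ioctl rather than the xfs-specific call mentioned above. This is a substitution on my part, not a tested proposal for qemu: the constants and struct layouts are taken from linux/fiemap.h, the helper names are made up, only the first max_extents extents are returned, and whether the result is independent of the page cache is an assumption that would need testing.

```python
import fcntl
import struct

FS_IOC_FIEMAP = 0xC020660B            # _IOWR('f', 11, struct fiemap)
FIEMAP_FLAG_SYNC = 0x1                # flush dirty data before mapping
FIEMAP_EXTENT_LAST = 0x1              # last extent in the file
FIEMAP_EXTENT_UNWRITTEN = 0x800       # allocated but never written
FIEMAP_MAX_OFFSET = 0xFFFFFFFFFFFFFFFF

# struct fiemap header: start, length, flags, mapped_extents,
# extent_count, reserved.
HEADER = struct.Struct("=QQIIII")
# struct fiemap_extent: logical, physical, length, reserved64[2],
# flags, reserved[3].
EXTENT = struct.Struct("=QQQQQIIII")


def parse_extents(buf):
    """Decode a filled struct fiemap buffer into a list of dicts."""
    mapped = HEADER.unpack_from(buf, 0)[3]
    extents = []
    for i in range(mapped):
        fields = EXTENT.unpack_from(buf, HEADER.size + i * EXTENT.size)
        logical, _physical, length = fields[0], fields[1], fields[2]
        flags = fields[5]
        extents.append({
            "start": logical,
            "length": length,
            "unwritten": bool(flags & FIEMAP_EXTENT_UNWRITTEN),
            "last": bool(flags & FIEMAP_EXTENT_LAST),
        })
    return extents


def fiemap(path, max_extents=128):
    """Return up to max_extents allocated extents of a file.

    Raises OSError if the filesystem does not support FIEMAP.
    """
    buf = bytearray(HEADER.size + max_extents * EXTENT.size)
    HEADER.pack_into(buf, 0, 0, FIEMAP_MAX_OFFSET, FIEMAP_FLAG_SYNC,
                     0, max_extents, 0)
    with open(path, "rb") as f:
        # The bytearray is mutable, so the kernel's reply is written
        # back into buf in place.
        fcntl.ioctl(f.fileno(), FS_IOC_FIEMAP, buf)
    return parse_extents(buf)
```

With something like this, ranges missing from the result would be holes, and extents with unwritten=True would be preallocated space that reads as zeros, which is the distinction between falloc.img and sparse.img that SEEK_HOLE/DATA does not give us.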
Nir