qemu-block

Strange behavior of qemu-img map: zero/data status depends on fallocated


From: Nir Soffer
Subject: Strange behavior of qemu-img map: zero/data status depends on fallocated image page cache content
Date: Sun, 30 Jun 2024 17:31:58 +0300

I found a strange behavior in qemu-img map - zero/data status depends on page
cache content.  It looks like a kernel issue since qemu-img map is using
SEEK_HOLE/DATA (block/file-posix.c line 3111).

Tested with latest qemu on kernel 6.9.6-100.fc39.x86_64. I see similar behavior
on xfs and ext4 filesystems.

After creating a preallocated image:

    # qemu-img create -f raw -o preallocation=falloc falloc.img 1g
    Formatting 'falloc.img', fmt=raw size=1073741824 preallocation=falloc

qemu-img map reports the image as sparse (except for the first block, which we
fully allocate):

    # qemu-img map --output json falloc.img
    [{ "start": 0, "length": 4096, "depth": 0, "present": true, "zero": false, "data": true, "offset": 0},
    { "start": 4096, "length": 1073737728, "depth": 0, "present": true, "zero": true, "data": false, "offset": 4096}]

This is good for copy or read performance, since we can skip reading the areas
with data=false. On the other hand, it is bad for correctness: we cannot
preserve the allocation of the entire image, since it looks just like a sparse
image:

    # qemu-img create -f raw sparse.img 1g
    Formatting 'sparse.img', fmt=raw size=1073741824

    # qemu-img map --output json sparse.img
    [{ "start": 0, "length": 4096, "depth": 0, "present": true, "zero": false, "data": true, "offset": 0},
    { "start": 4096, "length": 1073737728, "depth": 0, "present": true, "zero": true, "data": false, "offset": 4096}]

But look what happens when we get some of the image into the page cache:

    # dd if=falloc.img bs=1M count=512 of=/dev/null

    # qemu-img map --output json falloc.img
    [{ "start": 0, "length": 544210944, "depth": 0, "present": true, "zero": false, "data": true, "offset": 0},
    { "start": 544210944, "length": 529530880, "depth": 0, "present": true, "zero": true, "data": false, "offset": 544210944}]

Now half of the image is reported as data=true and half as data=false. If we
read the entire image all of it is reported as data=true:

    # dd if=falloc.img bs=1M count=1024 of=/dev/null

    # qemu-img map --output json falloc.img
    [{ "start": 0, "length": 1073741824, "depth": 0, "present": true, "zero": false, "data": true, "offset": 0}]

If we drop caches, the image goes back to (almost) the initial state:

    # sync; echo 1 > /proc/sys/vm/drop_caches

    # qemu-img map --output json falloc.img
    [{ "start": 0, "length": 16384, "depth": 0, "present": true, "zero": false, "data": true, "offset": 0},
    { "start": 16384, "length": 1073725440, "depth": 0, "present": true, "zero": true, "data": false, "offset": 16384}]

According to lseek(2), the file system can do anything, but the page cache is
not mentioned as something that may affect the result of the call:

   Seeking file data and holes
       Since Linux 3.1, Linux supports the following additional values for
       whence:

       SEEK_DATA
              Adjust the file offset to the next location in the file greater
              than or equal to offset containing data.  If offset points to
              data, then the file offset is set to offset.

       SEEK_HOLE
              Adjust the file offset to the next hole in the file greater than
              or equal to offset.  If offset points into the middle of a hole,
              then the file offset is set to offset.  If there is no hole past
              offset, then the file offset is adjusted to the end of the file
              (i.e., there is an implicit hole at the end of any file).

       In both of the above cases, lseek() fails if offset points past the end
       of the file.

       These operations allow applications to map holes in a sparsely
       allocated file.  This can be useful for applications such as file
       backup tools, which can save space when creating backups and preserve
       holes, if they have a mechanism for discovering holes.

       For the purposes of these operations, a hole is a sequence of zeros
       that (normally) has not been allocated in the underlying file storage.
       However, a filesystem is not obliged to report holes, so these
       operations are not a guaranteed mechanism for mapping the storage space
       actually allocated to a file.  (Furthermore, a sequence of zeros that
       actually has been written to the underlying storage may not be reported
       as a hole.)  In the simplest implementation, a filesystem can support
       the operations by making SEEK_HOLE always return the offset of the end
       of the file, and making SEEK_DATA always return offset (i.e., even if
       the location referred to by offset is a hole, it can be considered to
       consist of data that is a sequence of zeros).
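
To take qemu-img out of the picture, the SEEK_DATA/SEEK_HOLE semantics quoted
above can be exercised directly. The following is only a minimal sketch, not
what qemu-img does internally: the function name map_extents is made up for
illustration, and it assumes Linux with a Python that exposes os.SEEK_DATA and
os.SEEK_HOLE.

```python
import errno
import os


def map_extents(path):
    """Return a list of (start, length, is_data) tuples covering the file.

    Hypothetical helper: walks the file with lseek(SEEK_DATA) and
    lseek(SEEK_HOLE), so it reports whatever the kernel reports.
    """
    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                data_start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as e:
                if e.errno != errno.ENXIO:
                    raise
                # ENXIO: no data at or after offset, the rest is a hole.
                extents.append((offset, size - offset, False))
                break
            if data_start > offset:
                # The range between offset and the next data is a hole.
                extents.append((offset, data_start - offset, False))
            # Find where the current data range ends (EOF counts as a hole).
            hole_start = os.lseek(fd, data_start, os.SEEK_HOLE)
            extents.append((data_start, hole_start - data_start, True))
            offset = hole_start
    finally:
        os.close(fd)
    return extents
```

Calling map_extents("falloc.img") before and after the dd and drop_caches
steps would be expected to show the same data/hole transitions as qemu-img
map, assuming the behavior really comes from lseek() and not from qemu-img.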

On an xfs filesystem we can inspect the actual allocation:

    $ xfs_bmap -v falloc.img
    falloc.img:
     EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET          TOTAL
       0: [0..7]:          192..199          0 (192..199)             8
       1: [8..2097151]:    200..2097343      0 (200..2097343)   2097144

    $ xfs_bmap -v sparse.img
    sparse.img:
     EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET            TOTAL
       0: [0..7]:          2097344..2097351  0 (2097344..2097351)       8
       1: [8..2047]:       2097352..2099391  0 (2097352..2099391)    2040
       2: [2048..2097151]: hole                                   2095104

Maybe qemu-img should use file system specific APIs like ioctl_xfs_getbmap(2)
to get more correct and consistent allocation info?


Nir



