qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-block] block_status automatically added flags


From: Eric Blake
Subject: Re: [Qemu-block] block_status automatically added flags
Date: Tue, 13 Feb 2018 12:48:47 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0

On 02/13/2018 11:36 AM, Vladimir Sementsov-Ogievskiy wrote:
Hi Eric!

I'm now testing my nbd block status realization (block_status part, not about dirty bitmaps), and faced into the following effect.

I created empty qcow2 image and wrote to the first sector, so

qemu-io -c map x

reports:

64 KiB (0x10000) bytes     allocated at offset 0 bytes (0x0)
9.938 MiB (0x9f0000) bytes not allocated at offset 64 KiB (0x10000)

But I can't get same results, when connecting to nbd server, exporting the same qcow2 file, I get

10 MiB (0xa00000) bytes     allocated at offset 0 bytes (0x0)

Is this with or without your NBD_CMD_BLOCK_STATUS patches applied? And are you exposing the data over NBD as raw ('qemu-nbd -f qcow2'/'qemu-io -f raw') or as qcow2 ('qemu-nbd -f raw'/'qemu-io -f qcow2')?

/me does a quick reproduction

Yes, I definitely see that behavior without any NBD_CMD_BLOCK_STATUS patches and when the image is exposed over NBD as raw, but not when exposed as qcow2, when testing the 2.11 release:

$ qemu-img create -f qcow2 file3 10M
Formatting 'file3', fmt=qcow2 size=10485760 cluster_size=65536 lazy_refcounts=off refcount_bits=16
$ qemu-io -c 'w 0 64k' -c map -f qcow2 file3
wrote 65536/65536 bytes at offset 0
64 KiB, 1 ops; 0.0579 sec (1.079 MiB/sec and 17.2601 ops/sec)
64 KiB (0x10000) bytes     allocated at offset 0 bytes (0x0)
9.938 MiB (0x9f0000) bytes not allocated at offset 64 KiB (0x10000)
$ qemu-nbd -f qcow2 -x foo file3
$ qemu-io -f raw -c map nbd://localhost:10809/foo
10 MiB (0xa00000) bytes     allocated at offset 0 bytes (0x0)
$ qemu-nbd -f raw -x foo file3
$ qemu-io -f qcow2 -c map nbd://localhost:10809/foo
64 KiB (0x10000) bytes     allocated at offset 0 bytes (0x0)
9.938 MiB (0x9f0000) bytes not allocated at offset 64 KiB (0x10000)

Right now, without NBD block status, the NBD driver reports the entire file as allocated, as it can't do any better (NBD has no backing file, and all data . Presumably, once NBD_CMD_BLOCK_STATUS is implemented, we can then use that for more accurate information.



Finally, I understand the reason:

for local file, qemu-io calls bdrv_is_allocated, which calls bdrv_common_block_status_above with want_zero=false. So, it doesn't set BDRV_BLOCK_ZERO, and doesn't set BDRV_BLOCK_ALLOCATED.

'qemu-io map' is a bit unusual; it is the only UI that easily exposes bdrv_is_allocated() to the outside world ('qemu-img map' does not). (The fact that both operations are named 'map' but do something different is annoying; for back-compat reasons, we can't change qemu-img, and I don't know if changing qemu-io is worth it.)


And, even if we change want_zero to true,

Well, you'd do that by invoking bdrv_block_status() (via 'qemu-img map', for example).

here, it will set BDRV_BLOCK_ZERO, but will not set BDRV_BLOCK_ALLOCATED, which contradicts with it's definition:

  BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
                        layer (short for DATA || ZERO), set by block layer

This text is wrong; it gets fixed in my still-pending concluding series for byte-based block status:

https://lists.gnu.org/archive/html/qemu-devel/2018-01/msg00955.html

Conceptually, BDRV_BLOCK_ALLOCATED means "is THIS layer of the backing chain responsible for the contents at this guest offset"; and there are cases where we know that we read zeroes but where the current layer is not responsible for the contents (such as a qcow2 that has a backing file with shorter length, where we return BDRV_BLOCK_ZERO but not BDRV_BLOCK_ALLOCATED). But since NBD has no backing chain, the entire image is considered allocated. Meanwhile, asking whether something is allocated ('qemu-io -c map') is not the usual question you want to ask when determining what portions of a file are zero.



for nbd, we go through the similar way on server (but with want_zero = true), and we finally have BDRV_BLOCK_ZERO without BDRV_BLOCK_ALLOCATED, which maps to NBD_STATE_HOLE+NBD_STATE_ZERO. But then, in the client we have BDRV_BLOCK_ZERO not automatically added by block layer but directly from nbd driver, therefor BDRV_BLOCK_ALLOCATED is set and I get different result.

Drivers should never set BDRV_BLOCK_ALLOCATED; only the code in io.c should set it; and output based on BDRV_BLOCK_ALLOCATED is only useful in backing chain scenarios (which NBD does not have).



this all looks weird for me.

BDRV_BLOCK_ALLOCATED definition should be fixed, to show that this flag show only reported by driver flags, not automatically added.

If we need to report yet more flags, where the driver can report additional information, then that's different. But changing the BDRV_BLOCK_ALLOCATED semantics would probably have knock-on effects that I'm not prepared to audit for (that is, we'd rather fix the documentation to match reality, which my pending patch does, and NOT change the code to match the current incorrect documentation).


And then the situation with nbd looks strange, because automatically added flag becomes "flag, reported by block driver".

On the other hand, it is not wrong.


I think, to test the feature, I'll improve qemu-io map, to show zeros (or is it better to add separate command?) and leave the other things as is. What do you think?


If you want to add a new qemu-io command that matches more what 'qemu-img map' does, be my guest. In fact, renaming the existing 'qemu-io map' to something else may be warranted (qemu-io has a much smaller user base, and is not documented as being stable, so back-compat is not an issue, as long as iotests still pass). But I do know that existing iotests depend on the current 'qemu-io map' semantics of being a thin veneer around bdrv_is_allocated(), and that we can't change those semantics. So if we do anything, adding a new command may be smarter.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



reply via email to

[Prev in Thread] Current Thread [Next in Thread]