On 24.02.2018 at 00:38, Eric Blake wrote:
On 02/23/2018 11:05 AM, Kevin Wolf wrote:
On 23.02.2018 at 17:43, Eric Blake wrote:
OFFSET_VALID | DATA might be excusable because I can see that it's
convenient that a protocol driver refers to itself as *file instead of
returning NULL there and then the offset is valid (though it would be
pointless to actually follow the file pointer), but OFFSET_VALID without
DATA probably isn't.
So OFFSET_VALID | DATA for a protocol BDS is not just convenient, but
necessary to avoid breaking qemu-img map output. But you are also right
that OFFSET_VALID without DATA makes little sense at a protocol layer. So
with that in mind, I'm auditing all of the protocol layers to make sure
OFFSET_VALID ends up as something sane.
That's one way to look at it.
The other way is that qemu-img map shouldn't ask the protocol layer for
its offset because it already knows the offset (it is what it passes as
a parameter to bdrv_co_block_status).
Anyway, it's probably not worth changing the interface, we should just
make sure that the return values of the individual drivers are
consistent.
Yet another inconsistency, and it's making me scratch my head today.
By the way, in my byte-based stuff that is now pending on your tree, I tried
hard to NOT change semantics or the set of flags returned by a given driver,
and we agreed that's why you'd accept the series as-is and make me do this
followup exercise. But it's looking like my followups may end up touching a
lot of the same drivers again, now that I'm looking at what the semantics
SHOULD be (and whatever I do end up tweaking, I will at least make sure that
iotests is still happy with it).
Hm, that's unfortunate, but I don't think we should hold up your first
series just so we can touch the drivers only once.
First, let's read what states the NBD spec is proposing:
It defines the following flags for the flags field:
NBD_STATE_HOLE (bit 0): if set, the block represents a hole (and future
writes to that area may cause fragmentation or encounter an ENOSPC error); if
clear, the block is allocated or the server could not otherwise determine its
status. Note that the use of NBD_CMD_TRIM is related to this status, but that
the server MAY report a hole even where NBD_CMD_TRIM has not been requested,
and also that a server MAY report that the block is allocated even where
NBD_CMD_TRIM has been requested.
NBD_STATE_ZERO (bit 1): if set, the block contents read as all zeroes; if
clear, the block contents are not known. Note that the use of
NBD_CMD_WRITE_ZEROES is related to this status, but that the server MAY report
zeroes even where NBD_CMD_WRITE_ZEROES has not been requested, and also that a
server MAY report unknown content even where NBD_CMD_WRITE_ZEROES has been
requested.
It is not an error for a server to report that a region of the export has both
NBD_STATE_HOLE set and NBD_STATE_ZERO clear. The contents of such an area are
undefined, and a client reading such an area should make no assumption as to
its contents or stability.
So here's how Vladimir proposed implementing it in his series (written
before my byte-based block status stuff went in to your tree):
https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04038.html
Server side (3/9):
+        int ret = bdrv_block_status_above(bs, NULL, offset, tail_bytes,
+                                          &num, NULL, NULL);
+        if (ret < 0) {
+            return ret;
+        }
+
+        flags = (ret & BDRV_BLOCK_ALLOCATED ? 0 : NBD_STATE_HOLE) |
+                (ret & BDRV_BLOCK_ZERO ? NBD_STATE_ZERO : 0);
Client side (6/9):
+    *pnum = extent.length >> BDRV_SECTOR_BITS;
+    return (extent.flags & NBD_STATE_HOLE ? 0 : BDRV_BLOCK_DATA) |
+           (extent.flags & NBD_STATE_ZERO ? BDRV_BLOCK_ZERO : 0);
Does anything there strike you as odd?
Two things I noticed while reading the above:
1. NBD doesn't consider backing files, so the definition of holes
becomes ambiguous. Is a hole any block that isn't allocated in the
top layer (may cause fragmentation or encounter an ENOSPC error) or
is it any block that isn't allocated anywhere in the whole backing
chain (may read as non-zero)?
Considering that there is a separate NBD_STATE_ZERO and nothing
forbids a state of NBD_STATE_HOLE without NBD_STATE_ZERO, maybe the
former is more useful. The code you quote implements the latter.
Maybe if we go with the former, we should add a note to the NBD spec
that explicitly says that NBD_STATE_HOLE doesn't imply any specific
content that is returned on reads.
2. Using BDRV_BLOCK_ALLOCATED to determine NBD_STATE_HOLE seems wrong. A
(not preallocated) zero cluster in qcow2 returns BDRV_BLOCK_ALLOCATED
(because we don't fall through to the backing file) even though I
think it's a hole. BDRV_BLOCK_DATA should be used there (which makes
it consistent with the other direction).
In isolation, they seemed fine to
me, but side-by-side, I'm scratching my head: the server queries the block
layer, and turns BDRV_BLOCK_ALLOCATED into !NBD_STATE_HOLE; the client side
then takes the NBD protocol and tries to turn it back into information to
feed the block layer, where !NBD_STATE_HOLE now feeds BDRV_BLOCK_DATA. Why
the different choice of bits?
Which is actually consistent in the end, because BDRV_BLOCK_DATA implies
BDRV_BLOCK_ALLOCATED.
Essentially, assuming a simple backing chain 'base <- overlay', we got
these combinations to represent in NBD (with my suggestion of the flags
to use):
1. Cluster allocated in overlay
    a. non-zero data                            0
    b. explicit zeroes                          0 or ZERO
2. Cluster marked zero in overlay               HOLE | ZERO
3. Cluster preallocated/zero in overlay         ZERO
4. Cluster unallocated in overlay
    a. Cluster allocated in base (non-zero)     HOLE
    b. Cluster allocated in base (zero)         HOLE or HOLE | ZERO
    c. Cluster marked zero in base              HOLE | ZERO
    d. Cluster preallocated/zero in base        HOLE | ZERO
    e. Cluster unallocated in base              HOLE | ZERO
Instead of 'base' you can read 'anywhere in the backing chain' and the
flags should stay the same.