qemu-devel

Re: [Qemu-devel] [PATCH 1/2] NBD proto: add WRITE_ZEROES extension


From: Eric Blake
Subject: Re: [Qemu-devel] [PATCH 1/2] NBD proto: add WRITE_ZEROES extension
Date: Wed, 23 Mar 2016 09:14:10 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0

On 03/23/2016 08:16 AM, Denis V. Lunev wrote:
> From: Pavel Borzenkov <address@hidden>
> 
> There exist some cases when a client knows that the data it is going to
> write is all zeroes. Such cases include mirroring or backing up a device
> implemented by a sparse file.
> 
> With current NBD command set, the client has to issue NBD_CMD_WRITE
> command with zeroed payload and transfer these zero bytes through the
> wire. The server has to write the data onto disk, effectively denying
> the sparseness.
> 
> To remedy this, the patch adds WRITE_ZEROES extension with one new
> NBD_CMD_WRITE_ZEROES command.
> 
> Signed-off-by: Pavel Borzenkov <address@hidden>
> Reviewed-by: Roman Kagan <address@hidden>
> Signed-off-by: Denis V. Lunev <address@hidden>
> CC: Wouter Verhelst <address@hidden>
> CC: Paolo Bonzini <address@hidden>
> CC: Kevin Wolf <address@hidden>
> CC: Stefan Hajnoczi <address@hidden>
> ---
>  doc/proto.md | 44 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 44 insertions(+)
> 

>  
> +* `NBD_CMD_WRITE_ZEROES` (6)
> +
> +    Defined by the experimental `WRITE_ZEROES` extension; see below.

If this patch goes into the NBD sources, the extension is not
experimental any more, right?  Shouldn't the patch be written under the
assumption that it will be the final form of the text, even though the
feature is experimental until then? [1]

> +### `WRITE_ZEROES` extension
> +
> +There exist some cases when a client knows that the data it is going to write
> +is all zeroes. Such cases include mirroring or backing up a device implemented
> +by a sparse file. With current NBD command set, the client has to issue
> +`NBD_CMD_WRITE` command with zeroed payload and transfer these zero bytes
> +through the wire. The server has to write the data onto disk, effectively
> +denying the sparseness.

s/denying/losing/ ?

> +
> +To remedy this, a `WRITE_ZEROES` extension is envisioned. This extension adds

s/is envisioned/exists/ - again, under the argument that once it is part
of this document, it is no longer experimental.

/me goes and reads the actual proto.md file, and light bulb turns on...

[1] Oh, you ARE adding this to the "Experimental extensions" section of
the document, so your wording IS correct.  I guess the idea is that we
write up the documentation in the experimental section, tweak qemu to
implement it both as NBD client and as NBD server (since qemu has code
that can serve in both positions), see how well it worked, and THEN do a
followup patch to proto.md to move the text into the non-experimental
section, along with any tweaks learned during the qemu patches.

> +one new command with two command flags.
> +
> +* `NBD_CMD_WRITE_ZEROES` (6)
> +
> +    A write request with no payload. Length and offset define the location
> +    and amount of data to be zeroed.
> +
> +    The server MUST zero out the data on disk, and then send the reply
> +    message. The server MAY send the reply message before the data has
> +    reached permanent storage.
> +
> +    If the `NBD_FLAG_SEND_FUA` flag ("Force Unit Access") was set in the
> +    export flags field, the client MAY set the flag `NBD_CMD_FLAG_FUA` (bit 0)
> +    in the command flags field. If this flag was set, the server MUST NOT send
> +    the reply until it has ensured that the newly-zeroed data has reached
> +    permanent storage.

Do we want to add:

The server SHOULD return EINVAL if the client set NBD_CMD_FLAG_FUA but
the export flags did not include NBD_FLAG_SEND_FUA.
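
A minimal sketch of what that server-side check could look like. The function name `check_fua` is hypothetical. The numeric values are assumptions for illustration: bit 3 for NBD_FLAG_SEND_FUA in the export flags, bit 0 for NBD_CMD_FLAG_FUA as the proposed text has it, and 22 for EINVAL.

```python
# Assumed values, for illustration only.
NBD_FLAG_SEND_FUA = 1 << 3   # export flag: server advertises FUA support
NBD_CMD_FLAG_FUA = 1 << 0    # command flag, "bit 0" per the proposed text
NBD_EINVAL = 22              # error value the reply would carry


def check_fua(export_flags: int, cmd_flags: int) -> int:
    """Return 0 if the FUA usage is acceptable, else NBD_EINVAL.

    Hypothetical helper: rejects a request that sets the FUA command
    flag when the export flags never advertised NBD_FLAG_SEND_FUA.
    """
    wants_fua = bool(cmd_flags & NBD_CMD_FLAG_FUA)
    fua_advertised = bool(export_flags & NBD_FLAG_SEND_FUA)
    if wants_fua and not fua_advertised:
        return NBD_EINVAL
    return 0
```

A server would run a check like this before touching the disk, so a misbehaving client gets an error reply rather than a silently ignored flag.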

> +
> +    If the flag `NBD_CMD_FLAG_MAY_TRIM` (bit 1) was set by the client in the
> +    command flags field, the server MAY use trimming to zero out the area,
> +    but it MUST ensure that the data reads back as zero.

Bug in the existing spec: The constant NBD_CMD_FLAG_FUA is mentioned,
but never defined with a fixed value.  Your text above defines it to
'bit 0' in the command flags field - is that correct?  If so, should we
add a section to the document that lists the bit values of all supported
command flags?

Meanwhile, your proposed text hardcodes NBD_CMD_FLAG_MAY_TRIM to 'bit
1'; that might also be worth adding into the same new section of the
document documenting all supported command flags.
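
To make the bit assignments concrete, here is a rough sketch of how a client might encode the new request. It assumes a request header laid out as a 32-bit magic, 16 bits of command flags, a 16-bit command type, a 64-bit handle, a 64-bit offset and a 32-bit length, all big-endian. The helper name `pack_write_zeroes` is hypothetical, and the flag values simply follow the 'bit 0' / 'bit 1' wording under discussion.

```python
import struct

NBD_REQUEST_MAGIC = 0x25609513
NBD_CMD_WRITE_ZEROES = 6          # command number from the proposed text
NBD_CMD_FLAG_FUA = 1 << 0         # "bit 0" of the command flags
NBD_CMD_FLAG_MAY_TRIM = 1 << 1    # "bit 1" of the command flags


def pack_write_zeroes(handle: int, offset: int, length: int, flags: int = 0) -> bytes:
    """Build a payload-less write-zeroes request (hypothetical helper).

    Assumed layout, big-endian with no padding:
    magic(32) flags(16) type(16) handle(64) offset(64) length(32)
    """
    return struct.pack(
        ">IHHQQI",
        NBD_REQUEST_MAGIC,
        flags,
        NBD_CMD_WRITE_ZEROES,
        handle,
        offset,
        length,
    )


# Zero 1 MiB at offset 4096, allowing the server to trim, with FUA.
request = pack_write_zeroes(
    handle=1,
    offset=4096,
    length=1 << 20,
    flags=NBD_CMD_FLAG_FUA | NBD_CMD_FLAG_MAY_TRIM,
)
```

Listing the flag constants in one place like this is essentially what a dedicated "command flags" section in the document would provide.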

Do we want to require that the server has negotiated the
NBD_FLAG_SEND_TRIM export flag prior to allowing the
NBD_CMD_FLAG_MAY_TRIM flag in this command?

Possibly-related bug in the existing spec: Should the text of the
NBD_CMD_TRIM (4) command mention the desired server behavior if the
client sends NBD_CMD_TRIM even though NBD_FLAG_SEND_TRIM was not
negotiated in the export flags?  Similarly, should the text of the
NBD_CMD_FLUSH () command mention the desired server behavior if the
client sends NBD_CMD_FLUSH even though NBD_FLAG_SEND_FLUSH was not
negotiated in the export flags?  At least NBD_CMD_FLUSH recommended that
the client must not send the command if the feature was not negotiated.

> +
> +    If an error occurs, the server SHOULD set the appropriate error code
> +    in the error field. The server MAY then close the connection.
> +
> +The server SHOULD return `ENOSPC` if it receives a write zeroes request
> +including one or more sectors beyond the size of the device. It SHOULD
> +return `EPERM` if it receives a write zeroes request on a read-only export.

Should we add a paragraph stating that the client MUST NOT send
NBD_CMD_WRITE_ZEROES if NBD_FLAG_SEND_WRITE_ZEROES was not negotiated in
the export options?  Similarly, should we suggest that the server reply
with EINVAL if it knows about the command, but the client issues the
command in spite of not negotiating it?  Should we enhance the
documentation in the "Error values" heading to mention that EINVAL
should be used in general for any client command not expected by the server?
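
Putting those suggestions together with the error rules in the quoted text, the server-side validation might look roughly like the sketch below. The function name `check_write_zeroes` is hypothetical, the bit position of NBD_FLAG_SEND_WRITE_ZEROES is an assumption (it does not appear in the hunks quoted here), and the ordering of the checks is a design choice of this sketch, not something the proposed text mandates.

```python
# Assumed values, for illustration only.
NBD_FLAG_READ_ONLY = 1 << 1
NBD_FLAG_SEND_WRITE_ZEROES = 1 << 6   # assumed bit position
NBD_EPERM = 1
NBD_EINVAL = 22
NBD_ENOSPC = 28


def check_write_zeroes(export_flags: int, export_size: int,
                       offset: int, length: int) -> int:
    """Return 0 if the request may proceed, else an error value.

    Hypothetical helper combining the checks discussed above.
    """
    # Command known to the server but never negotiated: EINVAL.
    if not (export_flags & NBD_FLAG_SEND_WRITE_ZEROES):
        return NBD_EINVAL
    # Write zeroes on a read-only export: EPERM.
    if export_flags & NBD_FLAG_READ_ONLY:
        return NBD_EPERM
    # Any part of the range beyond the end of the device: ENOSPC.
    if offset + length > export_size:
        return NBD_ENOSPC
    return 0
```

Checking negotiation first means a client that skipped negotiation always sees the same error, regardless of the offset or the export's read-only state.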

> +
>  ## About this file
>  
>  This file tries to document the NBD protocol as it is currently
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
