qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCHv2] Strawman proposal for NBD structured replies


From: Eric Blake
Subject: Re: [Qemu-devel] [PATCHv2] Strawman proposal for NBD structured replies
Date: Tue, 29 Mar 2016 17:17:52 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.7.1

On 03/29/2016 04:48 PM, Alex Bligh wrote:
> Here's a strawman for the structured reply section. I haven't
> covered negotation.
> 
> Also at:
>   https://github.com/abligh/nbd/tree/strawman-structured-reply
> Markdown at:
>   https://github.com/abligh/nbd/blob/strawman-structured-reply/doc/proto.md
> 

> +++ b/doc/proto.md
> @@ -195,15 +195,145 @@ C: 64 bits, offset (unsigned)
>  C: 32 bits, length (unsigned)  
>  C: (*length* bytes of data if the request is of type `NBD_CMD_WRITE`)
>  
> -The server replies with:
> +Replies take one of two forms. They may either be structured replies,

Hmm, you put your strawman directly in the 'transmission phase' section,
while mine is deferred to the 'Experimental Extensions' section, at
least until we have a reference client and server implementation.

> +or unstructured replies. The server MUST NOT send structured replies
> +unless it has negotiated structured replies with the client using
> +`NBD_OPT_STUCTURED_REPLIES` (??).
> +
> +[Option #1: Subject to that, the server may choose whether it sends
> +any given reply to any given command as a structured reply or an
> +unstructured reply.]

Existing clients are not expecting a structured reply, so at least
SOMETHING must be negotiated before allowing any structured reply.  But
seems a bit awkward to force a client to recognize two different
possible replies to any given command that does not carry data.

> +
> +[Option #2: If this option is negotiated, the server MUST send all
> +replies as structured replies. If the option is not negotiated, the
> +server MUST send all replies as unstructured replies.]

Doable, but slightly more traffic on the wire.  In my v2, that would
mean most other commands reply with NBD_REPLY_TYPE_NONE on success, or
NBD_REPLY_TYPE_ERROR on failure.

> +
> +[Option #3: If this option is negotiated, the server MUST send all
> +replies to command that support structured replies as structured
> +replies (currently `NBD_CMD_READ` only), and all other replies as
> +unstructured replies. If the option is not negotiated, the server MUST
> +send all replies as unstructured replies.]

My v2 went with this approach, for all existing commands, but documented
that future commands (such as the proposed NBD_CMD_GET_LBA_STATUS, or
whatever we name it) may be structured-only.

I also gave the server latitude to reject clients that negotiate a
future structured-reply command without also negotiating structured
reads, which I guess would help if we want to go with option #2 instead
of #3.

> +#### Structured replies
> +
> +A structured reply consists of one or more chunks. The server
> +MUST send exactly one end chunk (identified by
> +the chunk type `NBD_CHUNKTYPE_END`), and this MUST be the final
> +chunk within the reply.

My v2 went with a flag marking the end packet, rather than a specific
type.  I actually ended up with 5 defined reply types, where any of them
could be flagged as the end packet (but where some of them are useful
only as the end packet).

> +
> +Each chunk consists of the following:
> +
> +S: 32 bits, 0x668e33ef, magic (`NBD_STRUCTURED_REPLY_MAGIC`)  

I see you cribbed the number in my original proposal; I got my number
from bash's $RANDOM; I don't know if you have any underlying scheme for
other magic numbers in use (such as consecutive HEX digits from pi, or
something with ASCII meaning, or something in l33t-speak), so I'm fine
if anyone has rationale for why a better magic number would be desirable.

> +S: 32 bits, flags (including type)  
> +S: 64 bits, handle  
> +S: 32 bits, payload length  
> +S: (*length* bytes of payload data)  
> +
> +The flags have the following meanings:
> +
> +* bits 0-7: `NBD_CHUNKTYPE`, an 8 bit unsigned integer
> +* bits 8-31: reserved (server MUST set these to zero)

My v2 separates the flag and type fields.

> +
> +Possible values of `NBD_CHUNKTYPE` are as follows:
> +
> +* 0 = `NBD_CHUNKTYPE_END`: the final chunk
> +* 1 = `NBD_CHUNKTYPE_DATA`: data that has been read
> +* 2 = `NBD_CHUNKTYPE_ZERO`: data that should be considered zero
> +
> +The format of the payload data for each chunk type is as follows:
> +
> +##### `NBD_CHUNKTYPE_END`
> +
> +S: 32 bits, error code or zero for success  
> +S: 64 bits, offset of error (if any)  
> +
> +##### `NBD_CHUNKTYPE_DATA`
> +
> +S: 64 bits, offset of data  
> +S: (*length-8* bytes of data as read)  
> +
> +##### `NBD_CHUNKTYPE_ZERO`
> +
> +S: 64 bits, offset of data  
> +S: 32 bits, number of zeroes to which this corresponds  

Structure wise, our reply types look similar, even though the naming is
different.  I also had two more types (no data, used for success, and a
distinct error without offset type, rather than special-casing -1 as
indeterminate offset).


>  
> -Replies need not be sent in the same order as requests (i.e., requests
> -may be handled by the server asynchronously).
> +The server MUST NOT send chunks that overlap. The server
> +MUST NOT send chunks whose data exceeds the length
> +of data requested (for this purpose counting the data
> +within `NBD_CHUNKTYPE_ZERO` as the number of zero bytes
> +specified therein). The server MUST, in the case of a successesful

successful

> +read send exactly the number of bytes requested (whether
> +represented by `NBD_CHUNKTYPE_DATA` or `NBD_CHUNKTYPE_ZERO`).
> +The server MUST NOT, in the case of an errored read, send
> +more than the number of bytes requested.
> +
> +In order to avoid the burden of reassembly, the client
> +MAY send `NBD_CMD_FLAG_DF`, which instructs the server
> +not to fragment the reply. If this flag is set, the server
> +MUST send either zero or one data chunks and an `NBD_CHUNKTYPE_END`
> +only. Under such circumstances the server MAY error the command
> +with `ETOOBIG` if the length read exceeds [65,536 bytes | the

`ETOOBIG` is not a standard error; my v2 went with the POSIX EOVERFLOW
and defined it to it's Linux value of 75.

> +
> +If more than one data chunk containing an error has been transmitted
> +prior to sending the `NBD_CHUNKTYPE_END`, the server MUST take
> +the second option above, to avoid the client assuming that data
> +chunks which do not contain the offset marked as errored are
> +error-free.

My v2 lets the server send multiple error-with-offset packets in the
reply chain.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]