[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v3 43/44] nbd: Implement NBD_OPT_BLOCK_SIZE on s
From: |
Alex Bligh |
Subject: |
Re: [Qemu-devel] [PATCH v3 43/44] nbd: Implement NBD_OPT_BLOCK_SIZE on server |
Date: |
Mon, 25 Apr 2016 13:16:35 +0100 |
Eric,
See my message on nbd-general today re the necessity (or not)
of getting NBD_OPT_BLOCK_SIZE first; it may be just that you
can assume 512 is OK.
Otherwise
Reviewed-by: Alex Bligh <address@hidden>
Alex
On 23 Apr 2016, at 00:40, Eric Blake <address@hidden> wrote:
> The upstream NBD Protocol has defined a new extension to allow
> the server to advertise block sizes to the client, as well as
> a way for the client to inform the server that it intends to
> obey block sizes.
>
> Thanks to a recent fix, our minimum transfer size is always
> 1 (the block layer takes care of read-modify-write on our
> behalf), although if we wanted down the road, we could
> advertise a minimum of 512 based on our usage patterns to a
> client that is willing to honor block sizes. Meanwhile,
> advertising our absolute maximum transfer size of 32M will
> help newer clients avoid EINVAL failures.
>
> We do not reject clients for using the older NBD_OPT_EXPORT_NAME;
> we are no worse off for those clients than we used to be. But
> for clients new enough to use NBD_OPT_GO, we require the client
> to first send us NBD_OPT_BLOCK_SIZE to prove they know about
> sizing restraints, by failing with NBD_REP_ERR_BLOCK_SIZE_REQD.
> All existing released qemu clients (whether old-style or new, at
> least by the end of this series) already honor our limits, and
> will still connect; so at most, this would reject a new non-qemu
> client that doesn't intend to obey limits (and that client could
> still use NBD_OPT_EXPORT_NAME to bypass our rejection).
>
> Signed-off-by: Eric Blake <address@hidden>
> ---
> include/block/nbd.h | 2 ++
> nbd/nbd-internal.h | 1 +
> nbd/server.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 65 insertions(+)
>
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 1072d9e..a5c68df 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -96,11 +96,13 @@ typedef struct nbd_reply nbd_reply;
> #define NBD_REP_ERR_TLS_REQD NBD_REP_ERR(5) /* TLS required */
> #define NBD_REP_ERR_UNKNOWN NBD_REP_ERR(6) /* Export unknown */
> #define NBD_REP_ERR_SHUTDOWN NBD_REP_ERR(7) /* Server shutting down */
> +#define NBD_REP_ERR_BLOCK_SIZE_REQD NBD_REP_ERR(8) /* Missing OPT_BLOCK_SIZE
> */
>
> /* Info types, used during NBD_REP_INFO */
> #define NBD_INFO_EXPORT 0
> #define NBD_INFO_NAME 1
> #define NBD_INFO_DESCRIPTION 2
> +#define NBD_INFO_BLOCK_SIZE 3
>
> /* Request flags, sent from client to server during transmission phase */
> #define NBD_CMD_FLAG_FUA (1 << 0) /* 'force unit access' during write
> */
> diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
> index 1935102..1354182 100644
> --- a/nbd/nbd-internal.h
> +++ b/nbd/nbd-internal.h
> @@ -85,6 +85,7 @@
> #define NBD_OPT_STARTTLS (5)
> #define NBD_OPT_INFO (6)
> #define NBD_OPT_GO (7)
> +#define NBD_OPT_BLOCK_SIZE (9)
>
> /* NBD errors are based on errno numbers, so there is a 1:1 mapping,
> * but only a limited set of errno values is specified in the protocol.
> diff --git a/nbd/server.c b/nbd/server.c
> index 563afb2..86d1e2d 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -83,6 +83,7 @@ struct NBDClient {
> void (*close)(NBDClient *client);
>
> bool no_zeroes;
> + bool block_size;
> NBDExport *exp;
> QCryptoTLSCreds *tlscreds;
> char *tlsaclname;
> @@ -346,6 +347,7 @@ static int nbd_negotiate_handle_info(NBDClient *client,
> uint32_t length,
> uint16_t type;
> uint64_t size;
> uint16_t flags;
> + uint32_t block;
>
> /* Client sends:
> [20 .. xx] export name (length bytes)
> @@ -391,6 +393,57 @@ static int nbd_negotiate_handle_info(NBDClient *client,
> uint32_t length,
> }
>
> rc = nbd_negotiate_send_rep_len(client->ioc, NBD_REP_INFO, opt,
> + sizeof(type) + 3 * sizeof(block));
> + if (rc < 0) {
> + return rc;
> + }
> +
> + type = cpu_to_be16(NBD_INFO_BLOCK_SIZE);
> + if (nbd_negotiate_write(client->ioc, &type, sizeof(type)) !=
> + sizeof(type)) {
> + LOG("write failed");
> + return -EIO;
> + }
> + /* minimum - Always 1, because we use blk_pread().
> + * TODO: Advertise 512 if guest used NBD_OPT_BLOCK_SIZE? */
> + block = cpu_to_be32(1);
> + if (nbd_negotiate_write(client->ioc, &block, sizeof(block)) !=
> + sizeof(block)) {
> + LOG("write failed");
> + return -EIO;
> + }
> + /* preferred - At least 4096, but larger as appropriate. */
> + block = blk_get_opt_transfer_length(exp->blk) * BDRV_SECTOR_SIZE;
> + block = cpu_to_be32(MAX(4096, block));
> + if (nbd_negotiate_write(client->ioc, &block, sizeof(block)) !=
> + sizeof(block)) {
> + LOG("write failed");
> + return -EIO;
> + }
> + /* maximum - At most 32M, but smaller as appropriate. */
> + block = blk_get_max_transfer_length(exp->blk);
> + if (block && block < NBD_MAX_BUFFER_SIZE / BDRV_SECTOR_SIZE) {
> + block *= BDRV_SECTOR_SIZE;
> + } else {
> + block = NBD_MAX_BUFFER_SIZE;
> + }
> + block = cpu_to_be32(block);
> + if (nbd_negotiate_write(client->ioc, &block, sizeof(block)) !=
> + sizeof(block)) {
> + LOG("write failed");
> + return -EIO;
> + }
> +
> + if (!client->block_size) {
> + /* The client is new enough to use NBD_OPT_GO, but forgot to
> + * tell us that it plans to obey block sizes. Since we fail
> + * hard on oversize requests, it's better to reject such a
> + * client up front. */
> + return nbd_negotiate_send_rep(client->ioc,
> NBD_REP_ERR_BLOCK_SIZE_REQD,
> + opt);
> + }
> +
> + rc = nbd_negotiate_send_rep_len(client->ioc, NBD_REP_INFO, opt,
> sizeof(type) + sizeof(size) +
> sizeof(flags));
> if (rc < 0) {
> @@ -630,6 +683,15 @@ static int nbd_negotiate_options(NBDClient *client,
> uint16_t myflags)
> }
> break;
>
> + case NBD_OPT_BLOCK_SIZE:
> + client->block_size = true;
> + ret = nbd_negotiate_send_rep(client->ioc, NBD_REP_ACK,
> + clientflags);
> + if (ret < 0) {
> + return ret;
> + }
> + break;
> +
> case NBD_OPT_STARTTLS:
> if (nbd_negotiate_drop_sync(client->ioc, length) != length) {
> return -EIO;
> --
> 2.5.5
>
>
--
Alex Bligh
- [Qemu-devel] [PATCH v3 30/44] nbd: Treat flags vs. command type as separate fields, (continued)
- [Qemu-devel] [PATCH v3 30/44] nbd: Treat flags vs. command type as separate fields, Eric Blake, 2016/04/22
- [Qemu-devel] [PATCH v3 38/44] block: Add blk_get_opt_transfer_length(), Eric Blake, 2016/04/22
- [Qemu-devel] [PATCH v3 41/44] nbd: Implement NBD_CMD_WRITE_ZEROES on server, Eric Blake, 2016/04/22
- [Qemu-devel] [PATCH v3 42/44] nbd: Implement NBD_CMD_WRITE_ZEROES on client, Eric Blake, 2016/04/22
- [Qemu-devel] [PATCH v3 37/44] nbd: Create struct for tracking export info, Eric Blake, 2016/04/22
- [Qemu-devel] [PATCH v3 43/44] nbd: Implement NBD_OPT_BLOCK_SIZE on server, Eric Blake, 2016/04/22
- Re: [Qemu-devel] [PATCH v3 43/44] nbd: Implement NBD_OPT_BLOCK_SIZE on server,
Alex Bligh <=
- [Qemu-devel] [PATCH v3 19/44] qemu-io: Switch to byte-based block access, Eric Blake, 2016/04/22
- [Qemu-devel] [PATCH v3 24/44] qemu-io: Add 'write -f' to test FUA flag, Eric Blake, 2016/04/22
- [Qemu-devel] [PATCH v3 25/44] qemu-io: Add 'open -u' to set BDRV_O_UNMAP after the fact, Eric Blake, 2016/04/22
- [Qemu-devel] [PATCH v3 27/44] nbd: Use BDRV_REQ_FUA for better FUA where supported, Eric Blake, 2016/04/22
- [Qemu-devel] [PATCH v3 26/44] qemu-io: Add 'write -z -u' to test MAY_UNMAP flag, Eric Blake, 2016/04/22
- [Qemu-devel] [PATCH v3 34/44] nbd: Less allocation during NBD_OPT_LIST, Eric Blake, 2016/04/22
- [Qemu-devel] [PATCH v3 35/44] nbd: Support shorter handshake, Eric Blake, 2016/04/22
- [Qemu-devel] [PATCH v3 32/44] nbd: Share common option-sending code in client, Eric Blake, 2016/04/22
- [Qemu-devel] [PATCH v3 36/44] nbd: Improve handling of shutdown requests, Eric Blake, 2016/04/22