[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v6 6/7] block/nbd-client: nbd reconnect
From: |
Vladimir Sementsov-Ogievskiy |
Subject: |
Re: [Qemu-devel] [PATCH v6 6/7] block/nbd-client: nbd reconnect |
Date: |
Fri, 7 Jun 2019 11:44:15 +0000 |
07.06.2019 6:17, Eric Blake wrote:
> On 4/11/19 12:27 PM, Vladimir Sementsov-Ogievskiy wrote:
>> Implement reconnect. To achieve this:
>>
>> 1. add new modes:
>> connecting-wait: means, that reconnecting is in progress, and there
>> were small number of reconnect attempts, so all requests are
>> waiting for the connection.
>> connecting-nowait: reconnecting is in progress, there were a lot of
>> attempts of reconnect, all requests will return errors.
>>
>> two old modes are used too:
>> connected: normal state
>> quit: exiting after fatal error or on close
>>
>> Possible transitions are:
>>
>> * -> quit
>> connecting-* -> connected
>> connecting-wait -> connecting-nowait (transition is done after
>> reconnect-delay seconds in connecting-wait mode)
>> connected -> connecting-wait
>>
>> 2. Implement reconnect in connection_co. So, in connecting-* mode,
>> connection_co, tries to reconnect unlimited times.
>>
>> 3. Retry nbd queries on channel error, if we are in connecting-wait
>> state.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <address@hidden>
>> ---
>
> Does this also mean that we can start queuing up guest I/O even before
> the first time connected is reached?
No, we decided that it's simpler and clearer way to keep first connect to
be synchronous in nbd_open.
>
>> block/nbd-client.h | 7 +
>> block/nbd-client.c | 336 +++++++++++++++++++++++++++++++++++----------
>> 2 files changed, 273 insertions(+), 70 deletions(-)
>>
>
>> +++ b/block/nbd-client.c
>> @@ -1,6 +1,7 @@
>> /*
>> * QEMU Block driver for NBD
>> *
>> + * Copyright (c) 2019 Virtuozzo International GmbH. All rights reserved.
>
> Adding copyright is fine - you are indeed adding a non-trivial amount of
> code to this file. Adding "All rights reserved" is questionable, in part
> because it no longer has legal status (see this recent nbdkit patch
> https://github.com/libguestfs/nbdkit/commit/952ffe0fc7 for example).
>
> Furthermore, I really cringe when I see it mixed with GPL, because the
> GPL works by explicitly stating that you are NOT reserving all rights,
> but are rather granting specific permissions to recipients. However, as
> this file is BSD licensed, and the various family tree of BSD licenses
> have (often due to copy-and-paste) used this phrase in the past, I'm not
> going to reject the patch because of the phrase, even though I can
> definitely ask if you can remove it.
Hmm, I think it's not a problem to drop "All rights reserved".
>
>> @@ -59,24 +77,133 @@ static void nbd_teardown_connection(BlockDriverState
>> *bs)
>> {
>> NBDClientSession *client = nbd_get_client_session(bs);
>>
>> - assert(client->ioc);
>> -
>> - /* finish any pending coroutines */
>> - qio_channel_shutdown(client->ioc,
>> - QIO_CHANNEL_SHUTDOWN_BOTH,
>> - NULL);
>> + if (client->state == NBD_CLIENT_CONNECTED) {
>> + /* finish any pending coroutines */
>> + assert(client->ioc);
>> + qio_channel_shutdown(client->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
>> + }
>> + client->state = NBD_CLIENT_QUIT;
>> + if (client->connection_co) {
>> + qemu_co_sleep_wake(client->connection_co);
>> + }
>> BDRV_POLL_WHILE(bs, client->connection_co);
>
> So you are using the qemu_co_sleep_wake code as a way to in effect
> cancel any ongoing sleep. I'm still not sure if there is already another
> way to achieve the same effect, perhaps by re-entering the coroutine?
It marked as scheduled by qemu_co_sleep_ns and can't be entered, due to this
code
in qemu_aio_coroutine_enter:
128 if (scheduled) {
129 fprintf(stderr,
130 "%s: Co-routine was already scheduled in '%s'\n",
131 __func__, scheduled);
132 abort();
133 }
>
>> +typedef struct NBDConnection {
>> + BlockDriverState *bs;
>> + SocketAddress *saddr;
>> + const char *export;
>> + QCryptoTLSCreds *tlscreds;
>> + const char *hostname;
>> + const char *x_dirty_bitmap;
>> +} NBDConnection;
>
> Can we put this type in a header, and use it instead of passing lots of
> individual parameters to nbd_client_connect()? Probably as a separate
> pre-requisite cleanup patch.
Will try
>
>> +
>> +static bool nbd_client_connecting(NBDClientSession *client)
>> +{
>> + return client->state == NBD_CLIENT_CONNECTING_WAIT ||
>> + client->state == NBD_CLIENT_CONNECTING_NOWAIT;
>> +}
>
> I don't know what style we prefer to use here. If my returns split
> across lines, I tend to go with either 4-space indent instead of 7, or
> to use () so that the second line is indented to the column after (; but
> this is aesthetics and so I'm not going to change what you have.
Not a problem for me to change, if you dislike)
>
>> +
>> +static bool nbd_client_connecting_wait(NBDClientSession *client)
>> +{
>> + return client->state == NBD_CLIENT_CONNECTING_WAIT;
>> +}
>> +
>> +static coroutine_fn void nbd_reconnect_attempt(NBDConnection *con)
>> +{
>> + NBDClientSession *s = nbd_get_client_session(con->bs);
>> + Error *local_err = NULL;
>> +
>> + if (!nbd_client_connecting(s)) {
>> + return;
>> + }
>> + assert(nbd_client_connecting(s));
>> +
>> + /* Wait completion of all in-flight requests */
>
> Wait for completion
>
>> +
>> + qemu_co_mutex_lock(&s->send_mutex);
>>
>> - nbd_client_detach_aio_context(bs);
>> - object_unref(OBJECT(client->sioc));
>> - client->sioc = NULL;
>> - object_unref(OBJECT(client->ioc));
>> - client->ioc = NULL;
>> + while (s->in_flight > 0) {
>> + qemu_co_mutex_unlock(&s->send_mutex);
>> + nbd_recv_coroutines_wake_all(s);
>> + s->wait_in_flight = true;
>> + qemu_coroutine_yield();
>> + s->wait_in_flight = false;
>> + qemu_co_mutex_lock(&s->send_mutex);
>> + }
>> +
>> + qemu_co_mutex_unlock(&s->send_mutex);
>> +
>> + if (!nbd_client_connecting(s)) {
>> + return;
>> + }
>> +
>> + /*
>> + * Now we are sure, that nobody accessing the channel now and nobody
>> + * will try to access the channel, until we set state to CONNECTED
>
> Now we are sure that nobody is accessing the channel, and no one will
> try until we set the state to CONNECTED
>
>> + */
>> +
>> + /* Finalize previous connection if any */
>> + if (s->ioc) {
>> + nbd_client_detach_aio_context(con->bs);
>> + object_unref(OBJECT(s->sioc));
>> + s->sioc = NULL;
>> + object_unref(OBJECT(s->ioc));
>> + s->ioc = NULL;
>> + }
>> +
>> + s->connect_status = nbd_client_connect(con->bs, con->saddr,
>> + con->export, con->tlscreds,
>> + con->hostname,
>> con->x_dirty_bitmap,
>> + &local_err);
>> + error_free(s->connect_err);
>> + s->connect_err = NULL;
>> + error_propagate(&s->connect_err, local_err);
>> + local_err = NULL;
>> +
>> + if (s->connect_status < 0) {
>> + /* failed attempt */
>> + return;
>> + }
>> +
>> + /* successfully connected */
>> + s->state = NBD_CLIENT_CONNECTED;
>> + qemu_co_queue_restart_all(&s->free_sema);
>> +}
>> +
>> +static coroutine_fn void nbd_reconnect_loop(NBDConnection *con)
>> +{
>> + NBDClientSession *s = nbd_get_client_session(con->bs);
>> + uint64_t start_time_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
>> + uint64_t delay_ns = s->reconnect_delay * 1000000000UL;
>
> Do we have a #define constant for nanoseconds in a second to make this
> more legible than counting '0's?
>
>> + uint64_t timeout = 1000000000UL; /* 1 sec */
>> + uint64_t max_timeout = 16000000000UL; /* 16 sec */
>
> 1 * constant, 16 * constant
OK, if we have not one, I can at least add it locally
>
>> +
>> + nbd_reconnect_attempt(con);
>> +
>> + while (nbd_client_connecting(s)) {
>> + if (s->state == NBD_CLIENT_CONNECTING_WAIT &&
>> + qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - start_time_ns >
>> delay_ns)
>> + {
>> + s->state = NBD_CLIENT_CONNECTING_NOWAIT;
>> + qemu_co_queue_restart_all(&s->free_sema);
>> + }
>> +
>> + bdrv_dec_in_flight(con->bs);
>> + qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, timeout);
>
> Another place where I'd like someone more familiar with coroutines to
> also have a look.
>
>> + bdrv_inc_in_flight(con->bs);
>> + if (timeout < max_timeout) {
>> + timeout *= 2;
>> + }
>> +
>> + nbd_reconnect_attempt(con);
>> + }
>> }
>>
>> static coroutine_fn void nbd_connection_entry(void *opaque)
>> {
>> - NBDClientSession *s = opaque;
>> + NBDConnection *con = opaque;
>> + NBDClientSession *s = nbd_get_client_session(con->bs);
>> uint64_t i;
>> int ret = 0;
>> Error *local_err = NULL;
>> @@ -91,16 +218,25 @@ static coroutine_fn void nbd_connection_entry(void
>> *opaque)
>> * Therefore we keep an additional in_flight reference all the
>> time and
>> * only drop it temporarily here.
>> */
>> +
>> + if (nbd_client_connecting(s)) {
>> + nbd_reconnect_loop(con);
>> + }
>> +
>> + if (s->state != NBD_CLIENT_CONNECTED) {
>> + continue;
>> + }
>> +
>
> If I understand, this is not a busy loop because nbd_reconnect_loop()
> will pause the coroutine until something interesting happens. Correct?
At least it will do qemu_co_sleep_ns.
>
> It's late enough at night for me that I don't trust this to be a full
> review; I'll try and look again soon in more details, as well as play
> with this against iotests.
>
Thank you!
--
Best regards,
Vladimir
Re: [Qemu-devel] [PATCH v6 6/7] block/nbd-client: nbd reconnect,
Vladimir Sementsov-Ogievskiy <=
Re: [Qemu-devel] [PATCH v6 6/7] block/nbd-client: nbd reconnect, Vladimir Sementsov-Ogievskiy, 2019/06/10