Re: [Qemu-devel] NBD block device backend - 'improvements'


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] NBD block device backend - 'improvements'
Date: Mon, 14 Feb 2011 20:32:24 +0000

On Mon, Feb 14, 2011 at 7:40 PM, Nicholas Thomas <address@hidden> wrote:
> I've written a patch that changes the behaviour - instead of exiting at
> startup, we wait for the NBD connection to be established, and we hang
> on reads and writes until the connection is re-established.

Hi Nick,
I think reconnect is a useful feature.  For more info on submitting
patches to QEMU, please see
http://wiki.qemu.org/Contribute/SubmitAPatch.  It contains a few
points like sending patches inline instead of as an attachment (makes
review easier), using Signed-off-by:, and references to QEMU coding
style.

> I'm interested in getting the changes merged upstream, so I thought I'd
> get in early and ask if you'd be interested in the patch, in principle;
> whether the old behaviour would need to be preserved, making the new
> behaviour accessible via a config option ("-drive
> file=nbd:127.0.0.1:5000:retry=forever,..." ?); and whether I'm going
> about the changes in a sane way (I've attached the current version of
> the patch).

block/nbd.c needs to be made asynchronous in order for this change to
work.  Otherwise the only thing you can do is block QEMU and the VM,
and even that shouldn't be done using sleep(2), because that can stall
the I/O thread, which also processes the QEMU monitor and other
external interfaces.  See below for more info.
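
As a rough sketch (hypothetical, not a drop-in replacement -- please
check block_int.h for the exact signatures in the tree you are
patching), going asynchronous means implementing the aio callbacks
instead of the synchronous ones, along these lines:

    /* Sketch only: queue the request and return immediately; completion is
     * signalled later from a socket fd handler driven by the main loop. */
    static BlockDriverAIOCB *nbd_aio_readv(BlockDriverState *bs,
                                           int64_t sector_num,
                                           QEMUIOVector *qiov, int nb_sectors,
                                           BlockDriverCompletionFunc *cb,
                                           void *opaque)
    {
        /* 1. append the NBD request to a pending list in BDRVNBDState
         * 2. make sure s->sock has read/write handlers registered with the
         *    main loop
         * 3. return an AIOCB; invoke cb(opaque, ret) when the reply arrives,
         *    or after the connection has been re-established and the request
         *    retried -- the VM and the monitor keep running in the meantime */
        return NULL; /* placeholder in this sketch */
    }

nbd_aio_writev() would follow the same shape, and both get wired into
BlockDriver through .bdrv_aio_readv/.bdrv_aio_writev in place of
.bdrv_read/.bdrv_write.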

> Another thing I've noticed is that the nbd library ( /nbd.c ) is not
> IPv6-compatible ( "-drive file=nbd:\:\:1:5000", for instance ) - I don't
> have a patch for that yet, but I'm going to need to write one :) -
> presumably you'd like that merging upstream too (and I should make the
> library use the functions provided in qemu_socket.h) ?

IPv6 would be nice, and if you can consolidate that in qemu_socket.h,
then it's a win for non-nbd socket users too.
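
For what it's worth, the usual way to make this address-family
agnostic is getaddrinfo().  A minimal standalone sketch (plain POSIX,
not yet using any qemu_socket.h helpers; the function name is just for
illustration):

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netdb.h>
    #include <string.h>
    #include <unistd.h>

    /* Resolve host/port with getaddrinfo() and try each result in turn,
     * so IPv4 and IPv6 are handled the same way.  Returns a connected
     * socket fd, or -1 on failure. */
    static int tcp_connect_any(const char *host, const char *port)
    {
        struct addrinfo hints, *res, *ai;
        int sock = -1;

        memset(&hints, 0, sizeof(hints));
        hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6 */
        hints.ai_socktype = SOCK_STREAM;

        if (getaddrinfo(host, port, &hints, &res) != 0) {
            return -1;
        }
        for (ai = res; ai != NULL; ai = ai->ai_next) {
            sock = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
            if (sock < 0) {
                continue;
            }
            if (connect(sock, ai->ai_addr, ai->ai_addrlen) == 0) {
                break;                    /* connected */
            }
            close(sock);
            sock = -1;
        }
        freeaddrinfo(res);
        return sock;
    }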

I have inlined your patch below for easier review:

> diff --git a/block/nbd.c b/block/nbd.c
> index c8dc763..87da07e 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -39,9 +39,11 @@ typedef struct BDRVNBDState {
>      int sock;
>      off_t size;
>      size_t blocksize;
> +    char *filename;

block_int.h:BlockDriverState->filename already has this.
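
That is, rather than duplicating the string into BDRVNBDState,
nbd_open() could read it back from the BlockDriverState it is given
(untested, but block.c should already have filled it in by the time
bdrv_file_open runs):

    file = qemu_strdup(bs->filename);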

>  } BDRVNBDState;
>
> -static int nbd_open(BlockDriverState *bs, const char* filename, int flags)
> +
> +static int nbd_open(BlockDriverState *bs)
>  {
>      BDRVNBDState *s = bs->opaque;
>      uint32_t nbdflags;
> @@ -56,7 +58,7 @@ static int nbd_open(BlockDriverState *bs, const char* filename, int flags)
>      int ret;
>      int err = -EINVAL;
>
> -    file = qemu_strdup(filename);
> +    file = qemu_strdup(s->filename);
>
>      name = strstr(file, EN_OPTSTR);
>      if (name) {
> @@ -121,32 +123,62 @@ out:
>      return err;
>  }
>
> +// Puts the filename into the driver state and calls nbd_open - hanging until
> +// it is successful.

Please use /* */ comments instead, QEMU coding style.

> +static int nbd_setup(BlockDriverState *bs, const char* filename, int flags)
> +{
> +    BDRVNBDState *s = bs->opaque;
> +    int err = 1;
> +
> +    s->filename = qemu_strdup(filename);
> +    while (err != 0)
> +    {
> +        err = nbd_open(bs);
> +        // Avoid tight loops
> +        if (err != 0)
> +            sleep(1);

Please use {} even for single statement blocks.

> +    }
> +
> +    return err;
> +}

Waiting until the connection succeeds is a policy decision.  It might
be equally useful to abort starting QEMU, or to start with the nbd
device unavailable (either returning I/O errors or not completing
requests until a connection is established).  There should be a way to
configure this; otherwise users who depended on the old behaviour will
end up filing bug reports.
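
As a sketch of the plumbing, assuming the hypothetical retry=forever
syntax from your mail (the option name, the helper, and the
s->retry_forever field below are made up for illustration):

    /* Look for a ":retry=" option at the end of the filename and record the
     * chosen policy in the driver state; default to the old behaviour of
     * failing the open if the first connection attempt fails.
     * s->retry_forever would be a new int field in BDRVNBDState. */
    static void nbd_parse_retry(BDRVNBDState *s, char *file)
    {
        char *opt = strstr(file, ":retry=");

        s->retry_forever = 0;
        if (opt) {
            if (!strcmp(opt + strlen(":retry="), "forever")) {
                s->retry_forever = 1;
            }
            *opt = '\0';   /* strip the option so host:port parsing is unchanged */
        }
    }

nbd_setup()/nbd_open() would then consult s->retry_forever to decide
between failing immediately and retrying.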

> +
>  static int nbd_read(BlockDriverState *bs, int64_t sector_num,
>                      uint8_t *buf, int nb_sectors)
>  {
>      BDRVNBDState *s = bs->opaque;
>      struct nbd_request request;
>      struct nbd_reply reply;
> +    bool success = false;
>
>      request.type = NBD_CMD_READ;
>      request.handle = (uint64_t)(intptr_t)bs;
> -    request.from = sector_num * 512;;
> +    request.from = sector_num * 512;
>      request.len = nb_sectors * 512;
>
> -    if (nbd_send_request(s->sock, &request) == -1)
> -        return -errno;
> +    while (success == false)
> +    {
> +        if ( (nbd_send_request(s->sock, &request) == -1) ||
> +             (nbd_receive_reply(s->sock, &reply) == -1)     )
> +        {
> +            // We hang here until the TCP session is established
> +            close(s->sock);
> +            while(nbd_open(bs) != 0)
> +                sleep(1);
> +            continue;

This hangs the VM.  It will literally stop executing code and be
frozen.  Doing this without hanging the VM involves changing
nbd_read()/nbd_write() to work asynchronously.

Also, on ENOSPC QEMU can already pause the VM using a different
mechanism.  That may not be appropriate here but it could be
interesting to see how that code works.
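
(For reference, that pause-on-error behaviour is driven by the
per-drive werror=/rerror= options -- e.g. "-drive file=...,werror=stop",
if I remember the spelling correctly -- so the handling in block.c and
the device models is a reasonable place to look.)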

> +        }
>
> -    if (nbd_receive_reply(s->sock, &reply) == -1)
> -        return -errno;
> +        if (reply.error !=0)
> +            return -reply.error;
>
> -    if (reply.error !=0)
> -        return -reply.error;
> +        if (reply.handle != request.handle)
> +            return -EIO;
>
> -    if (reply.handle != request.handle)
> -        return -EIO;
> +        // If reading the actual data fails, we retry the whole request
> +        if (nbd_wr_sync(s->sock, buf, request.len, 1) != request.len)
> +            continue;
>
> -    if (nbd_wr_sync(s->sock, buf, request.len, 1) != request.len)
> -        return -EIO;
> +        success = true;
> +    }
>
>      return 0;
>  }
> @@ -157,26 +189,39 @@ static int nbd_write(BlockDriverState *bs, int64_t sector_num,
>      BDRVNBDState *s = bs->opaque;
>      struct nbd_request request;
>      struct nbd_reply reply;
> +    bool success = false;
>
>      request.type = NBD_CMD_WRITE;
>      request.handle = (uint64_t)(intptr_t)bs;
>      request.from = sector_num * 512;;
>      request.len = nb_sectors * 512;
>
> -    if (nbd_send_request(s->sock, &request) == -1)
> -        return -errno;
> +    while (success == false)
> +    {
> +        if ( (nbd_send_request(s->sock, &request) == -1) ||
> +             (nbd_wr_sync(s->sock, (uint8_t*)buf, request.len, 0) != request.len) )
> +        {
> +            // We hang here until the TCP session is established
> +            close(s->sock);
> +            while(nbd_open(bs) != 0)
> +                sleep(1);
> +            continue;
> +        }
> +
> +        // We didn't get a reply from the write, so try again
> +        if (nbd_receive_reply(s->sock, &reply) == -1)
> +            continue;
>
> -    if (nbd_wr_sync(s->sock, (uint8_t*)buf, request.len, 0) != request.len)
> -        return -EIO;
> +        // Problem with the response itself
> +        if (reply.error !=0)
> +            return -reply.error;
>
> -    if (nbd_receive_reply(s->sock, &reply) == -1)
> -        return -errno;
> +        if (reply.handle != request.handle)
> +            return -EIO;
>
> -    if (reply.error !=0)
> -        return -reply.error;
> +        success = true;
> +    }
>
> -    if (reply.handle != request.handle)
> -        return -EIO;
>
>      return 0;
>  }
> @@ -191,6 +236,7 @@ static void nbd_close(BlockDriverState *bs)
>      request.from = 0;
>      request.len = 0;
>      nbd_send_request(s->sock, &request);
> +    qemu_free(s->filename);
>
>      close(s->sock);
>  }
> @@ -205,7 +251,7 @@ static int64_t nbd_getlength(BlockDriverState *bs)
>  static BlockDriver bdrv_nbd = {
>      .format_name     = "nbd",
>      .instance_size   = sizeof(BDRVNBDState),
> -    .bdrv_file_open  = nbd_open,
> +    .bdrv_file_open  = nbd_setup,
>      .bdrv_read               = nbd_read,
>      .bdrv_write              = nbd_write,
>      .bdrv_close              = nbd_close,


