[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH] Ensure migrate_cancel does not block doing I/O
From: |
Daniel P. Berrange |
Subject: |
Re: [Qemu-devel] [PATCH] Ensure migrate_cancel does not block doing I/O |
Date: |
Fri, 26 Aug 2011 12:25:06 +0100 |
User-agent: |
Mutt/1.5.21 (2010-09-15) |
On Fri, Aug 26, 2011 at 11:59:28AM +0100, Daniel P. Berrange wrote:
> From: "Daniel P. Berrange" <address@hidden>
>
> There are two common cases where migrate_cancel is intended to be
> used
>
> 1. When migration is not converging due to an overactive
> guest and insufficient network bandwidth
> 2. When migration is stuck due a network outage, waiting
> for the TCP transmit timeout to occurr & return an I/O
> error for send()
>
> In the second case, if you attempt to use 'migrate_cancel' it
> will also get stuck. This can be seen by attempting to migrate
> to a QEMU which has been SIGSTOP'd
>
> $ ./x86_64-softmmu/qemu-system-x86_64 -cdrom ~/boot.iso -m 600 \
> -monitor stdio -vnc :2 -incoming tcp:localhost:9000
> QEMU 0.14.1 monitor - type 'help' for more information
> (qemu)
> <Ctrl-Z>
> [1]+ Stopped
>
> And in another shell
>
> $ ./x86_64-softmmu/qemu-system-x86_64 -cdrom ~/boot.iso -m 600 \
> -monitor stdio -vnc :1
> QEMU 0.14.1 monitor - type 'help' for more information
> (qemu) migrate -d tcp:localhost:9000
> (qemu) info migrate
> Migration status: active
> transferred ram: 416 kbytes
> remaining ram: 621624 kbytes
> total ram: 623040 kbytes
> (qemu) migrate_cancel
>
> This last command will never return, until the first QEMU is
> resumed. Looking at the stack trace in GDB you see
>
> #0 0x0000003a8320e4c2 in __libc_send (fd=10, buf=0x1bc7c70, n=19777,
> flags=0)
> at ../sysdeps/unix/sysv/linux/x86_64/send.c:28
> #1 0x000000000048fb1e in socket_write (s=<optimized out>, buf=<optimized
> out>, size=<optimized out>)
> at migration-tcp.c:39
> #2 0x000000000048eba4 in migrate_fd_put_buffer (opaque=0x1b76ad0,
> data=0x1bc7c70, size=19777)
> at migration.c:324
> #3 0x000000000048e442 in buffered_flush (s=0x1b76b90) at buffered_file.c:87
> #4 0x000000000048e4cf in buffered_close (opaque=0x1b76b90) at
> buffered_file.c:177
> #5 0x0000000000496d57 in qemu_fclose (f=0x1bbfc10) at savevm.c:479
> #6 0x000000000048f4ca in migrate_fd_cleanup (s=0x1b76ad0) at migration.c:291
> #7 0x000000000048f035 in do_migrate_cancel (mon=<optimized out>,
> qdict=<optimized out>,
> ret_data=<optimized out>) at migration.c:136[snip]
> [snip]
>
> The migration_fd_cleanup method is where the problem really starts.
> Specifically it does
>
> if (s->file) {
> DPRINTF("closing file\n");
> if (qemu_fclose(s->file) != 0) {
> ret = -1;
> }
> s->file = NULL;
> }
>
> if (s->fd != -1)
> close(s->fd);
>
> And gets stuck in the qemu_fclose() bit because that method (rightly) tries
> to flush all outstanding buffers before closing. Unfortunately while this is
> desirable when migration ends successfully, it is undesirable when we are
> failing/cancelling migration.
>
> It is hard to tell qemu_fclose() that it shouldn't flush buffers directly,
> so the alternative is to ensure that this method fails quickly when it
> attempts I/O. This is easily achieved, simply by closing 's->fd' before
> calling qemu_fclose.
>
> Signed-off-by: Daniel P. Berrange <address@hidden>
> ---
> migration.c | 7 +++++++
> 1 files changed, 7 insertions(+), 0 deletions(-)
>
> diff --git a/migration.c b/migration.c
> index f5959b4..a432c3b 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -286,6 +286,13 @@ int migrate_fd_cleanup(FdMigrationState *s)
>
> qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
>
> + if ((s->state == MIG_STATE_ERROR ||
> + s->state == MIG_STATE_CANCELLED) &&
> + s->fd != -1) {
> + close(s->fd);
> + s->fd = -1;
> + }
> +
> if (s->file) {
> DPRINTF("closing file\n");
> if (qemu_fclose(s->file) != 0) {
This approach results in 'socket_write' doing a send(-1) which
gets back a EBADF errno, causing the flush to abort. An alternative
approach is to make the migrate_fd_put_buffer short-circuit the
send() call by checking the migration state thus:
diff --git a/migration.c b/migration.c
index f5959b4..6448d0b 100644
--- a/migration.c
+++ b/migration.c
@@ -319,6 +319,11 @@ ssize_t migrate_fd_put_buffer(void *opaque, const void
*data, size_t size)
FdMigrationState *s = opaque;
ssize_t ret;
+ if (s->state == MIG_STATE_ERROR ||
+ s->state == MIG_STATE_CANCELLED) {
+ return -EIO;
+ }
+
do {
ret = s->write(s, data, size);
} while (ret == -1 && ((s->get_error(s)) == EINTR));
I think I slightly prefer this second option, since it avoids the EBADF
scenario. Other opinions ?
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|