qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [Qemu-devel [RFC] [WIP] v2] Keeping the Source side ali


From: Eric Blake
Subject: Re: [Qemu-devel] [Qemu-devel [RFC] [WIP] v2] Keeping the Source side alive incase of network failure (Migration recovery from network failure)
Date: Wed, 8 Jun 2016 14:41:16 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0

On 06/08/2016 12:13 PM, Md Haris Iqbal wrote:

The subject line is long, and has a typo (s/incase/in case/).  Also, the
mailing list automatically prepends [Qemu-devel], so you shouldn't
repeat it manually.  Better might have been a short subject line then a
longer commit body:

migration: keep source alive on network failure

Details about what was failing, and why this code improves it

Missing a Signed-off-by: attribution; without that, we can't take it.

> ---

You marked this patch as v2, but in the same minute sent another email
with subject line v1, and didn't say what changed to need a v2. Here
after the --- divider is a good place for that.

>  include/migration/migration.h |  1 +
>  migration/migration.c         | 76 
> ++++++++++++++++++++++++++++++++++++++++---
>  qapi-schema.json              | 11 +++++--
>  vl.c                          |  4 +++
>  4 files changed, 85 insertions(+), 7 deletions(-)
> 

> @@ -1726,11 +1755,32 @@ static void *migration_thread(void *opaque)
>              }
>          }
>  
> -        if (qemu_file_get_error(s->to_dst_file)) {
> -            migrate_set_state(&s->state, current_active_state,
> -                              MIGRATION_STATUS_FAILED);
> -            trace_migration_thread_file_err();
> -            break;
> +        if ((ret = qemu_file_get_error(s->to_dst_file))) {
> +            fprintf(stderr, "1 : Error %s %d\n", strerror(-ret), -ret);

fprintf() is rather awkward for errors; can we use qemu's Error mechanism?

> +
> +            /*  This check is based on how the error is set during the 
> network
> +             *  recv(). When recv() returns 0 (i.e. no data to read), the 
> error
> +             *  is set to -EIO. For all other network errors, it is set
> +             *  according to the return value received.
> +             */
> +            if (ret != -EIO && s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) 
> {
> +                /* Network Failure during postcopy */
> +
> +                current_active_state = MIGRATION_STATUS_POSTCOPY_RECOVERY;
> +                runstate_set(RUN_STATE_POSTMIGRATE_RECOVERY);
> +                fprintf(stderr, "1.1 : Error %s %d\n", strerror(-ret), -ret);

Does the end user really need to see "1.1 :"


> +++ b/qapi-schema.json
> @@ -154,12 +154,14 @@
>  # @watchdog: the watchdog action is configured to pause and has been 
> triggered
>  #
>  # @guest-panicked: guest has been panicked as a result of guest OS panic
> +#
> +# @postmigrate-recovery: guest is paused for recovery after a network failure

Not your fault that the overall enum is missing an overall line:

# Since: 1.4

nor that guest-panicked is missing a "(since 1.5)" hint, but at least
your addition should have a "(since 2.7)" hint.

>  ##
>  { 'enum': 'RunState',
>    'data': [ 'debug', 'inmigrate', 'internal-error', 'io-error', 'paused',
>              'postmigrate', 'prelaunch', 'finish-migrate', 'restore-vm',
>              'running', 'save-vm', 'shutdown', 'suspended', 'watchdog',
> -            'guest-panicked' ] }
> +            'guest-panicked', 'postmigrate-recovery' ] }

Adding new enums can cause existing clients like libvirt to do weird
things if they aren't expecting the new state. Are we sure we want to do
it?  Is it a state that cannot be entered by default, but only in
response to a client request that proves the client is new enough to
expect the new state?

>  
>  ##
>  # @StatusInfo:
> @@ -434,12 +436,15 @@
>  #
>  # @failed: some error occurred during migration process.
>  #
> +# @postcopy-recovery: in recovery mode, after a network failure.
> +#

Missing a "(since 2.7)" hint.

>  # Since: 2.3
>  #
>  ##
>  { 'enum': 'MigrationStatus',
>    'data': [ 'none', 'setup', 'cancelling', 'cancelled',
> -            'active', 'postcopy-active', 'completed', 'failed' ] }
> +            'active', 'postcopy-active', 'completed', 'failed',
> +            'postcopy-recovery' ] }
>  
>  ##
>  # @MigrationInfo
> @@ -2058,6 +2063,8 @@
>  #
>  # @uri: the Uniform Resource Identifier of the destination VM
>  #
> +# @recover: #optional recover from a broken migration
> +#

I don't see any 'recover' parameter added to the 'migrate' command to
match this added documentation.

>  # @blk: #optional do block migration (full disk copy)
>  #
>  # @inc: #optional incremental disk copy migration
> diff --git a/vl.c b/vl.c
> index 5fd22cb..c237140 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -618,6 +618,10 @@ static const RunStateTransition 
> runstate_transitions_def[] = {
>      { RUN_STATE_FINISH_MIGRATE, RUN_STATE_RUNNING },
>      { RUN_STATE_FINISH_MIGRATE, RUN_STATE_POSTMIGRATE },
>      { RUN_STATE_FINISH_MIGRATE, RUN_STATE_PRELAUNCH },
> +    { RUN_STATE_FINISH_MIGRATE, RUN_STATE_POSTMIGRATE_RECOVERY },
> +
> +    { RUN_STATE_POSTMIGRATE_RECOVERY, RUN_STATE_FINISH_MIGRATE },
> +    { RUN_STATE_POSTMIGRATE_RECOVERY, RUN_STATE_SHUTDOWN },
>  
>      { RUN_STATE_RESTORE_VM, RUN_STATE_RUNNING },
>      { RUN_STATE_RESTORE_VM, RUN_STATE_PRELAUNCH },
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]