From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] [PATCH 3/7] migration: Wait for semaphore before completing migration
Date: Wed, 18 Oct 2017 09:59:27 +0100
User-agent: Mutt/1.9.1 (2017-09-22)
* Peter Xu (address@hidden) wrote:
> On Wed, Oct 11, 2017 at 08:13:13PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <address@hidden>
> >
> > Wait for a semaphore before completing the migration,
> > if the previously added capability was enabled.
> >
> > Signed-off-by: Dr. David Alan Gilbert <address@hidden>
> > ---
> > migration/migration.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
> > migration/migration.h | 3 +++
> > 2 files changed, 50 insertions(+)
> >
> > diff --git a/migration/migration.c b/migration/migration.c
> > index e1a87c3d23..b411a7bb63 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -1967,6 +1967,46 @@ fail:
> > }
> >
> > /**
> > + * migration_maybe_pause: Pause if required to by migrate_pause_before_device
> > + * called with the iothread locked
> > + * Returns: 0 on success
> > + */
> > +static int migration_maybe_pause(MigrationState *s, int *current_active_state)
> > +{
> > + int ret;
> > + if (!migrate_pause_before_device()) {
> > + return 0;
> > + }
> > + ret = bdrv_inactivate_all();
>
> My understanding is that the crash was caused by mirrored block device
> IO triggered after the inactivation, then... should we do this after
> waiting for the semaphore (possibly at [1] below) to make sure the
> block jobs are completed? Or did I miss anything?
Ah, you're right; I just confirmed with kwolf that I had it the
wrong way around.
I'll fix it.
Dave
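
For readers following the thread, here is a minimal sketch of the ordering
Peter suggests: wait for the semaphore first, and only inactivate the block
devices once the wait has returned. It is not the revised patch. It uses
POSIX semaphores in place of QemuSemaphore, and every name in it
(sketch_state, sketch_maybe_pause, stub_inactivate_all, drain_stale_posts)
is a hypothetical stand-in invented for illustration. The iothread lock
handling is omitted, and the drain step is kept as a separate helper so the
sketch can be exercised from a single thread.

```c
/* Sketch only: POSIX stand-ins for QEMU internals, not the real API. */
#include <assert.h>
#include <semaphore.h>
#include <stdbool.h>

/* Hypothetical states mirroring the statuses named in the patch. */
enum sketch_status { ST_ACTIVE, ST_PAUSE_BEFORE_DEVICE, ST_DEVICE };

struct sketch_state {
    enum sketch_status state;
    bool block_inactive;
    bool resumed;                  /* set once the wait has returned */
    bool inactivated_after_resume; /* recorded by the stub below */
    sem_t pause_sem;
};

/* Hypothetical stand-in for bdrv_inactivate_all(); always succeeds. */
static int stub_inactivate_all(struct sketch_state *s)
{
    s->inactivated_after_resume = s->resumed;
    return 0;
}

/* Discard posts left over from repeated "continue" requests. */
static void drain_stale_posts(sem_t *sem)
{
    while (sem_trywait(sem) == 0) {
        /* each successful trywait consumes one stale post */
    }
}

/* Reordered flow: pause and wait first, inactivate afterwards. */
static int sketch_maybe_pause(struct sketch_state *s)
{
    int ret;

    s->state = ST_PAUSE_BEFORE_DEVICE;
    if (sem_wait(&s->pause_sem) != 0) {
        return -1;
    }
    s->resumed = true;

    /* Block jobs are assumed to have finished by the time we get here. */
    ret = stub_inactivate_all(s);
    if (ret) {
        return ret;
    }
    s->block_inactive = true;

    s->state = ST_DEVICE;
    return 0;
}
```

In this sketch a caller would run drain_stale_posts() before pausing, and
whatever handles the continue request would call sem_post() on pause_sem to
release the wait.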
> > + if (ret) {
> > + error_report("%s: bdrv_inactivate_all() failed (%d)",
> > + __func__, ret);
> > + return ret;
> > + }
> > +
> > + s->block_inactive = true;
> > +
> > + /* Since leaving this state is not atomic with posting the semaphore
> > + * it's possible that someone could have issued multiple migrate_continue
> > + * and the semaphore is incorrectly positive at this point;
> > + * the docs say it's undefined to reinit a semaphore that's already
> > + * init'd, so use timedwait to eat up any existing posts.
> > + */
> > + while (qemu_sem_timedwait(&s->pause_sem, 1) == 0);
> > +
> > + qemu_mutex_unlock_iothread();
> > + migrate_set_state(&s->state, *current_active_state,
> > + MIGRATION_STATUS_PAUSE_BEFORE_DEVICE);
> > + qemu_sem_wait(&s->pause_sem);
>
> [1]
>
> > + migrate_set_state(&s->state, MIGRATION_STATUS_PAUSE_BEFORE_DEVICE,
> > + MIGRATION_STATUS_DEVICE);
> > + *current_active_state = MIGRATION_STATUS_DEVICE;
> > + qemu_mutex_lock_iothread();
> > +
> > + return s->state == MIGRATION_STATUS_DEVICE ? 0 : -EINVAL;
> > +}
> > +
> > +/**
> > * migration_completion: Used by migration_thread when there's not much left.
> > * The caller 'breaks' the loop when this returns.
> > *
> > @@ -1992,6 +2032,11 @@ static void migration_completion(MigrationState *s, int current_active_state,
> > bool inactivate = !migrate_colo_enabled();
> > ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
> > if (ret >= 0) {
> > + ret = migration_maybe_pause(s, &current_active_state);
> > + /* If this worked it will already have inactivated */
> > + inactivate &= !migrate_pause_before_device();
> > + }
> > + if (ret >= 0) {
> > qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX);
> > ret = qemu_savevm_state_complete_precopy(s->to_dst_file,
> > false,
> > inactivate);
> > @@ -2372,6 +2417,7 @@ static void migration_instance_finalize(Object *obj)
> >
> > g_free(params->tls_hostname);
> > g_free(params->tls_creds);
> > + qemu_sem_destroy(&ms->pause_sem);
> > }
> >
> > static void migration_instance_init(Object *obj)
> > @@ -2382,6 +2428,7 @@ static void migration_instance_init(Object *obj)
> > ms->state = MIGRATION_STATUS_NONE;
> > ms->xbzrle_cache_size = DEFAULT_MIGRATE_CACHE_SIZE;
> > ms->mbps = -1;
> > + qemu_sem_init(&ms->pause_sem, 0);
> >
> > params->tls_hostname = g_strdup("");
> > params->tls_creds = g_strdup("");
> > diff --git a/migration/migration.h b/migration/migration.h
> > index 37feea5453..447e8b3f79 100644
> > --- a/migration/migration.h
> > +++ b/migration/migration.h
> > @@ -121,6 +121,9 @@ struct MigrationState
> > /* Flag set once the migration thread called bdrv_inactivate_all */
> > bool block_inactive;
> >
> > + /* Migration is paused due to pause-before-device */
> > + QemuSemaphore pause_sem;
> > +
> > /* The semaphore is used to notify COLO thread that failover is finished */
> > QemuSemaphore colo_exit_sem;
> >
> > --
> > 2.13.6
> >
>
> --
> Peter Xu
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK