qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 02/17] replay: Fix migration replay_mutex locking


From: Fabiano Rosas
Subject: Re: [PATCH 02/17] replay: Fix migration replay_mutex locking
Date: Fri, 20 Dec 2024 10:08:10 -0300

Nicholas Piggin <npiggin@gmail.com> writes:

Hi Nick,

I'm ignorant about replay, but we try to know why were taking the BQL in
the migration code, we move it around sometimes, etc. Can we be a bit
more strict with documentation here so we don't get stuck with a lock
that can't be changed?

> Migration causes a number of events that need to go in the replay
> trace, such as vm state transitions. The replay_mutex lock needs to
> be held for these.
>

Is it practical to explicitly list which events are those?

Are there any tests that exercise this that we could use to validate
changes around this area?

> The simplest approach seems to be just take it up-front when taking
> the bql.

But also the thing asserts if taken inside the BQL, so is the actual
matter here that we _cannot_ take the lock around the proper places?

I also see the replay lock around the main loop, so is it basically bql2
from the perspective of most of QEMU?

>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  migration/migration.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 2eb9e50a263..277fca954c1 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -24,6 +24,7 @@
>  #include "socket.h"
>  #include "sysemu/runstate.h"
>  #include "sysemu/sysemu.h"
> +#include "sysemu/replay.h"
>  #include "sysemu/cpu-throttle.h"
>  #include "rdma.h"
>  #include "ram.h"
> @@ -2518,6 +2519,7 @@ static int postcopy_start(MigrationState *ms, Error 
> **errp)
>      }
>  
>      trace_postcopy_start();
> +    replay_mutex_lock();
>      bql_lock();
>      trace_postcopy_start_set_run();
>  
> @@ -2629,6 +2631,7 @@ static int postcopy_start(MigrationState *ms, Error 
> **errp)
>      migration_downtime_end(ms);
>  
>      bql_unlock();
> +    replay_mutex_unlock();
>  
>      if (migrate_postcopy_ram()) {
>          /*
> @@ -2670,6 +2673,7 @@ fail:
>      }
>      migration_call_notifiers(ms, MIG_EVENT_PRECOPY_FAILED, NULL);
>      bql_unlock();
> +    replay_mutex_unlock();
>      return -1;
>  }
>  
> @@ -2721,6 +2725,7 @@ static int migration_completion_precopy(MigrationState 
> *s,
>  {
>      int ret;
>  
> +    replay_mutex_lock();
>      bql_lock();
>  
>      if (!migrate_mode_is_cpr(s)) {
> @@ -2746,6 +2751,7 @@ static int migration_completion_precopy(MigrationState 
> *s,
>                                               s->block_inactive);
>  out_unlock:
>      bql_unlock();
> +    replay_mutex_unlock();
>      return ret;
>  }
>  
> @@ -3633,6 +3639,7 @@ static void *bg_migration_thread(void *opaque)
>  
>      trace_migration_thread_setup_complete();
>  
> +    replay_mutex_lock();
>      bql_lock();
>  
>      if (migration_stop_vm(s, RUN_STATE_PAUSED)) {
> @@ -3666,6 +3673,7 @@ static void *bg_migration_thread(void *opaque)
>       */
>      migration_bh_schedule(bg_migration_vm_start_bh, s);
>      bql_unlock();
> +    replay_mutex_unlock();
>  
>      while (migration_is_active()) {
>          MigIterateState iter_state = bg_migration_iteration_run(s);
> @@ -3695,6 +3703,7 @@ fail:
>          migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
>                  MIGRATION_STATUS_FAILED);
>          bql_unlock();
> +        replay_mutex_unlock();
>      }
>  
>  fail_setup:



reply via email to

[Prev in Thread] Current Thread [Next in Thread]