qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC PATCH v2 06/12] mc: introduce state machine change


From: Michael R. Hines
Subject: Re: [Qemu-devel] [RFC PATCH v2 06/12] mc: introduce state machine changes for MC
Date: Fri, 04 Apr 2014 11:50:12 +0800
User-agent: Mozilla/5.0 (X11; Linux i686; rv:24.0) Gecko/20100101 Thunderbird/24.4.0

On 03/12/2014 05:57 AM, Juan Quintela wrote:
address@hidden wrote:
From: "Michael R. Hines" <address@hidden>

This patch sets up the initial changes to the migration state
machine and prototypes to be used by the checkpointing code
to interact with the state machine so that we can later handle
failure and recovery scenarios.

Signed-off-by: Michael R. Hines <address@hidden>
---
  arch_init.c                   | 29 ++++++++++++++++++++++++-----
  include/migration/migration.h |  2 ++
  migration.c                   | 37 +++++++++++++++++++++----------------
  3 files changed, 47 insertions(+), 21 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index db75120..e9d4d9e 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -658,13 +658,13 @@ static void ram_migration_cancel(void *opaque)
      migration_end();
  }
-static void reset_ram_globals(void)
+static void reset_ram_globals(bool reset_bulk_stage)
  {
      last_seen_block = NULL;
      last_sent_block = NULL;
      last_offset = 0;
      last_version = ram_list.version;
-    ram_bulk_stage = true;
+    ram_bulk_stage = reset_bulk_stage;
  }
#define MAX_WAIT 50 /* ms, half buffered_file limit */
@@ -674,6 +674,15 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
      RAMBlock *block;
      int64_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
+ /*
+     * RAM stays open during micro-checkpointing for the next transaction.
+     */
+    if (migration_is_mc(migrate_get_current())) {
+        qemu_mutex_lock_ramlist();
+        reset_ram_globals(false);
+        goto skip_setup;
+    }
+
      migration_bitmap = bitmap_new(ram_pages);
      bitmap_set(migration_bitmap, 0, ram_pages);
      migration_dirty_pages = ram_pages;
@@ -710,12 +719,14 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
      qemu_mutex_lock_iothread();
      qemu_mutex_lock_ramlist();
      bytes_transferred = 0;
-    reset_ram_globals();
+    reset_ram_globals(true);
memory_global_dirty_log_start();
      migration_bitmap_sync();
      qemu_mutex_unlock_iothread();
+skip_setup:
+
      qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE);
QTAILQ_FOREACH(block, &ram_list.blocks, next) {
@@ -744,7 +755,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
      qemu_mutex_lock_ramlist();
if (ram_list.version != last_version) {
-        reset_ram_globals();
+        reset_ram_globals(true);
      }
ram_control_before_iterate(f, RAM_CONTROL_ROUND);
@@ -825,7 +836,15 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
      }
ram_control_after_iterate(f, RAM_CONTROL_FINISH);
-    migration_end();
+
+    /*
+     * Only cleanup at the end of normal migrations
+     * or if the MC destination failed and we got an error.
+     * Otherwise, we are (or will soon be) in MIG_STATE_CHECKPOINTING.
+     */
+    if(!migrate_use_mc() || migration_has_failed(migrate_get_current())) {
+        migration_end();
+    }
qemu_mutex_unlock_ramlist();
      qemu_put_be64(f, RAM_SAVE_FLAG_EOS);


I haven't looked at the code in detail, but what we have here is
esentially:


ram_save_complete()
{
    code not needed for mc
    common codo for migration and mc
    code not needed for mc
}

Similar code on ram_save_setup.  Yes, I know that there are some locking
issues here.


SHould we be able do do something like

__ram_save_complete()
{
     common code
}

mc_ram_save_complete()
{
     # Possible something else here
     __ram_save_complete()
}

rest_ram_save_complete()
{
     code not needed for mc
     __ram_save_complete()
     code not needed for mc
}

My problem here is that current code is already quite complex and
convoluted.  At some point we are going to need to change it to
something that is easier to understand?

Understood: So it looks like we needs some "accessor" function
pointers or something here, similar to the way Paolo suggested
that we breakout "before" and "after" iteration methods for
localhost migration and RDMA migration.

I'll cook something up and re-submit.


-enum {
-    MIG_STATE_ERROR = -1,
-    MIG_STATE_NONE,
-    MIG_STATE_SETUP,
-    MIG_STATE_CANCELLING,
-    MIG_STATE_CANCELLED,
-    MIG_STATE_ACTIVE,
-    MIG_STATE_COMPLETED,
-};
-
Here comes the code seen on the previous patch O:-)

-static void migrate_set_state(MigrationState *s, int old_state, int new_state)
+bool migration_is_active(MigrationState *s)
+{
+    return (s->state == MIG_STATE_ACTIVE) || s->state == MIG_STATE_SETUP
+            || s->state == MIG_STATE_CHECKPOINTING;
+}
The whole idea of moving MIG_STATE_* to this file was to "force" all
other users to use accessor functions.  This way we know what the others
expect frum us.

Acknowleged - I'll work on creating (or enhancing) the accessor
functions to avoid moving the flags again.

-    assert(s->state != MIG_STATE_ACTIVE);
+    assert(!migration_is_active(s));
I can understand that we want here MIG_STATE_CHECKPOINTING, but _SETUP?
Or it is a bug on upstream?

My fault, I think there is some merge breakage here when I started
walking through the diff. Ignore this one for now..........

- Michael




reply via email to

[Prev in Thread] Current Thread [Next in Thread]