qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH RFC v2 8/8] migration: add migration/dirty-bitma


From: Vladimir Sementsov-Ogievskiy
Subject: Re: [Qemu-devel] [PATCH RFC v2 8/8] migration: add migration/dirty-bitmap.c
Date: Fri, 13 Feb 2015 11:19:36 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.4.0

On 11.02.2015 00:33, John Snow wrote:
Peter Maydell: What's the right way to license a file as copied from a previous version? See below, please;

Max, Markus: ctrl+f "bdrv_get_device_name" and let me know what you think, if you would.

Juan, Amit, David: Copying migration maintainers.

On 01/27/2015 05:56 AM, Vladimir Sementsov-Ogievskiy wrote:
Live migration of dirty bitmaps. Only named dirty bitmaps are migrated.
If destination qemu is already containing a dirty bitmap with the same
name as a migrated bitmap, then their granularities should be the same,
otherwise the error will be generated. If destination qemu doesn't
contain such bitmap it will be created.

format:

1 byte: flags

[ 1 byte: node name size ] \  flags & DEVICE_NAME
[ n bytes: node name     ] /

[ 1 byte: bitmap name size ]       \
[ n bytes: bitmap name     ]       | flags & BITMAP_NAME
[ [ be64: granularity    ] ]  flags & GRANULARITY

[ 1 byte: bitmap enabled bit ] flags & ENABLED

[ be64: start sector      ] \ flags & (NORMAL_CHUNK | ZERO_CHUNK)
[ be32: number of sectors ] /

[ be64: buffer size ] \ flags & NORMAL_CHUNK
[ n bytes: buffer   ] /

The last chunk should contain flags & EOS. The chunk may skip device
and/or bitmap names, assuming them to be the same with the previous
chunk. GRANULARITY is sent with the first chunk for the bitmap. ENABLED
bit is sent in the end of "complete" stage of migration. So when
destination gets ENABLED flag it should deserialize_finish the bitmap
and set its enabled bit to corresponding value.

Signed-off-by: Vladimir Sementsov-Ogievskiy <address@hidden>
---
  include/migration/block.h |   1 +
  migration/Makefile.objs   |   2 +-
migration/dirty-bitmap.c | 606 ++++++++++++++++++++++++++++++++++++++++++++++
  vl.c                      |   1 +
  4 files changed, 609 insertions(+), 1 deletion(-)
  create mode 100644 migration/dirty-bitmap.c

diff --git a/include/migration/block.h b/include/migration/block.h
index ffa8ac0..566bb9f 100644
--- a/include/migration/block.h
+++ b/include/migration/block.h
@@ -14,6 +14,7 @@
  #ifndef BLOCK_MIGRATION_H
  #define BLOCK_MIGRATION_H

+void dirty_bitmap_mig_init(void);
  void blk_mig_init(void);
  int blk_mig_active(void);
  uint64_t blk_mig_bytes_transferred(void);

OK.

diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index d929e96..9adfda9 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -6,5 +6,5 @@ common-obj-y += xbzrle.o
  common-obj-$(CONFIG_RDMA) += rdma.o
  common-obj-$(CONFIG_POSIX) += exec.o unix.o fd.o

-common-obj-y += block.o
+common-obj-y += block.o dirty-bitmap.o


OK.

diff --git a/migration/dirty-bitmap.c b/migration/dirty-bitmap.c
new file mode 100644
index 0000000..8621218
--- /dev/null
+++ b/migration/dirty-bitmap.c
@@ -0,0 +1,606 @@
+/*
+ * QEMU dirty bitmap migration
+ *
+ * derived from migration/block.c
+ *
+ * Author:
+ * Sementsov-Ogievskiy Vladimir <address@hidden>
+ *
+ * original copyright message:
+ * =====================================================================
+ * Copyright IBM, Corp. 2009
+ *
+ * Authors:
+ *  Liran Schour <address@hidden>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ *
+ * Contributions after 2012-01-13 are licensed under the terms of the
+ * GNU GPL, version 2 or (at your option) any later version.
+ * =====================================================================
+ */
+

Not super familiar with the right way to do licensing here; it's possible you may not need to copy the original here, but I'm not sure. You will want to make it clear what license applies to /your/ work, I think. Maybe Peter Maydell can clue us in.

+#include "block/block.h"
+#include "qemu/main-loop.h"
+#include "qemu/error-report.h"
+#include "migration/block.h"
+#include "migration/migration.h"
+#include "qemu/hbitmap.h"
+#include <assert.h>
+
+#define CHUNK_SIZE                       (1 << 20)
+
+#define DIRTY_BITMAP_MIG_FLAG_EOS           0x01
+#define DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK  0x02
+#define DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK    0x04
+#define DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME   0x08
+#define DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME   0x10
+#define DIRTY_BITMAP_MIG_FLAG_GRANULARITY   0x20
+#define DIRTY_BITMAP_MIG_FLAG_ENABLED       0x40
+/* flags should be <= 0xff */
+

We should give ourselves a little breathing room with the flags, since we've only got room for one more.
Ok. Will one more byte be enough?

+/* #define DEBUG_DIRTY_BITMAP_MIGRATION */
+
+#ifdef DEBUG_DIRTY_BITMAP_MIGRATION
+#define DPRINTF(fmt, ...) \
+    do { printf("dirty_migration: " fmt, ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+    do { } while (0)
+#endif
+
+typedef struct DirtyBitmapMigBitmapState {
+    /* Written during setup phase. */
+    BlockDriverState *bs;
+    BdrvDirtyBitmap *bitmap;
+    HBitmap *dirty_bitmap;

For my own sanity, I'd really prefer "bitmap" and "meta_bitmap" here; "dirty_bitmap" is often used as a synonym (outside of this file) to refer to the BdrvDirtyBitmap in general, so it's usage here can be somewhat confusing.

I'd appreciate "dirty_dirty_bitmap" as in your previous patch for consistency, or "meta_bitmap" as I recommend.

Ok
+    int64_t total_sectors;
+    uint64_t sectors_per_chunk;
+    QSIMPLEQ_ENTRY(DirtyBitmapMigBitmapState) entry;
+
+    /* For bulk phase. */
+    bool bulk_completed;
+    int64_t cur_sector;
+    bool granularity_sent;
+
+    /* For dirty phase. */
+    int64_t cur_dirty;
+} DirtyBitmapMigBitmapState;
+
+typedef struct DirtyBitmapMigState {
+    int migration_enable;
+    QSIMPLEQ_HEAD(dbms_list, DirtyBitmapMigBitmapState) dbms_list;
+
+    bool bulk_completed;
+
+    /* for send_bitmap() */
+    BlockDriverState *prev_bs;
+    BdrvDirtyBitmap *prev_bitmap;
+} DirtyBitmapMigState;
+
+static DirtyBitmapMigState dirty_bitmap_mig_state;
+
+/* read name from qemu file:
+ * format:
+ * 1 byte : len = name length (<256)
+ * len bytes : name without last zero byte
+ *
+ * name should point to the buffer >= 256 bytes length
+ */
+static char *qemu_get_name(QEMUFile *f, char *name)
+{
+    int len = qemu_get_byte(f);
+    qemu_get_buffer(f, (uint8_t *)name, len);
+    name[len] = '\0';
+
+    DPRINTF("get name: %d %s\n", len, name);
+
+    return name;
+}
+

OK. Maybe these could be "qemu_put_string" or "qemu_get_string" and added to qemu-file.c so others can use them.
If no objections for sharing this format, I'll do it.

+/* write name to qemu file:
+ * format:
+ * same as for qemu_get_name
+ *
+ * maximum name length is 255
+ */
+static void qemu_put_name(QEMUFile *f, const char *name)
+{
+    int len = strlen(name);
+
+    DPRINTF("put name: %d %s\n", len, name);
+
+    assert(len < 256);
+    qemu_put_byte(f, len);
+    qemu_put_buffer(f, (const uint8_t *)name, len);
+}
+

OK.

+static void send_bitmap(QEMUFile *f, DirtyBitmapMigBitmapState *dbms,
+                        uint64_t start_sector, uint32_t nr_sectors)
+{
+    BlockDriverState *bs = dbms->bs;
+    BdrvDirtyBitmap *bitmap = dbms->bitmap;
+    uint8_t flags = 0;
+    /* align for buffer_is_zero() */
+    uint64_t align = 4 * sizeof(long);
+    uint64_t buf_size =
+        (bdrv_dirty_bitmap_data_size(bitmap, nr_sectors) + align - 1) &
+        ~(align - 1);
+    uint8_t *buf = g_malloc0(buf_size);
+
+ bdrv_dirty_bitmap_serialize_part(bitmap, buf, start_sector, nr_sectors);
+
+    if (buffer_is_zero(buf, buf_size)) {
+        g_free(buf);
+        buf = NULL;
+        flags |= DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK;
+    } else {
+        flags |= DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK;
+    }
+
+    if (bs != dirty_bitmap_mig_state.prev_bs) {
+        dirty_bitmap_mig_state.prev_bs = bs;
+        flags |= DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME;
+    }
+
+    if (bitmap != dirty_bitmap_mig_state.prev_bitmap) {
+        dirty_bitmap_mig_state.prev_bitmap = bitmap;
+        flags |= DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME;
+    }

OK, so we use the current bs/bitmap under consideration to detect if we have switched context, and put the names in the stream when it happens. OK.

+
+    if (dbms->granularity_sent == 0) {
+        dbms->granularity_sent = 1;
+        flags |= DIRTY_BITMAP_MIG_FLAG_GRANULARITY;
+    }
+
+    DPRINTF("Enter send_bitmap"
+            "\n   flags:        %x"
+            "\n   start_sector: %" PRIu64
+            "\n   nr_sectors:   %" PRIu32
+            "\n   data_size:    %" PRIu64 "\n",
+            flags, start_sector, nr_sectors, buf_size);
+
+    qemu_put_byte(f, flags);
+
+    if (flags & DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME) {
+        qemu_put_name(f, bdrv_get_device_name(bs));
+    }

I am still not fully clear on this myself, but I think we are phasing out bdrv_get_device_name. In the context of bitmaps, we are mostly likely using them one-per-tree, but they /can/ be attached one-per-node, so we shouldn't be trying to get the device name here, but rather, the node-name.

I /think/ we may want to be using bdrv_get_node_name, but I think it is not currently true that all nodes WILL be named ... I am CC'ing Markus Armbruster and Max Reitz for (A) A refresher course and (B) Opinions on what function call might make sense here, given that we wish to migrate bitmaps attached to arbitrary nodes.
Hmm.. I'm not familiar with hierarchy of nodes and devices. As I understand, both command_line- and qmp- created drives are created through blockdev_init, which creates both blk(device) and bs(node) through blk_new_with_bs.. Am I right? Also, bdrv_get_device_name is used in migration/block.c.

+
+    if (flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) {
+        qemu_put_name(f, bdrv_dirty_bitmap_name(bitmap));
+
+        if (flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY) {
+ qemu_put_be64(f, bdrv_dirty_bitmap_granularity(bs, bitmap));
+        }
+    } else {
+        assert(!(flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY));
+    }
+

I thought we were only migrating bitmaps with names?
I suppose the conditional can't hurt, but I am not clear on when we won't have a bitmap name here.
You are right, 'else' case is not possible.. Hmm. I've added it to be sure that format is not corrupted, when I decided to put granularity only with name. Wi won't have a bitmap name only when we send the same bitmap as on the previous send_bitmap() call. May be it will be better to use two separate if's without else and assert.

+    qemu_put_be64(f, start_sector);
+    qemu_put_be32(f, nr_sectors);
+
+    /* if a block is zero we need to flush here since the network
+     * bandwidth is now a lot higher than the storage device bandwidth.
+     * thus if we queue zero blocks we slow down the migration.
+     * also, skip writing block when migrate only dirty bitmaps. */
+    if (flags & DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK) {
+        qemu_fflush(f);
+        return;
+    }
+
+    qemu_put_be64(f, buf_size);
+    qemu_put_buffer(f, buf, buf_size);
+    g_free(buf);
+}
+
+
+/* Called with iothread lock taken.  */
+
+static void set_dirty_tracking(void)
+{
+    DirtyBitmapMigBitmapState *dbms;
+
+    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
+        dbms->dirty_bitmap =
+            bdrv_create_dirty_dirty_bitmap(dbms->bitmap, CHUNK_SIZE);
+    }
+}
+

OK: so we only have these dirty-dirty bitmaps when migration is starting, which makes sense.

+static void unset_dirty_tracking(void)
+{
+    DirtyBitmapMigBitmapState *dbms;
+
+    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
+        bdrv_release_dirty_dirty_bitmap(dbms->bitmap);
+    }
+}
+

OK.

+static void init_dirty_bitmap_migration(QEMUFile *f)
+{
+    BlockDriverState *bs;
+    BdrvDirtyBitmap *bitmap;
+    DirtyBitmapMigBitmapState *dbms;
+
+    dirty_bitmap_mig_state.bulk_completed = false;
+    dirty_bitmap_mig_state.prev_bs = NULL;
+    dirty_bitmap_mig_state.prev_bitmap = NULL;
+
+    for (bs = bdrv_next(NULL); bs; bs = bdrv_next(bs)) {
+        for (bitmap = bdrv_next_dirty_bitmap(bs, NULL); bitmap;
+             bitmap = bdrv_next_dirty_bitmap(bs, bitmap)) {
+            if (!bdrv_dirty_bitmap_name(bitmap)) {
+                continue;
+            }
+
+            dbms = g_new0(DirtyBitmapMigBitmapState, 1);
+            dbms->bs = bs;
+            dbms->bitmap = bitmap;
+            dbms->total_sectors = bdrv_nb_sectors(bs);
+            dbms->sectors_per_chunk = CHUNK_SIZE * 8 *
+                bdrv_dirty_bitmap_granularity(dbms->bs, dbms->bitmap)
+                >> BDRV_SECTOR_BITS;
+
+ QSIMPLEQ_INSERT_TAIL(&dirty_bitmap_mig_state.dbms_list,
+                                 dbms, entry);
+        }
+    }
+}
+

OK, but see the note below for dirty_bitmap_mig_init.
actually it is not 'init' but 'reinit' - called on every migration start.. Hmm. dbms_list should be cleared here before fill it again.

+/* Called with no lock taken.  */
+static void bulk_phase_send_chunk(QEMUFile *f, DirtyBitmapMigBitmapState *dbms)
+{
+    uint32_t nr_sectors = MIN(dbms->total_sectors - dbms->cur_sector,
+                             dbms->sectors_per_chunk);

What about cases where nr_sectors will put us past the end of the bitmap? The bitmap serialization implementation might need a touchup with this in mind.
I don't understand.. nr_sectors <= dbms->total_sectors - dbms->cur_sector and it can't put us past the end...

+
+    send_bitmap(f, dbms, dbms->cur_sector, nr_sectors);
+
+    dbms->cur_sector += nr_sectors;
+    if (dbms->cur_sector >= dbms->total_sectors) {
+        dbms->bulk_completed = true;
+    }
+}
+
+/* Called with no lock taken.  */
+static void bulk_phase(QEMUFile *f, bool limit)
+{
+    DirtyBitmapMigBitmapState *dbms;
+
+    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
+        while (!dbms->bulk_completed) {
+            bulk_phase_send_chunk(f, dbms);
+            if (limit && qemu_file_rate_limit(f)) {
+                return;
+            }
+        }
+    }
+
+    dirty_bitmap_mig_state.bulk_completed = true;
+}

OK.

+
+static void blk_mig_reset_dirty_cursor(void)
+{
+    DirtyBitmapMigBitmapState *dbms;
+
+    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
+        dbms->cur_dirty = 0;
+    }
+}
+

OK.

+/* Called with iothread lock taken.  */
+static void dirty_phase_send_chunk(QEMUFile *f, DirtyBitmapMigBitmapState *dbms)
+{
+    uint32_t nr_sectors;
+
+    while (dbms->cur_dirty < dbms->total_sectors &&
+           !hbitmap_get(dbms->dirty_bitmap, dbms->cur_dirty)) {
+        dbms->cur_dirty += dbms->sectors_per_chunk;
+    }

OK, so we fast forward the dirty cursor while the meta-bitmap is empty. Is it not worth using the HBitmapIterator here? You can reset them everywhere you reset the dirty cursor, and then just fast-seek to the first dirty sector.
Yes, I've thought about it, just used simpler way (copied from migration/block.c) for an early version of the patch set. I will do it.

+
+    if (dbms->cur_dirty >= dbms->total_sectors) {
+        return;
+    }
+
+    nr_sectors = MIN(dbms->total_sectors - dbms->cur_dirty,
+                     dbms->sectors_per_chunk);

What happens if nr_sectors goes past the end?

+    send_bitmap(f, dbms, dbms->cur_dirty, nr_sectors);
+ hbitmap_reset(dbms->dirty_bitmap, dbms->cur_dirty, dbms->sectors_per_chunk);
+    dbms->cur_dirty += nr_sectors;
+}
+
+/* Called with iothread lock taken.
+ *
+ * return value:
+ * 0: too much data for max_downtime
+ * 1: few enough data for max_downtime
+*/

dirty_phase below doesn't have a return value.
rudimentary comment.. thanks.

+static void dirty_phase(QEMUFile *f, bool limit)
+{
+    DirtyBitmapMigBitmapState *dbms;
+
+    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
+        while (dbms->cur_dirty < dbms->total_sectors) {
+            dirty_phase_send_chunk(f, dbms);
+            if (limit && qemu_file_rate_limit(f)) {
+                return;
+            }
+        }
+    }
+}
+

OK.

+
+/* Called with iothread lock taken.  */
+static void dirty_bitmap_mig_cleanup(void)
+{
+    DirtyBitmapMigBitmapState *dbms;
+
+    unset_dirty_tracking();
+
+ while ((dbms = QSIMPLEQ_FIRST(&dirty_bitmap_mig_state.dbms_list)) != NULL) {
+ QSIMPLEQ_REMOVE_HEAD(&dirty_bitmap_mig_state.dbms_list, entry);
+        g_free(dbms);
+    }
+}
+

OK.

+static void dirty_bitmap_migration_cancel(void *opaque)
+{
+    dirty_bitmap_mig_cleanup();
+}
+

OK.

+static int dirty_bitmap_save_iterate(QEMUFile *f, void *opaque)
+{
+    DPRINTF("Enter save live iterate\n");
+
+    blk_mig_reset_dirty_cursor();

I suppose this is because it's easier to check if we are finished by starting from sector 0 every time.

A harder, but faster method may be: Use HBitmapIterators, but don't reset them every iteration: just iterate until the end, and check that the bitmap is empty. If the meta bitmap is empty, the dirty phase is complete. If the meta bitmap is NOT empty, reset the HBI and continue allowing iterations over the dirty phase.
Ok, will do.

+
+    if (dirty_bitmap_mig_state.bulk_completed) {
+        qemu_mutex_lock_iothread();
+        dirty_phase(f, true);
+        qemu_mutex_unlock_iothread();
+    } else {
+        bulk_phase(f, true);
+    }
+
+    qemu_put_byte(f, DIRTY_BITMAP_MIG_FLAG_EOS);
+
+    return dirty_bitmap_mig_state.bulk_completed;
+}
+
+/* Called with iothread lock taken.  */
+
+static int dirty_bitmap_save_complete(QEMUFile *f, void *opaque)
+{
+    DirtyBitmapMigBitmapState *dbms;
+    DPRINTF("Enter save live complete\n");
+
+    if (!dirty_bitmap_mig_state.bulk_completed) {
+        bulk_phase(f, false);
+    }

[Not expertly familiar with savevm:] Under what conditions can this happen?
This can happen. save_complete will happen when savevm decide that pending data size to send is small enough. It was the case for my bugfix for migration/block.c about pending. To prevent save_complete when bulk phase isn't completed, save_pending returns (in my bugfix for migration/block.c) big value. Here I decided to make more honest save_pending, so I need to complete (if it doesn't) bulk phase in save_complete.

+
+    blk_mig_reset_dirty_cursor();
+    dirty_phase(f, false);
+
+    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
+        uint8_t flags = DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME |
+                        DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME |
+                        DIRTY_BITMAP_MIG_FLAG_ENABLED;
+
+        qemu_put_byte(f, flags);
+        qemu_put_name(f, bdrv_get_device_name(dbms->bs));
+        qemu_put_name(f, bdrv_dirty_bitmap_name(dbms->bitmap));
+        qemu_put_byte(f, bdrv_dirty_bitmap_enabled(dbms->bitmap));
+    }
+
+    qemu_put_byte(f, DIRTY_BITMAP_MIG_FLAG_EOS);
+
+    DPRINTF("Dirty bitmaps migration completed\n");
+
+    dirty_bitmap_mig_cleanup();
+    return 0;
+}
+

I suppose we don't need a flag that distinctly SAYS this is the end section, since we can tell by omission of DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK or ZERO_CHUNK.
Hmm. I think it simplifies the logic (to use EOS after each section). And the same approach is in migration/block.c.. It's a question about which format is better: "Each section for dirty_bitmap_load ends with EOS" or "Each section for dirty_bitmap_load ends with EOS except the last one. The last one may be recognized by absent NORMAL_CHUNK and ZERO_CHUNK"

+static uint64_t dirty_bitmap_save_pending(QEMUFile *f, void *opaque,
+                                          uint64_t max_size)
+{
+    DirtyBitmapMigBitmapState *dbms;
+    uint64_t pending = 0;
+
+    qemu_mutex_lock_iothread();
+
+    QSIMPLEQ_FOREACH(dbms, &dirty_bitmap_mig_state.dbms_list, entry) {
+        uint64_t sectors = hbitmap_count(dbms->dirty_bitmap);
+        if (!dbms->bulk_completed) {
+            sectors += dbms->total_sectors - dbms->cur_sector;
+        }
+        pending += bdrv_dirty_bitmap_data_size(dbms->bitmap, sectors);
+    }
+
+    qemu_mutex_unlock_iothread();
+
+    DPRINTF("Enter save live pending %" PRIu64 ", max: %" PRIu64 "\n",
+            pending, max_size);
+    return pending;
+}
+

OK.

+static int dirty_bitmap_load(QEMUFile *f, void *opaque, int version_id)
+{
+    int flags;
+
+    static char device_name[256], bitmap_name[256];
+    static BlockDriverState *bs;
+    static BdrvDirtyBitmap *bitmap;
+
+    uint8_t *buf;
+    uint64_t first_sector;
+    uint32_t  nr_sectors;
+    int ret;
+
+    DPRINTF("load start\n");
+
+    do {
+        flags = qemu_get_byte(f);
+        DPRINTF("flags: %x\n", flags);
+
+        if (flags & DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME) {
+            qemu_get_name(f, device_name);
+            bs = bdrv_find(device_name);

Similar to the above confusion, you may want bdrv_lookup_bs or similar, since we're going to be looking for BDS nodes instead of "devices."
In this case, should it be changed in migration/block.c too?

+            if (!bs) {
+                fprintf(stderr, "Error: unknown block device '%s'\n",
+                        device_name);
+                return -EINVAL;
+            }
+        }
+
+        if (flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) {
+            if (!bs) {
+ fprintf(stderr, "Error: block device name is not set\n");
+                return -EINVAL;
+            }
+
+            qemu_get_name(f, bitmap_name);
+            bitmap = bdrv_find_dirty_bitmap(bs, bitmap_name);
+            if (flags & DIRTY_BITMAP_MIG_FLAG_GRANULARITY) {
+                /* First chunk from this bitmap */
+                uint64_t granularity = qemu_get_be64(f);
+                if (!bitmap) {
+                    Error *local_err = NULL;
+                    bitmap = bdrv_create_dirty_bitmap(bs, granularity,
+ bitmap_name,
+ &local_err);
+                    if (!bitmap) {
+ error_report("%s", error_get_pretty(local_err));
+                        error_free(local_err);
+                        return -EINVAL;
+                    }
+                } else {
+                    uint64_t dest_granularity =
+                        bdrv_dirty_bitmap_granularity(bs, bitmap);
+                    if (dest_granularity != granularity) {
+                        fprintf(stderr,
+                                "Error: "
+ "Migrated bitmap granularity (%" PRIu64 ") " + "is not match with destination bitmap '%s' "
+                                "granularity (%" PRIu64 ")\n",
+                                granularity,
+                                bitmap_name,
+                                dest_granularity);
+                        return -EINVAL;
+                    }
+                }
+                bdrv_disable_dirty_bitmap(bitmap);
+            }
+            if (!bitmap) {
+                fprintf(stderr, "Error: unknown dirty bitmap "
+                        "'%s' for block device '%s'\n",
+                        bitmap_name, device_name);
+                return -EINVAL;
+            }
+        }
+
+        if (flags & DIRTY_BITMAP_MIG_FLAG_ENABLED) {
+            bool enabled;
+            if (!bitmap) {
+ fprintf(stderr, "Error: dirty bitmap name is not set\n");
+                return -EINVAL;
+            }
+            bdrv_dirty_bitmap_deserialize_finish(bitmap);
+            /* complete migration */
+            enabled = qemu_get_byte(f);
+            if (enabled) {
+                bdrv_enable_dirty_bitmap(bitmap);
+            }
+        }

Oh, so you use the ENABLED flag to show that migration is over.
Yes, it was bad idea..
If we are going to commit to a stream format for bitmaps, though, maybe it's best to actually create a "COMPLETION BLOCK" flag and then split this function into two pieces:

(1) The part that receives regular / zero blocks, and
(2) The part that receives completion data.

That way, if we change the properties that bitmaps have down the line, we aren't reliant on literally the "enabled" flag to decide what to do.

Also, it might help make this fairly long function a little smaller and more readable.
Ok.

+
+        if (flags & (DIRTY_BITMAP_MIG_FLAG_NORMAL_CHUNK |
+                     DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK)) {
+            if (!bs) {
+ fprintf(stderr, "Error: block device name is not set\n");
+                return -EINVAL;
+            }
+            if (!bitmap) {
+ fprintf(stderr, "Error: dirty bitmap name is not set\n");
+                return -EINVAL;
+            }
+
+            first_sector = qemu_get_be64(f);
+            nr_sectors = qemu_get_be32(f);
+            DPRINTF("chunk: %lu %u\n", first_sector, nr_sectors);
+
+
+            if (flags & DIRTY_BITMAP_MIG_FLAG_ZERO_CHUNK) {
+ bdrv_dirty_bitmap_deserialize_part0(bitmap, first_sector,
+ nr_sectors);
+            } else {
+                uint64_t buf_size = qemu_get_be64(f);
+                uint64_t needed_size =
+                    bdrv_dirty_bitmap_data_size(bitmap, nr_sectors);
+
+                if (needed_size > buf_size) {
+                    fprintf(stderr,
+ "Error: Migrated bitmap granularity is not " + "match with destination bitmap granularity\n");
+                    return -EINVAL;
+                }
+

"Migrated bitmap granularity doesn't match the destination bitmap granularity" perhaps.

+                buf = g_malloc(buf_size);
+                qemu_get_buffer(f, buf, buf_size);
+                bdrv_dirty_bitmap_deserialize_part(bitmap, buf,
+ first_sector,
+                                                   nr_sectors);
+                g_free(buf);
+            }
+        }
+
+        ret = qemu_file_get_error(f);
+        if (ret != 0) {
+            return ret;
+        }
+    } while (!(flags & DIRTY_BITMAP_MIG_FLAG_EOS));
+
+    DPRINTF("load finish\n");
+    return 0;
+}
+
+static void dirty_bitmap_set_params(const MigrationParams *params, void *opaque)
+{
+    dirty_bitmap_mig_state.migration_enable = params->dirty;
+}
+

OK; though I am not immediately aware of what changes need to happen to accommodate Eric's suggestions.
This function will be dropped in v3.

+static bool dirty_bitmap_is_active(void *opaque)
+{
+    return dirty_bitmap_mig_state.migration_enable == 1;
+}
+

OK.

+static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque)
+{
+    init_dirty_bitmap_migration(f);
+
+    qemu_mutex_lock_iothread();
+    /* start track dirtyness of dirty bitmaps */
+    set_dirty_tracking();
+    qemu_mutex_unlock_iothread();
+
+    blk_mig_reset_dirty_cursor();
+    qemu_put_byte(f, DIRTY_BITMAP_MIG_FLAG_EOS);
+
+    return 0;
+}
+

OK; see dirty_bitmap_mig_init below, though.

+static SaveVMHandlers savevm_block_handlers = {
+    .set_params = dirty_bitmap_set_params,
+    .save_live_setup = dirty_bitmap_save_setup,
+    .save_live_iterate = dirty_bitmap_save_iterate,
+    .save_live_complete = dirty_bitmap_save_complete,
+    .save_live_pending = dirty_bitmap_save_pending,
+    .load_state = dirty_bitmap_load,
+    .cancel = dirty_bitmap_migration_cancel,
+    .is_active = dirty_bitmap_is_active,
+};
+
+void dirty_bitmap_mig_init(void)
+{
+    QSIMPLEQ_INIT(&dirty_bitmap_mig_state.dbms_list);

Maybe I haven't looked thoroughly enough yet, but it's weird that part of the dirty_bitmap_mig_state is initialized here, and the rest of it in init_dirty_bitmap_migration. I'd prefer to keep it all together, if possible.
dirty_bitmap_mig_init is called one time when qemu starts. QSIMPLEQ_INIT should be called once. dirty_bitmap_save_setup is called on every migration start, it's like 'reinitialize'.

+
+ register_savevm_live(NULL, "dirty-bitmap", 0, 1, &savevm_block_handlers,
+                         &dirty_bitmap_mig_state);
+}

OK.

diff --git a/vl.c b/vl.c
index a824a7d..dee7220 100644
--- a/vl.c
+++ b/vl.c
@@ -4184,6 +4184,7 @@ int main(int argc, char **argv, char **envp)

      blk_mig_init();
      ram_mig_init();
+    dirty_bitmap_mig_init();

/* If the currently selected machine wishes to override the units-per-bus
       * property of its default HBA interface type, do so now. */


Hm, since dirty bitmaps are a sub-component of the block layer, would it not make sense to put this hook under blk_mig_init, perhaps?
IMHO the reason to put it here is to keep all register_savevm_live-entities in one place.


Overall this looks very clean compared to the intermingled format in V1, and the code is organized pretty well. Just a few minor comments, and I'd like to get the opinion of the migration maintainers, but I am happy. Sorry it took me so long to review, please feel free to let me know if you disagree with any of my opinions :)

Thank you,
--John

Thank you for reviewing my series)

--
Best regards,
Vladimir




reply via email to

[Prev in Thread] Current Thread [Next in Thread]