qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] migration: add incremental drive-mirror and blockdev-mi


From: Daniel Kučera
Subject: Re: [Qemu-devel] migration: add incremental drive-mirror and blockdev-mirror with dirtymap
Date: Tue, 9 May 2017 09:35:23 +0200

2017-05-08 19:29 GMT+02:00 John Snow <address@hidden>:

>
>
> On 05/04/2017 03:45 AM, Daniel Kučera wrote:
> >
> > 2017-05-04 1:44 GMT+02:00 John Snow <address@hidden
> > <mailto:address@hidden>>:
> >
> >
> >
> >     On 05/03/2017 03:56 AM, Daniel Kučera wrote:
> >     > Hi all,
> >     >
> >     > this patch adds possibility to start mirroring since specific
> dirtyblock
> >     > bitmap.
> >     > The use-case is, for live migrations with ZFS volume used as block
> device:
> >     > 1. make dirtyblock bitmap in qemu
> >
> >     A "block dirty bitmap," I assume you mean. Through which interface?
> >     "block dirty bitmap add" via QMP?
> >
> >
> > Yes.
> >
> >
> >     > 2. make ZFS volume snapshot
> >
> >     ZFS Volume Snapshot is going to be a filesystem-level operation,
> isn't
> >     it? That is, creating this snapshot will necessarily create new dirty
> >     sectors, yes?
> >
> >
> > No, we are using "zfs volumes" which are block devices (similar to LVM)
> >
> > # blockdev --report /dev/zstore/storage4
> > RO    RA   SSZ   BSZ   StartSec            Size   Device
> > rw   256   512  4096          0     42949672960   /dev/zstore/storage4
> >
> > -drive
> > file=/dev/zstore/storage4,format=raw,if=none,id=drive-
> scsi0-0-0-0,cache=none,discard=unmap
> >
> >
>
> Ah, I see. Clearly I don't know much about ZFS in practice.
>
> >     > 3. zfs send/receive the snapshot to target machine
> >
> >     Why? Is this an attempt to make the process faster?
> >
> >
> > This preserves and transfers the whole chain of snapshots to destination
> > host, not only current state as it would be with drive-mirror sync: full
> >
> >
> >
> >     > 4. start mirroring to destination with block map from step 1
> >
> >     This makes me a little nervous: what guarantees do you have that the
> >     bitmap and the ZFS snapshot were synchronous?
> >
> >
> > It doesn't have to be synchronous (or atomic). The "block dirty bitmap"
> > just needs to be done prior to ZFS snapshot. Those few writes done in
> > the meantime don't hurt to be done twice.
> >
>
> Hm, I suppose that's right, pending cache issues, perhaps?
>
> (1) Write occurs; cached
> (2) Bitmap is added
> (3) Write occurs, cached
> (4) ZFS snapshot is taken
> (5) Data is flushed to backing storage.
>
> Now, the ZFS snapshot is migrated, but is missing the writes that
> occurred in (1) and (3).
>
> Next, we mirror the data in the bitmap, but it only includes the data
> from (3).
>
> The target now appears to be missing the write from (1) -- maybe,
> depending on how the volume snapshot occurs.
>

Yes, that's why I'm using cache=none. Libvirt doesn't allow you to migrate
VM which uses drive cache anyway (unless you specify flag unsafe).


>
> >
> >
> >     > 5. live migrate VM state to destination
> >     >
> >     > The point is, that I'm not able to live stream zfs changed data to
> >     > destination
> >     > to ensure same volume state in the moment of switchover of
> migrated VM
> >     > to the new hypervisor.
> >
> >     I'm a little concerned about the mixing of filesystem and block level
> >     snapshots...
> >
> >
> > As I explained above, ZFS snapshots are also block level.
> >
> >
> >
> >     >
> >     >
> >     > From 7317d731d51c5d743d7a4081b368f0a6862856d7 Mon Sep 17 00:00:00
> 2001
> >
> >     What happened to your timestamp?
> >
> >     > From: Daniel Kucera <address@hidden <mailto:
> address@hidden>>
> >     > Date: Tue, 2 May 2017 15:00:39 +0000
> >     > Subject: [PATCH] migration: add incremental drive-mirror and
> blockdev-mirror
> >
> >     Your actual email subject here, however, is missing the [PATCH] tag,
> >     which is useful for it getting picked up by the patchew build bot.
> >
> >     >  with dirtymap added parameter bitmap which will be used instead
> >     of newly
> >     >  created dirtymap in mirror_start_job
> >     >
> >     > Signed-off-by: Daniel Kucera <address@hidden
> >     <mailto:address@hidden>>
> >     > ---
> >     >  block/mirror.c            | 41
> >     ++++++++++++++++++++++++-----------------
> >     >  blockdev.c                |  6 +++++-
> >     >  include/block/block_int.h |  4 +++-
> >     >  qapi/block-core.json      | 12 ++++++++++--
> >     >  4 files changed, 42 insertions(+), 21 deletions(-)
> >     >
> >     > diff --git a/block/mirror.c b/block/mirror.c
> >     > index 9f5eb69..02b2f69 100644
> >     > --- a/block/mirror.c
> >     > +++ b/block/mirror.c
> >     > @@ -49,7 +49,7 @@ typedef struct MirrorBlockJob {
> >     >      BlockDriverState *to_replace;
> >     >      /* Used to block operations on the drive-mirror-replace
> target */
> >     >      Error *replace_blocker;
> >     > -    bool is_none_mode;
> >     > +    MirrorSyncMode sync_mode;
> >     >      BlockMirrorBackingMode backing_mode;
> >     >      BlockdevOnError on_source_error, on_target_error;
> >     >      bool synced;
> >     > @@ -523,7 +523,9 @@ static void mirror_exit(BlockJob *job, void
> >     *opaque)
> >     >      bdrv_child_try_set_perm(mirror_top_bs->backing, 0,
> BLK_PERM_ALL,
> >     >                              &error_abort);
> >     >      if (s->backing_mode == MIRROR_SOURCE_BACKING_CHAIN) {
> >     > -        BlockDriverState *backing = s->is_none_mode ? src :
> s->base;
> >     > +        BlockDriverState *backing =
> >     > +        (s->sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) ||
> >     > +        (s->sync_mode == MIRROR_SYNC_MODE_NONE) ? src : s->base;
> >     >          if (backing_bs(target_bs) != backing) {
> >     >              bdrv_set_backing_hd(target_bs, backing, &local_err);
> >     >              if (local_err) {
> >     > @@ -771,7 +773,8 @@ static void coroutine_fn mirror_run(void
> *opaque)
> >     >      mirror_free_init(s);
> >     >
> >     >      s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
> >     > -    if (!s->is_none_mode) {
> >     > +    if ((s->sync_mode != MIRROR_SYNC_MODE_INCREMENTAL) &&
> >     > +      (s->sync_mode != MIRROR_SYNC_MODE_NONE)) {
> >     >          ret = mirror_dirty_init(s);
> >     >          if (ret < 0 || block_job_is_cancelled(&s->common)) {
> >     >              goto immediate_exit;
> >     > @@ -1114,7 +1117,8 @@ static void mirror_start_job(const char
> *job_id,
> >     > BlockDriverState *bs,
> >
> >     Something appears to have corrupted your patch. Did you copy/paste
> this
> >     into gmail? I am unable to apply it.
> >
> >     Please use "git send-email" as detailed in the wiki contributors
> guide.
> >
> >
> > Okay, I'll try to fix these issues and send the patch again, ha
> >
> >
> >
> >     >                               BlockCompletionFunc *cb,
> >     >                               void *opaque,
> >     >                               const BlockJobDriver *driver,
> >     > -                             bool is_none_mode, BlockDriverState
> >     *base,
> >     > +                             MirrorSyncMode sync_mode, const char
> >     *bitmap,
> >     > +                             BlockDriverState *base,
> >     >                               bool auto_complete, const char
> >     > *filter_node_name,
> >     >                               Error **errp)
> >     >  {
> >     > @@ -1203,7 +1207,7 @@ static void mirror_start_job(const char
> *job_id,
> >     > BlockDriverState *bs,
> >     >      s->replaces = g_strdup(replaces);
> >     >      s->on_source_error = on_source_error;
> >     >      s->on_target_error = on_target_error;
> >     > -    s->is_none_mode = is_none_mode;
> >     > +    s->sync_mode = sync_mode;
> >     >      s->backing_mode = backing_mode;
> >     >      s->base = base;
> >     >      s->granularity = granularity;
> >     > @@ -1213,9 +1217,16 @@ static void mirror_start_job(const char
> >     *job_id,
> >     > BlockDriverState *bs,
> >     >          s->should_complete = true;
> >     >      }
> >     >
> >     > -    s->dirty_bitmap = bdrv_create_dirty_bitmap(bs, granularity,
> NULL,
> >     > errp);
> >     > -    if (!s->dirty_bitmap) {
> >     > -        goto fail;
> >     > +    if (s->sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
> >     > +        s->dirty_bitmap = bdrv_find_dirty_bitmap(bs, bitmap);
> >
> >     You're using a potentially undefined string here. You've also now
> split
> >     the error case for if we fail to start the coroutine -- one case has
> >     created a new bitmap, the other has merely picked up a handle to it.
> On
> >     failure in one case, we wish to free the bitmap -- in the other, we
> >     don't.
> >
> >     Whoa, actually, it looks like the code as-is doesn't free the bitmap
> on
> >     failure! that's a problem. I think there should be a call to
> >     release_dirty_bitmap somewhere here. The only one I can see is in
> >     mirror_run.
> >
> >     > +        if (!s->dirty_bitmap) {
> >     > +            goto fail;
> >
> >     You can't jump straight to the failure label without setting an error
> >     message. Prior instances in this function do not because functions
> they
> >     are calling have set errp for them. Please use err_setg.
> >
> >     > +        }
> >     > +    } else {
> >
> >     You're not checking to see if a bitmap was provided but the sync mode
> >     wasn't incremental, which we also need to validate.
> >
> >     > +        s->dirty_bitmap = bdrv_create_dirty_bitmap(bs,
> >     granularity, NULL,
> >     > errp);
> >     > +        if (!s->dirty_bitmap) {
> >     > +            goto fail;
> >     > +        }
> >     >      }
> >     >
> >     >      /* Required permissions are already taken with blk_new() */
> >     > @@ -1265,24 +1276,19 @@ fail:
> >     >  void mirror_start(const char *job_id, BlockDriverState *bs,
> >     >                    BlockDriverState *target, const char *replaces,
> >     >                    int64_t speed, uint32_t granularity, int64_t
> >     buf_size,
> >     > -                  MirrorSyncMode mode, BlockMirrorBackingMode
> >     backing_mode,
> >     > +                  MirrorSyncMode mode, const char *bitmap,
> >     > +                  BlockMirrorBackingMode backing_mode,
> >     >                    BlockdevOnError on_source_error,
> >     >                    BlockdevOnError on_target_error,
> >     >                    bool unmap, const char *filter_node_name, Error
> >     **errp)
> >     >  {
> >     > -    bool is_none_mode;
> >     >      BlockDriverState *base;
> >     >
> >     > -    if (mode == MIRROR_SYNC_MODE_INCREMENTAL) {
> >     > -        error_setg(errp, "Sync mode 'incremental' not supported");
> >     > -        return;
> >     > -    }
> >     > -    is_none_mode = mode == MIRROR_SYNC_MODE_NONE;
> >     >      base = mode == MIRROR_SYNC_MODE_TOP ? backing_bs(bs) : NULL;
> >     >      mirror_start_job(job_id, bs, BLOCK_JOB_DEFAULT, target,
> replaces,
> >     >                       speed, granularity, buf_size, backing_mode,
> >     >                       on_source_error, on_target_error, unmap,
> >     NULL, NULL,
> >     > -                     &mirror_job_driver, is_none_mode, base,
> false,
> >     > +                     &mirror_job_driver, mode, bitmap, base,
> false,
> >     >                       filter_node_name, errp);
> >     >  }
> >     >
> >     > @@ -1305,7 +1311,8 @@ void commit_active_start(const char *job_id,
> >     > BlockDriverState *bs,
> >     >      mirror_start_job(job_id, bs, creation_flags, base, NULL,
> >     speed, 0, 0,
> >     >                       MIRROR_LEAVE_BACKING_CHAIN,
> >     >                       on_error, on_error, true, cb, opaque,
> >     > -                     &commit_active_job_driver, false, base,
> >     auto_complete,
> >     > +                     &commit_active_job_driver,
> >     MIRROR_SYNC_MODE_FULL,
> >     > +                     NULL, base, auto_complete,
> >     >                       filter_node_name, &local_err);
> >     >      if (local_err) {
> >     >          error_propagate(errp, local_err);
> >     > diff --git a/blockdev.c b/blockdev.c
> >     > index 6428206..2569924 100644
> >     > --- a/blockdev.c
> >     > +++ b/blockdev.c
> >     > @@ -3379,6 +3379,7 @@ static void blockdev_mirror_common(const char
> >     > *job_id, BlockDriverState *bs,
> >     >                                     enum MirrorSyncMode sync,
> >     >                                     BlockMirrorBackingMode
> >     backing_mode,
> >     >                                     bool has_speed, int64_t speed,
> >     > +                                   bool has_bitmap, const char
> >     *bitmap,
> >     >                                     bool has_granularity, uint32_t
> >     > granularity,
> >     >                                     bool has_buf_size, int64_t
> >     buf_size,
> >     >                                     bool has_on_source_error,
> >     > @@ -3440,7 +3441,7 @@ static void blockdev_mirror_common(const char
> >     > *job_id, BlockDriverState *bs,
> >     >       */
> >     >      mirror_start(job_id, bs, target,
> >     >                   has_replaces ? replaces : NULL,
> >     > -                 speed, granularity, buf_size, sync, backing_mode,
> >     > +                 speed, granularity, buf_size, sync, bitmap,
> >     backing_mode,
> >
> >     You never validate the string passed in by the user before passing
> it on
> >     to internal routines; you don't check the value of "has_bitmap" at
> any
> >     point. You need to check has_bitmap and find that the user has
> provided
> >     a valid bitmap name before handing off to mirror_start.
> >
> >     Take a look at do_drive_backup for how this is handled there.
> >
> >     >                   on_source_error, on_target_error, unmap,
> >     filter_node_name,
> >     >                   errp);
> >     >  }
> >     > @@ -3575,6 +3576,7 @@ void qmp_drive_mirror(DriveMirror *arg,
> >     Error **errp)
> >     >      blockdev_mirror_common(arg->has_job_id ? arg->job_id : NULL,
> bs,
> >     > target_bs,
> >     >                             arg->has_replaces, arg->replaces,
> >     arg->sync,
> >     >                             backing_mode, arg->has_speed,
> arg->speed,
> >     > +                           arg->has_bitmap, arg->bitmap,
> >     >                             arg->has_granularity, arg->granularity,
> >     >                             arg->has_buf_size, arg->buf_size,
> >     >                             arg->has_on_source_error,
> >     arg->on_source_error,
> >     > @@ -3593,6 +3595,7 @@ void qmp_blockdev_mirror(bool has_job_id,
> >     const char
> >     > *job_id,
> >     >                           bool has_replaces, const char *replaces,
> >     >                           MirrorSyncMode sync,
> >     >                           bool has_speed, int64_t speed,
> >     > +                         bool has_bitmap, const char *bitmap,
> >     >                           bool has_granularity, uint32_t
> granularity,
> >     >                           bool has_buf_size, int64_t buf_size,
> >     >                           bool has_on_source_error,
> >     > @@ -3627,6 +3630,7 @@ void qmp_blockdev_mirror(bool has_job_id,
> >     const char
> >     > *job_id,
> >     >      blockdev_mirror_common(has_job_id ? job_id : NULL, bs,
> target_bs,
> >     >                             has_replaces, replaces, sync,
> >     backing_mode,
> >     >                             has_speed, speed,
> >     > +                           has_bitmap, bitmap,
> >     >                             has_granularity, granularity,
> >     >                             has_buf_size, buf_size,
> >     >                             has_on_source_error, on_source_error,
> >     > diff --git a/include/block/block_int.h b/include/block/block_int.h
> >     > index 4f8cd29..3c59d69 100644
> >     > --- a/include/block/block_int.h
> >     > +++ b/include/block/block_int.h
> >     > @@ -827,6 +827,7 @@ void commit_active_start(const char *job_id,
> >     > BlockDriverState *bs,
> >     >   * @granularity: The chosen granularity for the dirty bitmap.
> >     >   * @buf_size: The amount of data that can be in flight at one
> time.
> >     >   * @mode: Whether to collapse all images in the chain to the
> target.
> >     > + * @bitmap: The dirty bitmap if sync_mode is
> >     MIRROR_SYNC_MODE_INCREMENTAL.
> >
> >     This doesn't match what this field is in practice, which is a
> >     user-string from the QMP monitor, which is never checked to see if
> it is
> >     valid or not (or present or not.)
> >
> >     The documentation here reflects what it should be, but not what it
> is.
> >
> >     >   * @backing_mode: How to establish the target's backing chain
> after
> >     > completion.
> >     >   * @on_source_error: The action to take upon error reading from
> >     the source.
> >     >   * @on_target_error: The action to take upon error writing to the
> >     target.
> >     > @@ -844,7 +845,8 @@ void commit_active_start(const char *job_id,
> >     > BlockDriverState *bs,
> >     >  void mirror_start(const char *job_id, BlockDriverState *bs,
> >     >                    BlockDriverState *target, const char *replaces,
> >     >                    int64_t speed, uint32_t granularity, int64_t
> >     buf_size,
> >     > -                  MirrorSyncMode mode, BlockMirrorBackingMode
> >     backing_mode,
> >     > +                  MirrorSyncMode mode, const char *bitmap,
> >     > +                  BlockMirrorBackingMode backing_mode,
> >     >                    BlockdevOnError on_source_error,
> >     >                    BlockdevOnError on_target_error,
> >     >                    bool unmap, const char *filter_node_name, Error
> >     **errp);
> >     > diff --git a/qapi/block-core.json b/qapi/block-core.json
> >     > index 87fb747..8e3d9b2 100644
> >     > --- a/qapi/block-core.json
> >     > +++ b/qapi/block-core.json
> >     > @@ -1502,6 +1502,10 @@
> >     >  #
> >     >  # @speed:  the maximum speed, in bytes per second
> >     >  #
> >     > +# @bitmap: the name of dirty bitmap if sync is "incremental".
> >     > +#          Must be present if sync is "incremental", must NOT be
> >     present
> >     > +#          otherwise. (Since 2.10)
> >     > +#
> >     >  # @sync: what parts of the disk image should be copied to the
> >     destination
> >     >  #        (all the disk, only the sectors allocated in the topmost
> >     image, or
> >     >  #        only new I/O).
> >     > @@ -1533,7 +1537,7 @@
> >     >    'data': { '*job-id': 'str', 'device': 'str', 'target': 'str',
> >     >              '*format': 'str', '*node-name': 'str', '*replaces':
> >     'str',
> >     >              'sync': 'MirrorSyncMode', '*mode': 'NewImageMode',
> >     > -            '*speed': 'int', '*granularity': 'uint32',
> >     > +            '*speed': 'int', '*bitmap': 'str', '*granularity':
> >     'uint32',
> >     >              '*buf-size': 'int', '*on-source-error':
> >     'BlockdevOnError',
> >     >              '*on-target-error': 'BlockdevOnError',
> >     >              '*unmap': 'bool' } }
> >     > @@ -1652,6 +1656,10 @@
> >     >  #
> >     >  # @speed:  the maximum speed, in bytes per second
> >     >  #
> >     > +# @bitmap: the name of dirty bitmap if sync is "incremental".
> >     > +#          Must be present if sync is "incremental", must NOT be
> >     present
> >     > +#          otherwise. (Since 2.10)
> >     > +#
> >     >  # @sync: what parts of the disk image should be copied to the
> >     destination
> >     >  #        (all the disk, only the sectors allocated in the topmost
> >     image, or
> >     >  #        only new I/O).
> >     > @@ -1694,7 +1702,7 @@
> >     >    'data': { '*job-id': 'str', 'device': 'str', 'target': 'str',
> >     >              '*replaces': 'str',
> >     >              'sync': 'MirrorSyncMode',
> >     > -            '*speed': 'int', '*granularity': 'uint32',
> >     > +            '*speed': 'int', '*bitmap': 'str', '*granularity':
> >     'uint32',
> >     >              '*buf-size': 'int', '*on-source-error':
> >     'BlockdevOnError',
> >     >              '*on-target-error': 'BlockdevOnError',
> >     >              '*filter-node-name': 'str' } }
> >     >
> >
> >     --
> >     —js
> >
> >
> > Thanks for the notes.
> >
> > D.
>
> Thanks for explaining the use-case, I'll take a look at the newer
> version shortly.
>
> --js
>



S pozdravom / Best regards
Daniel Kucera.


reply via email to

[Prev in Thread] Current Thread [Next in Thread]