
Re: [Qemu-block] [Qemu-devel] [PATCH v5 00/12] Dirty bitmaps migration


From: John Snow
Subject: Re: [Qemu-block] [Qemu-devel] [PATCH v5 00/12] Dirty bitmaps migration
Date: Tue, 26 Jan 2016 17:57:45 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0


On 01/26/2016 03:45 AM, Vladimir Sementsov-Ogievskiy wrote:
> On 03.06.2015 01:17, John Snow wrote:
>>
>> On 05/28/2015 04:56 PM, Denis V. Lunev wrote:
>>> On 28/05/15 23:09, John Snow wrote:
>>>> On 05/26/2015 10:51 AM, Denis V. Lunev wrote:
>>>>> On 26/05/15 17:48, Denis V. Lunev wrote:
>>>>>> On 21/05/15 19:44, John Snow wrote:
>>>>>>> On 05/21/2015 09:57 AM, Denis V. Lunev wrote:
>>>>>>>> On 21/05/15 16:51, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>>>> Hi all.
>>>>>>>>>
>>>>>>>>> Hmm. There is an interesting suggestion from Denis Lunev (in CC)
>>>>>>>>> about how to drop meta bitmaps and make things easier.
>>>>>>>>>
>>>>>>>>> method:
>>>>>>>>>
>>>>>>>>>> start migration
>>>>>>>>> disk and memory are migrated, but not dirty bitmaps.
>>>>>>>>>> stop vm
>>>>>>>>> create all necessary bitmaps in the destination vm (empty, but
>>>>>>>>> with the same names, granularities, and enabled flags)
>>>>>>>>>> start destination vm
>>>>>>>>> the empty bitmaps are now tracking writes
>>>>>>>>>> start migrating dirty bitmaps. merge them to corresponding
>>>>>>>>>> bitmaps
>>>>>>>>> in destination
>>>>>>>>> while bitmaps are migrating, they should be in some kind of
>>>>>>>>> 'inconsistent' state.
>>>>>>>>> so, we can't start backup or other migration while bitmaps are
>>>>>>>>> migrating, but vm is already _running_ on destination.
>>>>>>>>>
>>>>>>>>> what do you think about it?
>>>>>>>>>
>>>>>>>> the description is a bit incorrect
>>>>>>>>
>>>>>>>> - start migration process, perform memory and disk migration
>>>>>>>>       as usual. VM is still executed at source
>>>>>>>> - start VM on target. VM on source should be paused as usual;
>>>>>>>>       do not finish the migration process. The running VM on the
>>>>>>>>       target "writes" normally, setting dirty bits as usual
>>>>>>>> - copy active dirty bitmaps from source to target. This is safe
>>>>>>>>       as VM on source is not running
>>>>>>>> - "OR" copied bitmaps with ones running on target
>>>>>>>> - finish migration process (stop source VM).
>>>>>>>>
>>>>>>>> Downtime will not be increased due to dirty bitmaps with this
>>>>>>>> approach, and the migration process is very simple: a plain data copy.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>        Den
>>>>>>>>
>>>>>>> I was actually just discussing the live migration approach a little
>>>>>>> bit
>>>>>>> ago with Stefan, trying to decide on the "right" packet format (The
>>>>>>> only
>>>>>>> two patches I haven't ACKed yet are ones in which we need to
>>>>>>> choose a
>>>>>>> send size) and we decided that 1KiB chunk sends would be
>>>>>>> appropriate for
>>>>>>> live migration.
>>>>>>>
>>>>>>> I think I'm okay with that method, but obviously this approach
>>>>>>> outlined
>>>>>>> here would also work very well and would avoid meta bitmaps, chunk
>>>>>>> sizes, migration tuning, convergence questions, etc etc etc.
>>>>>>>
>>>>>>> You'd need to add a new status to the bitmap on the target (maybe
>>>>>>> "INCOMPLETE" or "MIGRATING") that prevents it from being used for a
>>>>>>> backup operation without preventing it from recording new writes.
>>>>>>>
>>>>>>> My only concern is how easy it will be to work this into the
>>>>>>> migration
>>>>>>> workflow.
>>>>>>>
>>>>>>> It would require some sort of "post-migration" ternary phase, I
>>>>>>> suppose,
>>>>>>> for devices/data that can be transferred after the VM starts --
>>>>>>> and I
>>>>>>> suspect we'll be the only use of that phase for now.
>>>>>>>
>>>>>>> David, what are your thoughts, here? Would you prefer Vladimir and I
>>>>>>> push forward on the live migration approach, or add a new post-hoc
>>>>>>> phase? This approach might be simpler on the block layer, but I
>>>>>>> would be
>>>>>>> rather upset if he scrapped his entire series for the second time
>>>>>>> for
>>>>>>> another approach that also didn't get accepted.
>>>>>>>
>>>>>>> --js
>>>>>> hmmm.... It looks like we should proceed with this to fit 2.4 dates.
>>>>>> There is not much interest at the moment. I think that we could
>>>>>> implement this later in 2.5 etc...
>>>>>>
>>>>>> Regards,
>>>>>>       Den
>>>>> oops. I have written something strange. Anyway, I think that for
>>>>> now we should proceed with this patchset to fit QEMU 2.4 dates.
>>>>> The implementation with an additional stage (my proposal) could be
>>>>> added later, e.g. in 2.5, as I do not see much interest from migration
>>>>> gurus.
>>>>>
>>>>> In this case the review will take a ... lot of time.
>>>>>
>>>>> Regards,
>>>>>       Den
>>>>>
>>>> That sounds good to me. I think this solution is workable for 2.4, and
>>>> we can begin working on a post-migration phase for the future to help
>>>> simplify our cases a lot.
>>>>
>>>> I have been out sick much of this week, so apologies for my lack of
>>>> fervor in getting this series upstream recently.
>>>>
>>>> --js
>>> no prob :)
>> Had a chat with Stefan about this approach and apparently that's what
>> the postcopy migration patches on-list are all about.
>>
>> Stefan brought up the point of post-hoc reliability: It's possible to
>> transfer control to the new VM and then lose your link, making migration
>> completion impossible. Adding a post-copy phase to our existing live
>> migration is a non-starter, because it would unfairly introduce this
>> unreliability into the existing system.
>>
>> However, we can make this idea work for migrations started via the
>> post-copy mechanism, because the entire migration already carries that
>> known risk of completion failure.
>>
>> The likely outcome, though, is that migrations will be completable with
>> either mechanism in the future: up-front migration or post-copy
>> migration. In that light, it seems we won't be
>> able to fully rid ourselves of the meta_bitmap idea, making the
>> post-copy idea here not too useful in culling our complexity, since
>> we'll have to support the current standard live migration anyway.
>>
>> So I have reviewed the current set of patches under the assumption that
>> it seems like the right way to go for 2.4 and beyond.
>>
>> Thank you!
>> --js
> 
> As far as I know, post-copy migration has been merged by now. Has
> anything changed regarding its reliability? Do we still need the
> meta-bitmap approach for bitmap migration?
> 

[Dropping a few people from the CC list, adding qemu-block]

There will always be the issue of post-hoc reliability, and I believe
for now all migrations still default to the non-postcopy version.

Still, losing a bitmap is not as catastrophic as losing ram, so maybe
it'd be OK to introduce bitmap migration as postcopy-only, allowing you
to ditch the meta bitmaps.

I think whether or not you need the meta bitmap hinges on convincing
migration maintainers that there is no need to ever do a live migration
of bitmap data, but that postcopying it is always preferred, enough so
that we never bother to merge the live migration version.

I imagine the post-copy bitmap migration looks something like this:

1) Migrate the metadata for the bitmap (name, size, granularity, etc.)

2) Create this bitmap on the target, and immediately create an anonymous
child to move it into the frozen state. The anonymous child is marked
read-only for now to prevent block migration writes from corrupting the
bitmap. Neither bitmap can be written to or used for operations.

3) At the time of migration pivot, the source bitmap is marked as
read-only, and the destination bitmap's anonymous child is marked as
read-write. Any disk IO that happens in this period is recorded in the
anonymous child.

4) The source bitmap is migrated using a simple for loop, roughly 1KiB at
a time (to fit in Ethernet frames; perhaps this can be made configurable
if we really, really want to).

5) The destination stores the migrated bitmap in the frozen parent object.

6) When migration is fully complete, the parent and child bitmap objects
can be merged into one and moved back into a normal operative state.


As for the reliability issue, we have a lot of mitigation options...

If the source machine loses its connection, we can easily just restart the
data transfer, since the bitmap on the target is still recording new
information.

Perhaps the source VM could attempt to write out the bitmap data to disk
automatically once it loses connection, and delete that data if it
manages to re-connect and transmit successfully. If that's not desired,
we can always add HMP/QMP commands to do a state dump.

If we do lose the network entirely, we'll need a matching QMP/HMP
command to load state from disk transferred via other means; the target
bitmaps can then resume normal operation once they get that missing state.

Obviously, nothing can be done if the source just flat-out crashes. We
could attempt to dump all bitmaps to disk in the event of a failure, but
if things are so unstable that we crashed *after* the pivot, recovery
doesn't sound likely.

The target will simply have to be re-synced with a new full backup.

... actually, wait...

Can we implement a function that, in the event of a disaster, compares
the current state of the drive with the last known good incremental and
populates a bitmap based on the difference?

Actually, we probably really want this feature around regardless. It'll
allow us to start incremental backup chains from full backups made
before we even knew we wanted to start making them.

Something like:

block-dirty-bitmap-diff node=drive0 name=bitmap0 target=/path/to/file

I like this idea; I think I'll prototype just this little piece, since
it's useful even without postcopy bitmap migration.

Thanks!
--js


