
Re: [Qemu-devel] Migration design planning


From: John Snow
Subject: Re: [Qemu-devel] Migration design planning
Date: Tue, 1 Mar 2016 13:54:24 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.0


On 03/01/2016 08:47 AM, Juan Quintela wrote:
> John Snow <address@hidden> wrote:
>> Hi Juan;
>> We need your assistance in reviewing two competing designs for migrating
>> some block data so we can move forward with the feature.
>>
>> First, some background:
>>
>> What: Block Dirty Bitmaps. They are simple primitives that keep track of
>> which clusters have been written to since the last incremental backup.
>>
>> Why: They are in-RAM primitives that do not get migrated as-is alongside
>> block data; they need to be migrated specially. We want to migrate them
>> so that the "incremental backup" feature is still available after a migration.
>>
>> How: There are two competing designs, see below.
>>
>>
>> Design Option #1: Live Migration
>>
>> Just like block data and ram, we make an initial pass over the data and
>> then continue to re-transmit data as necessary when block data becomes
>> dirtied again.
>>
>> This is a simple, bog-standard approach that mimics pretty closely how
>> other systems are migrated.
>>
>> The series is here from November:
>> https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg02717.html
>>
>> Most of the block-specific stuff has been reviewed, but it never got any
>> reviews by the migration maintainers. It's reasonably rotted by this
>> point, but it probably would not be a herculean effort to revive it.
> 
> After this week I will take a look at this series.
> 

Gracias :)

>>
>> Design Option #2: "Postcopy" Migration
>>
>> https://lists.nongnu.org/archive/html/qemu-devel/2016-02/msg02793.html
>>
>> The concept here is that incremental backup data can be treated simply
>> as best-effort; if it is lost, it's not a big deal. We can reconstitute
>> the data or simply start a new incremental backup sync point with a full
>> backup.
>>
>> The idea then is that instead of the incremental live migration, we just
>> wait to copy the bitmap until after the pivot and send it all at once.
>> This is faster and a bit more efficient, and will scale pretty nicely to
>> even quite large bitmaps.
> 
> How big is it?
> And what is a normal rate of dirtying of that bitmap?
> 

A dirty bitmap defaults to one bit per 64KiB of source data.
For a 2TiB disk image, that's 4MiB of bitmap data.
Pretty small in most cases.
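
To make the math concrete, a back-of-the-envelope sketch in plain Python
(illustrative only, not QEMU code; 64KiB is just the default granularity):

  def bitmap_size_bytes(disk_bytes, granularity=64 * 1024):
      # One bit per `granularity` bytes of source data, rounded up to bytes.
      bits = (disk_bytes + granularity - 1) // granularity
      return (bits + 7) // 8

  print(bitmap_size_bytes(2 * 1024**4))  # 2TiB disk -> 4194304 bytes = 4MiB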

The rate of being dirtied, percentage-wise, will match the existing rate
you see for block devices. It can be very low if there's not a lot of
disk activity, or fairly high if a lot of activity is going on while
we're trying to pivot.

> 
>> What I'd like from you: a broad acknowledgment of whether or not you
>> feel the Postcopy solution here is tenable, so we know which solution to
>> pursue. If we can get an ACK to one or the other method, we can
>> exhaustively review it from our end before handing it back to you for a
>> comprehensive migration review. We would like to see this feature hit
>> 2.6 if possible as the designs have been on-list for quite some time.
> 
> To make a good evaluation, we need:
> - how big those bitmaps normally are
> - what is a typical/worst dirty rate
> 
> I guess we can go from them.
> 
> And you say that you don't care a lot about losing the bitmap.  "Not
> big deal" here means?
> 

If the bitmap data is lost, it's only metadata. It *can* be
reconstructed, albeit very slowly. It's not mission critical like RAM
is. We want to do our best to preserve this data, but if it's lost ...
it's not a fatal error. It's just very inconvenient. This distinguishes
it from RAM Postcopy.

> Size here is also important; normally we have around 1-2MB of data for
> the last stage.  If the size is much bigger than that amount, we will
> really, really want it to be sent "live".
> 

Right, it's not suitable to send it in one blocking chunk during the
pivot. The "postcopy" approach described here refers to asynchronously
transferring the data post-pivot.

A 2TiB disk may have a 4MiB bitmap, which is more than we can send
during the pivot. Either we need to send it incrementally during the
live phase, or send it asynchronously post-pivot (aka postcopy).
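
As a loose sketch of what "asynchronously post-pivot" means here (plain
Python with hypothetical helper names, not the actual series): the guest
is already running on the destination while the bitmap streams over in
the background.

  import threading

  def stream_bitmap_post_pivot(send_chunk, bitmap, chunk_size=256 * 1024):
      # Push the bitmap out in chunks on a background thread after the
      # pivot; the destination merges each chunk as it arrives.
      def worker():
          for off in range(0, len(bitmap), chunk_size):
              send_chunk(off, bitmap[off:off + chunk_size])
      t = threading.Thread(target=worker, daemon=True)
      t.start()
      return t  # caller can join() to know when the bitmap has "converged"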

> 
> Wondering about the second approach, it is possible for you:
> 
> - do normal migration
> - run on destination with a new empty bitmap
> - transfer the bitmap now
> - xor the two bitmaps
> 

Yes indeed!

> Or is this exactly what you are proposing in the second example?  If it
> is, what is the error recovery if we lose the connection during the
> transfer of the bitmap?  Can you recover?  (I guess this is the "not
> big deal" part.)
> 

Yes, that's exactly the idea. We don't even need to XOR the bitmaps; a
simple bitwise OR works fine. A temporary bitmap can record the
necessary data for us until the "old" bitmap arrives and they can be
merged. We have all the primitives we need to do this.
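
A minimal sketch of that merge, assuming both bitmaps cover the same
disk at the same granularity (plain Python for illustration, not the
actual QEMU primitives):

  def merge_bitmaps(migrated: bytes, local: bytes) -> bytes:
      # OR the bitmap sent from the source into the temporary bitmap that
      # recorded writes on the destination since the pivot.
      assert len(migrated) == len(local), "bitmaps must cover the same disk"
      return bytes(m | l for m, l in zip(migrated, local))

  old = bytes([0b00000001, 0b00010000])  # carried over from the source
  new = bytes([0b00000100, 0b00010000])  # dirtied on the destination
  assert merge_bitmaps(old, new) == bytes([0b00000101, 0b00010000])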

The idea is that the bitmap will be "locked" and prohibited from being
used in any backup operations until it "converges." If the socket dies
before we can get all data, we can mark the bitmap as read-only or
unusable pending user intervention.
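
Roughly the states I have in mind (names invented here for illustration,
nothing final):

  from enum import Enum, auto

  class BitmapMigrationState(Enum):
      LOCKED = auto()     # in flight: backup operations using it are refused
      COMPLETE = auto()   # all data received and merged: usable again
      READ_ONLY = auto()  # socket died mid-transfer: needs user intervention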

We have no error recovery plan per se, though I am working on a function
to "rebuild" a bitmap as a diff between two images, which serves as
disaster recovery. Alternatively, a user can just make a new full
backup, creating a new bitmap in the process to record the write
differential from that point forward.

It'll be a sad day if we lose a 4MiB bitmap for a 2TiB image, but it's
not unrecoverable.

(Heck, even if we get a partial bitmap -- it's not inconceivable to only
perform the differential on the MISSING portion to reconstruct what we
didn't get.)
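
Conceptually, the rebuild is just a cluster-by-cluster comparison of two
images; a rough sketch (plain Python, not the actual patch):

  def rebuild_bitmap(image_a, image_b, cluster=64 * 1024):
      # Mark a cluster dirty wherever the two images differ; a partial
      # rebuild would only walk the ranges the migration failed to deliver.
      bits = []
      while True:
          a = image_a.read(cluster)
          b = image_b.read(cluster)
          if not a and not b:
              break
          bits.append(a != b)
      out = bytearray((len(bits) + 7) // 8)
      for i, dirty in enumerate(bits):
          if dirty:
              out[i // 8] |= 1 << (i % 8)
      return bytes(out)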

> Does this make any sense?
> 

Absolutely!

> Later, Juan.
> 
> PD.  It is clear by now that I don't understand how you do the backup!
> 

I think you understand perfectly well ;)

--js


