qemu-devel

Re: [Qemu-devel] [PATCH v13 00/17] block: incremental backup series


From: John Snow
Subject: Re: [Qemu-devel] [PATCH v13 00/17] block: incremental backup series
Date: Fri, 20 Feb 2015 12:20:53 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.4.0



On 02/20/2015 06:09 AM, Stefan Hajnoczi wrote:
> On Fri, Feb 13, 2015 at 05:08:41PM -0500, John Snow wrote:
>> This series requires: [PATCH v3] blkdebug: fix "once" rule
>>
>> Welcome to the "incremental backup" newsletter, where we discuss
>> exciting developments in non-redundant backup technology.
>> This issue is called the "Max Reitz Fixup" issue.
>>
>> This patchset enables the in-memory part of the incremental backup
>> feature. There are two series on the mailing list now by Vladimir
>> Sementsov-Ogievskiy that enable the migration and persistence of
>> dirty bitmaps.
>>
>> This series and most patches were originally by Fam Zheng.
>
> Please add docs/incremental-backup.txt explaining how the commands are
> intended to be used to perform backups.  The QAPI documentation is
> sparse and QMP clients would probably need to read the code to
> understand how these commands work.


OK. I wrote a Markdown-formatted file that explains it all fairly well; should I check it in as-is, or convert it to some other format?

https://github.com/jnsnow/qemu/blob/bitmap-demo/docs/bitmaps.md

> I'm not sure I understand the need for all the commands: add, remove,
> enable, disable, clear.  Why are these commands necessary?

add/remove are self explanatory.

clear allows you to re-sync a bitmap to a new full backup after the bitmap has already been tracking writes for some time; see the readme for the use case. Yes, you *COULD* delete and re-create the bitmap with add/remove, but why? Clear is nice; I stand by it.

enable/disable: Not necessarily useful for the simple case right now, but they can be used to track writes only during a chosen period of time. Combined with transactions, you could use them to record activity across several distinct periods. They were originally used for what the "frozen" state covers now, but I left them in.

I could part with them if you think they detract more than they help; they seemed like useful primitives. My default, though, is to leave them in.
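To make the semantics of add, remove, enable, disable and clear concrete, here is a minimal toy model in C. It is a sketch under stated assumptions, not QEMU's implementation: the `ToyBitmap` type and every `toy_bitmap_*` function are hypothetical names, one bit covers `granularity` bytes of the disk, and a disabled bitmap simply ignores guest writes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy model of a per-drive dirty bitmap; not QEMU's API. */
typedef struct ToyBitmap {
    uint64_t granularity;  /* bytes covered by one bit */
    size_t nbits;
    uint8_t *bits;
    bool disabled;         /* a disabled bitmap ignores guest writes */
} ToyBitmap;

/* "add": a new bitmap starts empty and enabled (calloc zeroes `disabled`). */
ToyBitmap *toy_bitmap_add(uint64_t disk_size, uint64_t granularity)
{
    ToyBitmap *bm = calloc(1, sizeof(*bm));
    if (!bm) {
        return NULL;
    }
    bm->granularity = granularity;
    bm->nbits = (size_t)((disk_size + granularity - 1) / granularity);
    bm->bits = calloc((bm->nbits + 7) / 8, 1);
    if (!bm->bits) {
        free(bm);
        return NULL;
    }
    return bm;
}

/* "remove": stop tracking and free the bitmap. */
void toy_bitmap_remove(ToyBitmap *bm)
{
    if (bm) {
        free(bm->bits);
        free(bm);
    }
}

void toy_bitmap_enable(ToyBitmap *bm)  { bm->disabled = false; }
void toy_bitmap_disable(ToyBitmap *bm) { bm->disabled = true; }

/* "clear": forget all dirty bits, e.g. right after a new full backup. */
void toy_bitmap_clear(ToyBitmap *bm)
{
    memset(bm->bits, 0, (bm->nbits + 7) / 8);
}

/* Called for every guest write; only an enabled bitmap records it. */
void toy_bitmap_on_write(ToyBitmap *bm, uint64_t offset, uint64_t len)
{
    if (bm->disabled || len == 0) {
        return;
    }
    size_t first = (size_t)(offset / bm->granularity);
    size_t last = (size_t)((offset + len - 1) / bm->granularity);
    assert(last < bm->nbits);
    for (size_t i = first; i <= last; i++) {
        bm->bits[i / 8] |= (uint8_t)(1u << (i % 8));
    }
}

/* Is chunk i dirty? */
bool toy_bitmap_get(const ToyBitmap *bm, size_t i)
{
    assert(i < bm->nbits);
    return (bm->bits[i / 8] >> (i % 8)) & 1u;
}
```

In this model, clear restarts tracking after a fresh full backup without tearing down and re-adding the bitmap, and disable/enable bracket the period during which writes are recorded.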


> I think just add and remove should be enough:
>
> Initial full backup
> -------------------
> Use transaction to add bitmaps to all drives atomically (i.e.
> point-in-time snapshot across all drives) and launch drive-backup (full)
> on all drives.
>
> In your patch the bitmap starts disabled.  In my example bitmaps are
> always enabled unless they have a successor.

No, it doesn't; a new bitmap starts enabled:

    bitmap->disabled = false;


> Incremental backup
> ------------------
> Use transaction to launch drive-backup (bitmap mode) on all drives.
>
> If backup completes successfully we'll be able to run the same command
> again for the next incremental backup.
>
> If backup fails I see a problem when taking consistent incremental
> backups across multiple drives:
>
> Imagine 2 drives (A and B).  Transaction is used to launch drive-backup
> on both with their respective dirty bitmaps.  drive-backup A fails and
> merges successor dirty bits so nothing is lost, but what do we do with
> drive-backup B?
>
> drive-backup B could be in-progress or it could have completed before
> drive-backup A failed.  Now we're in trouble because we need to take
> consistent snapshots across all drives but we've thrown away the dirty
> bitmap for drive B!
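The successor behaviour referred to here can be sketched as follows. This is a simplified model with hypothetical names (`Bitmap`, `bitmap_mark`, `backup_start`, `backup_success`, `backup_failure`), assuming a 64-chunk disk so that one `uint64_t` holds the whole bitmap: while a backup runs, the bitmap is frozen and new writes go to a successor; on success the successor's bits replace the copied ones, and on failure they are merged back so nothing is lost.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model: a 64-chunk disk, so one uint64_t is the whole bitmap. */
typedef struct Bitmap {
    uint64_t bits;
    struct Bitmap *successor;  /* non-NULL while a backup job has it frozen */
} Bitmap;

/* Record a guest write; while frozen, writes go to the successor instead. */
void bitmap_mark(Bitmap *bm, unsigned chunk)
{
    assert(chunk < 64);
    Bitmap *target = bm->successor ? bm->successor : bm;
    target->bits |= UINT64_C(1) << chunk;
}

/* Backup start: freeze the bitmap and return the chunks the job must copy. */
uint64_t backup_start(Bitmap *bm)
{
    assert(bm->successor == NULL);
    bm->successor = calloc(1, sizeof(*bm->successor));
    assert(bm->successor != NULL);  /* sketch: no real error handling */
    return bm->bits;
}

/* Success: the copied bits are done with; the successor's bits carry on. */
void backup_success(Bitmap *bm)
{
    Bitmap *succ = bm->successor;
    assert(succ != NULL);
    bm->bits = succ->bits;
    bm->successor = NULL;
    free(succ);
}

/* Failure: merge the successor back (bitwise OR) so no dirty chunk is lost. */
void backup_failure(Bitmap *bm)
{
    Bitmap *succ = bm->successor;
    assert(succ != NULL);
    bm->bits |= succ->bits;
    bm->successor = NULL;
    free(succ);
}
```

Note that a failed job leaves the bitmap holding both the chunks it was supposed to copy and the chunks written while it ran, which is why the next attempt still produces a complete incremental.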

Just re-run the transaction. If one job failed and the other succeeded, run the whole transaction again: the drive that succeeded previously will now get a trivial incremental backup, and the drive that failed will get a new, valid incremental backup. The two new incrementals will be point-in-time consistent with each other.

E.g.:

[full_a] <-- [incremental_a_0: FAILED]
[full_b] <-- [incremental_b_0: SUCCESS]

then re-run:

[full_a] <------------------------ [incremental_a_1]
[full_b] <-- [incremental_b_0] <-- [incremental_b_1]

If the extra image in the chain of the drive that succeeded is a problem, other tools can consolidate the images later.
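The re-run argument can be checked with a small simulation. It is a sketch, not QEMU or libvirt code: `Drive`, `Image`, `full_backup`, `incremental_backup` and `restore` are hypothetical, an incremental backup is modelled as an instantaneous copy of the dirty chunks (standing in for drive-backup's point-in-time semantics), and a failed job keeps its dirty bits, as if its successor had been merged back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NCHUNK 8
#define MAXIMG 8

/* One backup image: which chunks it contains, and their contents. */
typedef struct Image {
    uint32_t mask;
    int data[NCHUNK];  /* only chunks set in `mask` are meaningful */
} Image;

typedef struct Drive {
    int disk[NCHUNK];
    uint32_t dirty;        /* toy dirty bitmap, one bit per chunk */
    Image chain[MAXIMG];   /* chain[0] is the full backup */
    int nimg;
} Drive;

void drive_write(Drive *d, int chunk, int value)
{
    assert(chunk >= 0 && chunk < NCHUNK);
    d->disk[chunk] = value;
    d->dirty |= UINT32_C(1) << chunk;
}

/* Full backup: copy every chunk and start tracking from an empty bitmap. */
void full_backup(Drive *d)
{
    assert(d->nimg == 0);
    Image *img = &d->chain[d->nimg++];
    img->mask = (UINT32_C(1) << NCHUNK) - 1;
    memcpy(img->data, d->disk, sizeof(d->disk));
    d->dirty = 0;
}

/* Incremental backup of the chunks dirty at this instant. A failed job
 * appends nothing and keeps its dirty bits, so nothing is lost. */
void incremental_backup(Drive *d, bool succeeds)
{
    if (!succeeds) {
        return;
    }
    assert(d->nimg < MAXIMG);
    Image *img = &d->chain[d->nimg++];
    img->mask = d->dirty;
    memcpy(img->data, d->disk, sizeof(d->disk));
    d->dirty = 0;
}

/* Restore by applying the chain oldest-first. */
void restore(const Drive *d, int out[NCHUNK])
{
    for (int i = 0; i < d->nimg; i++) {
        for (int c = 0; c < NCHUNK; c++) {
            if (d->chain[i].mask & (UINT32_C(1) << c)) {
                out[c] = d->chain[i].data[c];
            }
        }
    }
}
```

Running the scenario from the diagram (A fails and B succeeds in the first transaction, then both succeed on the re-run) leaves A with two images and B with three, and restoring either chain reproduces that drive's contents as of the second transaction.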

Either way, how do you propose getting a consistent snapshot across multiple drives after a failure? The only recovery option is to create a new snapshot *later*; you can never go back to what was, just as you can't for a single drive.

libvirt can tell which parts of the transaction ultimately succeeded from the BLOCK_JOB_COMPLETED events it receives, one per drive.

For drives that complete successfully, it can record that it has a new, valid incremental. For drives that do not, it can record that it needs to attempt a new incremental that is consistent across all drives, deleting the half-written image left behind by the failed attempt.
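The bookkeeping described above might look like this on the management side. It is a hypothetical sketch rather than libvirt code: it assumes each BLOCK_JOB_COMPLETED event has already been parsed into a device name and a flag saying whether the event reported an error, and the `Tracker` type, its functions and the device names used with it are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_DRIVES 8

/* Hypothetical, already-parsed BLOCK_JOB_COMPLETED event. */
typedef struct JobEvent {
    const char *device;
    bool failed;  /* true if the event reported an error */
} JobEvent;

typedef enum { PENDING, GOT_INCREMENTAL, NEEDS_RETRY } DriveState;

/* Per-transaction bookkeeping kept by the management layer. */
typedef struct Tracker {
    const char *devices[MAX_DRIVES];
    DriveState state[MAX_DRIVES];
    int ndrives;
} Tracker;

/* Record one completion event; events for unknown devices are ignored. */
void tracker_on_event(Tracker *t, const JobEvent *ev)
{
    assert(t->ndrives <= MAX_DRIVES);
    for (int i = 0; i < t->ndrives; i++) {
        if (strcmp(t->devices[i], ev->device) == 0) {
            t->state[i] = ev->failed ? NEEDS_RETRY : GOT_INCREMENTAL;
            return;
        }
    }
}

/* True once every drive in the transaction has reported. */
bool tracker_all_reported(const Tracker *t)
{
    for (int i = 0; i < t->ndrives; i++) {
        if (t->state[i] == PENDING) {
            return false;
        }
    }
    return true;
}

/* If any drive failed, run the whole transaction again on all drives so
 * the next set of incrementals is point-in-time consistent. */
bool tracker_needs_rerun(const Tracker *t)
{
    for (int i = 0; i < t->ndrives; i++) {
        if (t->state[i] == NEEDS_RETRY) {
            return true;
        }
    }
    return false;
}
```

The point of keeping per-drive state is that the decision to re-run is made only after every drive has reported, since a job that succeeded may complete before or after the one that failed.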

I am not convinced there is a problem. Since we don't want filename management (&c.) to be part of this solution inside QEMU, whatever remains to be done belongs in libvirt.


> Stopping incremental backup
> ---------------------------
> Use transaction to remove bitmaps on all drives.  This case is easy.
>
> Finally, what about bdrv_truncate()?  When the disk size changes the
> dirty bitmaps keep the old size.  Seems likely to cause problems.  Have
> you thought about this case?


Only recently, actually, while reviewing Vladimir's bitmap persistence patches.
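To illustrate the bdrv_truncate() concern: a bitmap with one bit per `granularity` bytes needs ceil(size / granularity) bits, so a bitmap sized for the old disk cannot record writes past the old end. The sketch below, with hypothetical names (`SizedBitmap`, `bits_for_size`, `bitmap_mark_write`, `bitmap_resize`), resizes the bitmap along with the disk. It assumes that newly added bits start clean and that bits beyond the end of a shrunk disk are dropped; whether a grown region should instead be treated as dirty is a design question the thread does not settle.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bitmap; invariant: bits at index >= nbits are always zero. */
typedef struct SizedBitmap {
    uint64_t granularity;  /* bytes covered by one bit */
    uint64_t nbits;
    uint8_t *bits;
} SizedBitmap;

/* One bit per granularity-sized chunk, rounding up for a partial last chunk. */
uint64_t bits_for_size(uint64_t disk_size, uint64_t granularity)
{
    return (disk_size + granularity - 1) / granularity;
}

/* Record a write at a byte offset; a bitmap still sized for a smaller disk
 * would trip the assertion for writes past its old end. */
void bitmap_mark_write(SizedBitmap *bm, uint64_t offset)
{
    uint64_t i = offset / bm->granularity;
    assert(i < bm->nbits);
    bm->bits[i / 8] |= (uint8_t)(1u << (i % 8));
}

/* Resize alongside the disk. Returns 0 on success, or -1 if allocation
 * fails (the bitmap is then unchanged). New bits start clean (assumption). */
int bitmap_resize(SizedBitmap *bm, uint64_t new_disk_size)
{
    assert(new_disk_size > 0);
    uint64_t new_nbits = bits_for_size(new_disk_size, bm->granularity);
    size_t old_bytes = (size_t)((bm->nbits + 7) / 8);
    size_t new_bytes = (size_t)((new_nbits + 7) / 8);
    uint8_t *p = realloc(bm->bits, new_bytes);
    if (!p) {
        return -1;
    }
    if (new_bytes > old_bytes) {
        memset(p + old_bytes, 0, new_bytes - old_bytes);
    }
    /* When shrinking, drop bits past the new end so that a later grow
     * cannot resurrect them. */
    if (new_nbits % 8) {
        p[new_nbits / 8] &= (uint8_t)((1u << (new_nbits % 8)) - 1);
    }
    bm->bits = p;
    bm->nbits = new_nbits;
    return 0;
}
```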

> Stefan



