From: Michael R. Hines
Subject: Re: [Qemu-devel] Microcheckpointing: Memory-VCPU / Disk State consistency
Date: Tue, 12 Aug 2014 04:15:59 +0800
User-agent: Mozilla/5.0 (X11; Linux i686; rv:24.0) Gecko/20100101 Thunderbird/24.6.0

Excellent question: QEMU does have a feature called "drive-mirror"
in block/mirror.c that was introduced a couple of years ago. I'm not sure what the
adoption rate of the feature is, but I would start with that one.
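For reference, a mirror job can be started over QMP without any of the MC patches; roughly like the sketch below (the socket path and drive name are made up for the example, and error/event handling is omitted):

    # Sketch: start a drive-mirror job over QMP from Python.
    # Assumes QEMU was started with something like
    #   -qmp unix:/tmp/qmp.sock,server,nowait
    # and that the guest has a drive named "drive0" (both invented here).
    import json
    import socket

    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect("/tmp/qmp.sock")
    qmp = sock.makefile("rw")

    def command(execute, arguments=None):
        """Send one QMP command and return the parsed reply."""
        msg = {"execute": execute}
        if arguments is not None:
            msg["arguments"] = arguments
        qmp.write(json.dumps(msg) + "\n")
        qmp.flush()
        return json.loads(qmp.readline())

    json.loads(qmp.readline())           # consume the QMP greeting
    command("qmp_capabilities")          # leave capabilities-negotiation mode
    print(command("drive-mirror", {
        "device": "drive0",                       # drive to mirror
        "target": "/mnt/secondary/mirror.qcow2",  # destination image
        "sync": "full",                           # copy the whole disk first
        "mode": "absolute-paths",                 # let QEMU create the target
        "format": "qcow2",
    }))

Once the job reaches the ready state (BLOCK_JOB_READY event), the target stays in sync with the source until the job is cancelled or completed.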

There is also a second fault-tolerance implementation called "COLO" that works
a little differently - you may have seen those emails on the list too - but
if I recall correctly, their method does not require a disk replication solution.

I know the time pressure that comes during a thesis, though =), so
there's no pressure to work on it - but that is the most pressing issue
in the implementation today. (Lack of disk replication in micro-checkpointing.)

The MC implementation also needs to be rebased against the latest
master - I just haven't had a chance to do it yet because some of my
hardware has been taken away from me over the last few months. I'll
see if I can find some reasonable hardware soon.

- Michael

On 08/12/2014 01:22 AM, Walid Nouri wrote:
Hi,
I will do my best to make a contribution :-)

Are there alternative ways of replicating local storage, other than DRBD, that might be feasible?
Perhaps some that are built directly into QEMU?

Walid

On 09.08.2014 14:25, Michael R. Hines wrote:
On Sat, 2014-08-09 at 14:08 +0200, Walid Nouri wrote:
Hi Michael,
how is the weather in Beijing? :-)
It's terrible. Lots of pollution =(

May I ask you some questions about your MC implementation?

Currently I'm trying to understand the general workings of the MC
protocol and the problems that can occur, so that I can discuss them
in my thesis.

As far as I understand, MC relies on a shared disk. Disk output of the
primary VM is written directly, while network output is buffered until the
corresponding checkpoint is acknowledged.

One problem that comes to mind is: what happens when the primary VM writes to the disk and then crashes before sending the corresponding checkpoint?

The MC implementation itself is incomplete today. (I need help.)

The Xen Remus implementation uses the DRBD system to "mirror" all disk
writes to the source and destination before completing each checkpoint.

The KVM (mc) implementation needs exactly the same support, but it is
missing today.

Until that happens, we are *required* to use root-over-iSCSI or
root-over-NFS (meaning that the guest filesystem is mounted directly
inside the virtual machine without the host knowing about it).

This has the effect of translating all disk I/O into network I/O,
and since network I/O is already buffered, we are safe.
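
To make the ordering explicit, one epoch roughly looks like this (simplified
pseudocode, not the actual migration code - the helper names are invented
for illustration):

    # Simplified sketch of one micro-checkpointing epoch.
    # All names here are placeholders, not functions from the patches.
    def run_epoch(vm, secondary, output_buffer):
        vm.pause()
        dirty = vm.copy_dirty_memory()    # stage the pages dirtied in epoch N
        vm.resume()                       # guest keeps running in epoch N+1
        secondary.send_checkpoint(dirty)  # ship checkpoint N
        secondary.wait_for_ack()          # secondary now holds checkpoint N
        output_buffer.release()           # only now does epoch N's network
                                          # output (and, with root-over-iSCSI
                                          # or NFS, its disk writes) leave
                                          # the primary host

If the primary dies before the ack, everything the guest produced in epoch N,
disk writes included (because they travel the same buffered network path), is
simply dropped, and the secondary resumes from checkpoint N-1 with a disk
that matches that memory state.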


Here is an example: the primary state is in the current epoch (n), the
secondary state is in epoch (n-1). The primary writes to disk and
crashes before or while sending checkpoint n. In this case the
secondary's memory state is still at epoch (n-1), while the state of the
shared disk corresponds to the primary's state in epoch (n).

How does MC guarantee that the disk state of the backup VM is consistent
with its memory state?
As I mentioned above, we need the equivalent of the Xen solution, but I
just haven't had the time to write it (or incorporate someone else's
implementation). Patch is welcome =)

Is memory-VCPU / disk state consistency necessary under all circumstances?
Or can this be neglected because the secondary will (after a failover)
repeat the same instructions and eventually write the same data to disk
a second time, just as the primary did before?
Could this lead to fatal inconsistencies?

Walid




- Michael