qemu-devel
Re: [Qemu-devel] [RFC] COLO HA Project proposal


From: Dong, Eddie
Subject: Re: [Qemu-devel] [RFC] COLO HA Project proposal
Date: Fri, 4 Jul 2014 08:54:55 +0000

> >
> > Let me clarify on this issue. COLO didn't ignore the TCP sequence
> > number, but uses a new implementation to make the sequence number to
> > be best effort identical between the primary VM (PVM) and secondary VM
> > (SVM). Likely, VMM has to synchronize the emulation of randomization
> > number generation mechanism between the PVM and SVM, like the
> > lock-stepping mechanism does.
> >
> > Furthermore, for long TCP connection, we can rely on the (on-demand)
> > VM checkpoint to get the identical Sequence number both in PVM and
> > SVM.
> 
> That wasn't really my question; I was worrying about other forms of
> randomness, such as winners of lock contention, and other SMP
> non-determinisms, and I'm also worried by what proportion of time the
> system can't recover from a failure due to being unable to distinguish an
> SVM failure from a randomness issue.
> 
Thanks Dave,

Whatever random values, branches, or code paths the PVM and SVM take,
the divergence is only a performance issue. COLO never assumes that the
PVM and SVM have the same internal machine state. From a correctness
point of view, as long as the PVM and SVM generate identical responses,
we can view the SVM as a valid replica of the PVM, and the SVM can take
over when the PVM suffers a hardware failure. We can view the client as
having talked with the SVM all along, without any notion of the PVM.
Of course, if the SVM dies, we can also regenerate a copy of the PVM
from a new checkpoint.

The SOCC paper describes the recovery model in detail :)
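
To make the argument above concrete, here is a rough sketch of the
output-comparison idea in Python. It is not COLO code: the class and
method names are hypothetical, and releasing the PVM's response after
forcing a checkpoint on a mismatch is an assumption based on the
on-demand checkpoint mentioned earlier in the thread. Internal state is
never compared, only what the client would see.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OutputComparator:
    """Hypothetical model: compares only client-visible responses."""

    released: List[bytes] = field(default_factory=list)  # sent to the client
    checkpoints: int = 0  # number of on-demand checkpoints taken

    def on_responses(self, pvm_response: bytes, svm_response: bytes) -> bytes:
        """Handle one pair of responses, one from each VM."""
        if pvm_response != svm_response:
            # The VMs diverged in a way the client could observe.
            # Assumption: resynchronize the SVM from the PVM with an
            # on-demand checkpoint before letting the response out.
            self.checkpoints += 1
        # Identical responses (or a fresh checkpoint) mean the SVM is a
        # valid replica at this point, so the response can be released.
        self.released.append(pvm_response)
        return pvm_response


comparator = OutputComparator()
comparator.on_responses(b"HTTP/1.1 200 OK", b"HTTP/1.1 200 OK")  # match
comparator.on_responses(b"seq=1001", b"seq=2002")  # mismatch
```

In this model, internal nondeterminism (lock contention winners, SMP
ordering) only raises the checkpoint count, which is the performance
cost; it does not affect what the client observes.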

Thanks, Eddie





