qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC] COLO HA Project proposal


From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] [RFC] COLO HA Project proposal
Date: Fri, 4 Jul 2014 09:35:46 +0100
User-agent: Mutt/1.5.23 (2014-03-12)

* Dong, Eddie (address@hidden) wrote:
> > >
> > > I didn't quite understand a couple of things though, perhaps you can
> > > explain:
> > >    1) If we ignore the TCP sequence number problem, in an SMP machine
> > > don't we get other randomnesses - e.g. which core completes something
> > > first, or who wins a lock contention, so the output stream might not
> > > be identical - so do those normal bits of randomness cause the
> > > machines to flag as out-of-sync?
> > 
> > It's about COLO agent, CCing Congyang, he can give the detailed
> > explanation.
> > 
> 
> Let me clarify on this issue. COLO didn't ignore the TCP sequence number, but 
> uses a 
> new implementation to make the sequence number to be best effort identical 
> between the primary VM (PVM) and secondary VM (SVM). Likely, VMM has to 
> synchronize 
> the emulation of randomization number generation mechanism between the 
> PVM and SVM, like the lock-stepping mechanism does. 
> 
> Further mnore, for long TCP connection, we can rely on the (on-demand) VM 
> checkpoint to get the 
> identical Sequence number both in PVM and SVM. 

That wasn't really my question; I was worrying about other forms of randomness,
such as winners of lock contention, and other SMP non-determinisms,
and I'm also worried by what proportion of time the system can't recover
from a failure due to being unable to distinguish an SVM failure from
a randomness issue.

Dave

> 
> 
> Thanks, Eddie
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK



reply via email to

[Prev in Thread] Current Thread [Next in Thread]