qemu-devel



From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] [PATCH COLO v3 01/14] docs: block replication's description
Date: Fri, 8 May 2015 10:34:10 +0100
User-agent: Mutt/1.5.23 (2014-03-12)

* Stefan Hajnoczi (address@hidden) wrote:
> On Tue, May 05, 2015 at 04:23:56PM +0100, Dr. David Alan Gilbert wrote:
> > * Stefan Hajnoczi (address@hidden) wrote:
> > > On Fri, Apr 24, 2015 at 11:36:35AM +0200, Paolo Bonzini wrote:
> > > > 
> > > > 
> > > > On 24/04/2015 11:38, Wen Congyang wrote:
> > > > >> > 
> > > > >> > That can be done with drive-mirror.  But I think it's too early 
> > > > >> > for that.
> > > > > Do you mean use drive-mirror instead of quorum?
> > > > 
> > > > Only before starting up a new secondary.  Basically you do a migration
> > > > with non-shared storage, and then start the secondary in colo mode.
> > > > 
> > > > But it's only for the failover case.  Quorum (or a new block/colo.c
> > > > driver or filter) is fine for normal colo operation.
> > > 
> > > Perhaps this patch series should mirror the Secondary's disk to a Backup
> > > Secondary so that the system can be protected very quickly after
> > > failover.
> > > 
> > > I think anyone serious about fault tolerance would deploy a Backup
> > > Secondary, otherwise the system cannot survive two failures unless a
> > > human administrator is lucky/fast enough to set up a new Secondary.
> > 
> > I'd assumed that a higher level management layer would do the allocation
> > of a new secondary after the first failover, so no human need be involved.
> 
> That doesn't help: after the first failover is too late, even if it's
> done by a program.  There should be no window during which the VM is
> unprotected.
>
> People who want fault tolerance care about 9s of availability.  The VM
> must be protected on the new Primary as soon as the failover occurs,
> otherwise this isn't a serious fault tolerance solution.

I'm not aware of any other system that manages that, so I don't
think that's fair.

You gain a lot more availability going from a single
system to the 1+1 system that COLO (or any of the checkpointing systems)
proposes; I can't say how many 9s it gets you.  It's true that having multiple
secondaries would get you a bit more on top of that, but you're still
a lot better off with just the one secondary.

I had thought that having >1 secondary would be a nice addition, but it's
a big change everywhere else (e.g. having to maintain multiple migration
streams, dealing with miscompares from multiple hosts).

Dave


> 
> Stefan


--
Dr. David Alan Gilbert / address@hidden / Manchester, UK


