
From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] [Qemu-block] [PATCH COLO-Block v6 07/16] Add new block driver interface to connect/disconnect the remote target
Date: Fri, 26 Jun 2015 20:03:36 +0100
User-agent: Mutt/1.5.23 (2014-03-12)

* Wen Congyang (address@hidden) wrote:
> On 06/24/2015 10:07 PM, Dr. David Alan Gilbert wrote:
> > * Wen Congyang (address@hidden) wrote:
> >> At 2015/6/19 18:49, Stefan Hajnoczi Wrote:
> >>> On Fri, Jun 19, 2015 at 08:54:56AM +0800, Wen Congyang wrote:
> >>>> On 06/19/2015 12:06 AM, Stefan Hajnoczi wrote:
> >>>>> On Thu, Jun 18, 2015 at 10:36:39PM +0800, Wen Congyang wrote:
> >>>>>> At 2015/6/18 20:55, Stefan Hajnoczi Wrote:
> >>>>>>> On Thu, Jun 18, 2015 at 04:49:12PM +0800, Wen Congyang wrote:
> >>>>>>>> +void bdrv_connect(BlockDriverState *bs, Error **errp)
> >>>>>>>> +{
> >>>>>>>> +    BlockDriver *drv = bs->drv;
> >>>>>>>> +
> >>>>>>>> +    if (drv && drv->bdrv_connect) {
> >>>>>>>> +        drv->bdrv_connect(bs, errp);
> >>>>>>>> +    } else if (bs->file) {
> >>>>>>>> +        bdrv_connect(bs->file, errp);
> >>>>>>>> +    } else {
> >>>>>>>> +        error_setg(errp, "this feature or command is not currently supported");
> >>>>>>>> +    }
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +void bdrv_disconnect(BlockDriverState *bs)
> >>>>>>>> +{
> >>>>>>>> +    BlockDriver *drv = bs->drv;
> >>>>>>>> +
> >>>>>>>> +    if (drv && drv->bdrv_disconnect) {
> >>>>>>>> +        drv->bdrv_disconnect(bs);
> >>>>>>>> +    } else if (bs->file) {
> >>>>>>>> +        bdrv_disconnect(bs->file);
> >>>>>>>> +    }
> >>>>>>>> +}
> >>>>>>>
> >>>>>>> Please add doc comments describing the semantics of these commands.
> >>>>>>
> >>>>>> Where should it be documented? In the header file?
> >>>>>
> >>>>> block.h doesn't document prototypes in the header file, please document
> >>>>> the function definition in block.c.  (QEMU is not consistent here, some
> >>>>> places do it the other way around.)
> >>>>>
> >>>>>>> Why are these operations needed when there is already a bs->drv ==
> >>>>>>> NULL case, which means the BDS is not ready for read/write?
> >>>>>>>
> >>>>>>
> >>>>>> The purpose is: don't connect to the nbd server when opening the nbd
> >>>>>> client; connect/disconnect to the nbd server only when we need to.
> >>>>>>
> >>>>>> IIUC, if bs->drv is NULL, it means that the driver is ejected? Here,
> >>>>>> connect/disconnect means connecting to/disconnecting from the remote
> >>>>>> target (the target may be on another host).
> >>>>>
> >>>>> Connect/disconnect puts something on the QEMU command-line that isn't
> >>>>> ready at startup time.
> >>>>>
> >>>>> How about using monitor commands to add objects when needed instead?
> >>>>>
> >>>>> That is cleaner because it doesn't introduce a new state (which is only
> >>>>> implemented for nbd).
> >>>>>
> >>>>
> >>>> The problem is that the nbd client is one child of the quorum, and
> >>>> quorum must have more than one child. The nbd server is not ready until
> >>>> COLO is running.
> >>>
> >>> A monitor command to hot add/remove quorum children solves this problem
> >>> and could also be used in other scenarios (e.g. user decides to take a
> >>> quorum child offline).
> >>>
> >>
> >> For the replication case, we always do checkpoints again and again after
> >> migration. If the disk is not synced before migration, we will use disk
> >> migration or a mirror job to sync it.
> > 
> > Can you document the way that you use disk migration or mirror, together
> > with your COLO setup?  I think it would make it easier to understand this
> > restriction.
> > At the moment I don't understand how you can switch from doing a disk
> > migration into COLO mode - there seems to be a gap between the end of disk
> > migration and the start of COLO.
> 
> In my test, I use disk migration.
> 
> After migration and before starting COLO, we will disable disk migration:
> +    /* Disable block migration */
> +    s->params.blk = 0;
> +    s->params.shared = 0;
> +    qemu_savevm_state_begin(trans, &s->params);

Ah, I hadn't realised you could do that; so do you just do:

migrate_set_parameter colo on
migrate -d -b tcp:otherhost:port

How does the secondary know to feed that data straight into the disk without
recording all the old data into the hidden-disk?

> If the user uses mirror job, we don't cancel the mirror job now.

It would be good to get it to work with mirror; that seems to be preferred
these days over the old block migration.

With block migration or mirror, that pretty much allows you to replace a
failed secondary without restarting, doesn't it?  The only thing stopping
you is that the NBD client needs to point to the address of the new secondary.
Stefan's suggestion of being able to modify the quorum on the fly would seem
to fix that problem.
(The other thing I can see is that the secondary wouldn't have the conntrack
data for open connections; but that's a separate problem.)
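
To illustrate what I mean, something along these lines would do it (purely a
sketch: no such commands exist today, and the names "quorum_child_del",
"quorum_child_add" and "colo-disk0" are made-up placeholders for whatever
interface ends up being added):

  # hypothetical monitor commands - names and option syntax are illustrative only
  quorum_child_del colo-disk0 children.1
  quorum_child_add colo-disk0 driver=nbd,host=new-secondary,port=8889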

A related topic was that a colleague was asking about starting QEMU up and
only then setting the NBD target (he's trying to wire it into OpenStack);
I suggested that instead of specifying the drive on the command line, it
might work to use drive_add to add the whole quorum/drive stack before
starting the guest; however, again, something to modify the quorum would be
easier.
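
Roughly, I'd picture that looking like the following (again only a sketch;
the exact drive/quorum option spelling and the ids here are made up, not
taken from this series):

  # start the guest paused, with no quorum drive on the command line
  qemu-system-x86_64 -S ...

  # from the monitor, add the whole quorum/drive stack, then resume the guest
  # (folded for readability - it is a single line at the monitor prompt)
  drive_add 0 if=none,id=colo-disk0,driver=quorum,read-pattern=fifo,
      vote-threshold=1,children.0.file.filename=primary.img,
      children.0.driver=raw,children.1.file.filename=nbd://secondary:8889/colo1
  cont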

Finally, the last good reason I can see for being able to modify the quorum
on the fly is that you'll need it for continuous FT, to be able to transmute
a secondary into a new primary.

Dave

> 
> > 
> >> So we cannot start block replication while migration is running. We need
> >> the nbd client to be not ready while migration is running, and to be ready
> >> between the time migration ends and the checkpoint begins. Using a monitor
> >> command to add the nbd client would cause a larger downtime. So if the nbd
> >> client has already been added (just not connected to the nbd server), we
> >> can connect to the nbd server automatically.
> > 
> > Without the disk migration or mirror, I can't see the need for the delayed
> > bdrv_connect.
> > I can see that you want to do a disconnect at failover; however, you need
> > to take care, because if the network is broken at the point of failover
> > you need to make sure nothing blocks.
> 
> disk migration is needed if the disk is not synced before migration.
> 
> Thanks
> Wen Congyang
> 
> > 
> > Dave
> > 
> >>
> >> Thanks
> >> Wen Congyang
> > --
> > Dr. David Alan Gilbert / address@hidden / Manchester, UK
> > .
> > 
> 
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK


