Re: [Qemu-block] [PATCH v12 2/3] quorum: implement bdrv_add_child() and bdrv_del_child()


From: Dr. David Alan Gilbert
Subject: Re: [Qemu-block] [PATCH v12 2/3] quorum: implement bdrv_add_child() and bdrv_del_child()
Date: Thu, 17 Mar 2016 09:59:48 +0000
User-agent: Mutt/1.5.24 (2015-08-30)

* Wen Congyang (address@hidden) wrote:
> On 03/17/2016 05:48 PM, Dr. David Alan Gilbert wrote:
> > * Wen Congyang (address@hidden) wrote:
> >> On 03/17/2016 05:10 PM, Alberto Garcia wrote:
> >>> On Thu 17 Mar 2016 02:22:40 AM CET, Wen Congyang <address@hidden> wrote:
> >>>>>>>> @@ -81,6 +82,8 @@ typedef struct BDRVQuorumState {
> >>>>>>>>       bool rewrite_corrupted;/* true if the driver must rewrite-on-read corrupted
> >>>>>>>>                               * block if Quorum is reached.
> >>>>>>>>                               */
> >>>>>>>> +    unsigned long *index_bitmap;
> >>>>>>
> >>>>>> Hi Berto
> >>>>>>
> >>>>>> *NOTE*: in the old version we just used "bs->node_name", but in the
> >>>>>> latest one, as Kevin suggested, we introduced
> >>>>>> "child->child_name" (formatted as "children.xxx"); this is the key reason
> >>>>>> why we need these two functions here.
> >>>>>
> >>>>> I'm sorry I missed this discussion earlier. Your code seems technically
> >>>>> correct but I have several questions:
> >>>>>
> >>>>> - I read that one of the reasons for this change is that "In theory, the
> >>>>>   same node could be attached twice to the same parent in different
> >>>>>   roles.". Is there any example of that? What's the use case?
> >>>>
> >>>> Kevin may know the case.
> >>>
> >>> Kevin, do you have an example?
> >>>
> >>>>> - How do you obtain the child name?
> >>>>
> >>>> IIRC, there is no way to do that right now. I think we can improve the 'info block' output
> >>>
> >>> Okay, but then we should extend that first, otherwise this API cannot be
> >>> used.
> >>>
> >>>>> - I see that if you have children.0 and children.1 (let's say hd0.qcow2
> >>>>>   and hd1.qcow2), then you remove children.0 and add it again, it will
> >>>>>   keep the 'children.0' name (that's what the bitmap is for if I'm
> >>>>>   understanding it correctly). However the position in the s->children
> >>>>>   array will change because you do memmove() when you remove children.0
> >>>>>   and then add it again to the end of the array.
> >>>>>
> >>>>>   Initial status:
> >>>>>
> >>>>>     s->children[0] <--> "children.0" (hd0.qcow2)
> >>>>>     s->children[1] <--> "children.1" (hd1.qcow2)
> >>>>>
> >>>>>   children.0 (hd0.qcow2) is removed:
> >>>>>
> >>>>>     s->children[0] <--> "children.1" (hd1.qcow2)
> >>>>>
> >>>>>   children.0 (hd0.qcow2) is added again:
> >>>>>
> >>>>>     s->children[0] <--> "children.1" (hd1.qcow2)
> >>>>>     s->children[1] <--> "children.0" (hd0.qcow2)
> >>>>
> >>>> Yes, it is correct.
> >>>>
> >>>>>
> >>>>>   Is this correct? Is this the intended behavior? Since you are reading
> >>>>>   in FIFO mode, now hd1.qcow2 will always be read first, so if
> >>>>>   children.1 was the secondary disk, it has just become the primary.
> >>>>
> >>>> Yes.
> >>>
> >>> And don't you need a way to control the order in which the disks must be
> >>> read for COLO?
> >>
> >> I think in fifo mode, we should first read the disk that was added earlier.
> >>
> >> We don't need a way to control the order now.
> > 
> > Can you document fully how it's used in COLO then?
> 
> Do you mean document it in docs/block-replication.txt?

That would be OK.

> > We should have the failure modes documented, and how you'll use
> > it after failover, etc. Without that it's really difficult to tell
> > if this naming is right.
> 
> For COLO, children.0 is the real disk and children.1 is the replication driver.
> After a failure, children.1 will be removed by the user. If we want to
> continue doing COLO, we need to add a new children.1 again.

So you need to document how to do that.

> > The children.0 notation is really confusing in the way that Berto
> > describes; I hit this a couple of months ago and it really doesn't
> > make sense.
> 
> Do you mean: read from children.1 first, and then read from children.0 in
> fifo mode? Yes, the behavior is very strange.

I mean the 'children.0' 'children.1' naming is just very confusing.
Also because the order in the array is important it's even more confusing
since the 'children.1' isn't necessarily the children[1].
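
To make the mismatch concrete, here is a minimal standalone sketch of the
behavior Berto describes above. It is not the code from the patch: the
Child and Quorum types and the quorum_add(), quorum_del(),
quorum_position() and quorum_dump() helpers are hypothetical names made up
for illustration, a single unsigned long stands in for index_bitmap, and
"take the lowest free index" is an assumption about how the name is chosen.

```c
#include <stdio.h>
#include <string.h>

#define MAX_CHILDREN 8

/* Hypothetical stand-in for a quorum child; not QEMU's BdrvChild. */
typedef struct {
    unsigned name_index;      /* the N in "children.N" */
    const char *filename;
} Child;

/* Hypothetical stand-in for the relevant part of BDRVQuorumState. */
typedef struct {
    Child children[MAX_CHILDREN]; /* array order is the FIFO read order */
    int num_children;
    unsigned long index_bitmap;   /* bit N set => "children.N" is in use */
} Quorum;

/* Append a child at the end of the array and give it the lowest
 * unused name index (assumed policy). Returns the index, or -1 if full. */
static int quorum_add(Quorum *q, const char *filename)
{
    unsigned idx = 0;

    if (q->num_children == MAX_CHILDREN) {
        return -1;
    }
    while (q->index_bitmap & (1UL << idx)) {
        idx++;
    }
    q->index_bitmap |= 1UL << idx;
    q->children[q->num_children].name_index = idx;
    q->children[q->num_children].filename = filename;
    q->num_children++;
    return (int)idx;
}

/* Remove "children.<name_index>"; later entries shift down by one. */
static int quorum_del(Quorum *q, unsigned name_index)
{
    for (int i = 0; i < q->num_children; i++) {
        if (q->children[i].name_index == name_index) {
            memmove(&q->children[i], &q->children[i + 1],
                    (size_t)(q->num_children - i - 1) * sizeof(Child));
            q->num_children--;
            q->index_bitmap &= ~(1UL << name_index);
            return 0;
        }
    }
    return -1;
}

/* Array position of "children.<name_index>", or -1 if it is absent. */
static int quorum_position(const Quorum *q, unsigned name_index)
{
    for (int i = 0; i < q->num_children; i++) {
        if (q->children[i].name_index == name_index) {
            return i;
        }
    }
    return -1;
}

/* Print the array in the same notation as the diagram quoted above. */
static void quorum_dump(const Quorum *q)
{
    for (int i = 0; i < q->num_children; i++) {
        printf("s->children[%d] <--> \"children.%u\" (%s)\n",
               i, q->children[i].name_index, q->children[i].filename);
    }
}
```

Adding hd0.qcow2 and hd1.qcow2, deleting index 0, and adding hd0.qcow2
again leaves quorum_position(&q, 0) == 1 and quorum_position(&q, 1) == 0,
and quorum_dump() then prints the same final state as in Berto's diagram.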

Dave

> 
> Thanks
> Wen Congyang
> 
> > 
> > Dave
> > 
> >>
> >> Thanks
> >> Wen Congyang
> >>
> >>>
> >>> Berto
> >>>
> >>>
> >>> .
> >>>
> >>
> >>
> >>
> > --
> > Dr. David Alan Gilbert / address@hidden / Manchester, UK
> > 
> > 
> > .
> > 
> 
> 
> 
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK


