qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3 09/21] quorum: Add QuorumChild.to_be_replaced


From: Max Reitz
Subject: Re: [PATCH v3 09/21] quorum: Add QuorumChild.to_be_replaced
Date: Thu, 6 Feb 2020 11:11:05 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.4.1

On 05.02.20 16:38, Kevin Wolf wrote:
> Am 30.01.2020 um 22:44 hat Max Reitz geschrieben:
>> We will need this to verify that Quorum can let one of its children be
>> replaced without breaking anything else.
>>
>> Signed-off-by: Max Reitz <address@hidden>
>> ---
>>  block/quorum.c | 25 +++++++++++++++++++++++++
>>  1 file changed, 25 insertions(+)
>>
>> diff --git a/block/quorum.c b/block/quorum.c
>> index 59cd524502..6a7224c9e4 100644
>> --- a/block/quorum.c
>> +++ b/block/quorum.c
>> @@ -67,6 +67,13 @@ typedef struct QuorumVotes {
>>  
>>  typedef struct QuorumChild {
>>      BdrvChild *child;
>> +
>> +    /*
>> +     * If set, check whether this node can be replaced without any
>> +     * other parent noticing: Unshare CONSISTENT_READ, and take the
>> +     * WRITE permission.
>> +     */
>> +    bool to_be_replaced;
> 
> I don't understand these permission changes. How does (preparing for)
> detaching a node from quorum make its content invalid?

It doesn’t, of course.  What we are preparing for is to replace it by
some other node with some other content.

> And why do we
> suddenly need WRITE permissions even if the quorum node is only used
> read-only?
> 
> The comment is a bit unclear, too. "check whether" implies that both
> outcomes could be true, but it doesn't say what happens in either case.
> Is this really "make sure that"?

I think the comment is not only unclear, it is the problem.  (Well,
maybe the code is also.)

This series is about fixing at least some things about replacing nodes
by mirroring.  The original use cases this was introduced for was to fix
broken quorum children: The other children are still intact, so you read
from the quorum node and replace the broken child (which maybe shows
invalid data, or maybe just EIO) by the fixed mirror result.

Replacing that broken node by the fixed one changes the data that’s
visible on that node.

That’s fine with quorum because that one child never influenced its
result anyway.  In fact, we know that the new child agrees with the
quorum, so it actually reduces conflict.

But if there are other parents on the node, they may disagree.  So we
need to warn them that we will disrupt consistency by replacing the node
with a potentially completely different one.  If those other parents
don’t care about consistency (CONSISTENT_READ) and don’t mind data
changes (other WRITE users), then we can freely replace the node still.


Now (assuming that this reasoning makes sense) it may seem as if the
general block layer should handle such cases; like it should unshare
CONSISTENT_READ and take WRITE.  But that isn’t true, because it calls
bdrv_recurse_can_replace() specifically to check that the some node can
be replaced by the new node without changing any visible data.  So
usually there are no such sudden data changes.

Quorum is the only node that can tolerate such data changes on its
children without changing its own visible data.  Hence it’s responsible
for ensuring that there’s noone else that minds if one of its child
nodes is replaced by something else.

(Note that this isn’t about replacing a BdrvChild, but about replacing a
BDS.  It isn’t like only quorum’s BdrvChild will be made to point
somewhere else; all BdrvChild pointers to the old node will be made to
point to the new one.)


Again assuming this makes sense, I wonder how we could condense that
into a reasonable comment.

Max

Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]