qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3 09/21] quorum: Add QuorumChild.to_be_replaced


From: Max Reitz
Subject: Re: [PATCH v3 09/21] quorum: Add QuorumChild.to_be_replaced
Date: Thu, 6 Feb 2020 18:06:45 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.4.1

On 06.02.20 17:57, Kevin Wolf wrote:
> Am 06.02.2020 um 17:43 hat Max Reitz geschrieben:
>> On 06.02.20 16:51, Kevin Wolf wrote:
>>> Am 06.02.2020 um 16:21 hat Max Reitz geschrieben:
>>>> On 06.02.20 15:58, Kevin Wolf wrote:
>>>>> Am 06.02.2020 um 11:11 hat Max Reitz geschrieben:
>>>>>> On 05.02.20 16:38, Kevin Wolf wrote:
>>>>>>> Am 30.01.2020 um 22:44 hat Max Reitz geschrieben:
>>>>>>>> We will need this to verify that Quorum can let one of its children be
>>>>>>>> replaced without breaking anything else.
>>>>>>>>
>>>>>>>> Signed-off-by: Max Reitz <address@hidden>
>>>>>>>> ---
>>>>>>>>  block/quorum.c | 25 +++++++++++++++++++++++++
>>>>>>>>  1 file changed, 25 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/block/quorum.c b/block/quorum.c
>>>>>>>> index 59cd524502..6a7224c9e4 100644
>>>>>>>> --- a/block/quorum.c
>>>>>>>> +++ b/block/quorum.c
>>>>>>>> @@ -67,6 +67,13 @@ typedef struct QuorumVotes {
>>>>>>>>  
>>>>>>>>  typedef struct QuorumChild {
>>>>>>>>      BdrvChild *child;
>>>>>>>> +
>>>>>>>> +    /*
>>>>>>>> +     * If set, check whether this node can be replaced without any
>>>>>>>> +     * other parent noticing: Unshare CONSISTENT_READ, and take the
>>>>>>>> +     * WRITE permission.
>>>>>>>> +     */
>>>>>>>> +    bool to_be_replaced;
>>>>>>>
>>>>>>> I don't understand these permission changes. How does (preparing for)
>>>>>>> detaching a node from quorum make its content invalid?
>>>>>>
>>>>>> It doesn’t, of course.  What we are preparing for is to replace it by
>>>>>> some other node with some other content.
>>>>>>
>>>>>>> And why do we
>>>>>>> suddenly need WRITE permissions even if the quorum node is only used
>>>>>>> read-only?
>>>>>>>
>>>>>>> The comment is a bit unclear, too. "check whether" implies that both
>>>>>>> outcomes could be true, but it doesn't say what happens in either case.
>>>>>>> Is this really "make sure that"?
>>>>>>
>>>>>> I think the comment is not only unclear, it is the problem.  (Well,
>>>>>> maybe the code is also.)
>>>>>>
>>>>>> This series is about fixing at least some things about replacing nodes
>>>>>> by mirroring.  The original use cases this was introduced for was to fix
>>>>>> broken quorum children: The other children are still intact, so you read
>>>>>> from the quorum node and replace the broken child (which maybe shows
>>>>>> invalid data, or maybe just EIO) by the fixed mirror result.
>>>>>>
>>>>>> Replacing that broken node by the fixed one changes the data that’s
>>>>>> visible on that node.
>>>>>
>>>>> Hm, yes, that's true. But I wonder if this is really something that the
>>>>> permission system must catch. Like other graph manipulations, it's
>>>>> essentially the user saying "trust me, I know what I'm doing, this node
>>>>> makes sense in this place".
>>>>>
>>>>> Because if you assume that the user could add a node with unsuitable
>>>>> content and you want to prevent this, where do we stop?
>>>>> blockdev-snapshot can insert a non-empty overlay, which would result in
>>>>> visible data change. Should we therefore only allow snapshots when
>>>>> shared writes are allowed? This doesn't work obviously.
>>>>>
>>>>> So I'm inclined to say that this is the user's responsibility and we
>>>>> don't have to jump through hoops to prevent every possible way that the
>>>>> user could mess up. (Which often also result in preventing legitimate
>>>>> cases like here a quorum of read-only nodes.)
>>>>
>>>> Well, if you ask the question “where do we stop”, we also have to ask
>>>> the question “where do we start”.  If we say the user knows what they’re
>>>> doing, we might as well drop the whole can_replace infrastructure
>>>> altogether and just assume that you can replace any node by anything.
>>>
>>> Well, I don't actually know if that would be completely unreasonable.
>>> The idea was obviously to keep graph changes restricted to very specific
>>> cases to avoid nasty surprises like triggering latent bugs. Meanwhile we
>>> have quite a few more operations that allow changing the graph.
>>>
>>> So if preventing some cases gives us headaches and is probably more work
>>> than dealing with any bugs they might reveal, maybe preventing them is
>>> wrong.
>>>
>>> I'm just afraid that we might be overengineering this and waste time on
>>> things that we don't actually get much use from.
>>
>> That’s why I’m asking.
> 
> Did I answer your question sufficiently then?

No, because “I’m afraid” is a sentiment I fully share, but it doesn’t
answer the question whether we are indeed overengineering this or not. :-)

I suppose my stance now is “This is probably overengineered, but now we
might as well roll with it”.

>>>> If the WRITE permission is the problem, then I suppose we can drop that.
>>>>  Unsharing CONSISTENT_READ is bad enough that it effectively deters all
>>>> other parents anyway.
>>>
>>> WRITE is probably the more practical problem, though it's technically
>>> the correct one to take.
>>>
>>> CONSISTENT_READ is already a problem in theory because replacing a child
>>> node with different content doesn't even match its definition:
>>>
>>>     /**
>>>      * A user that has the "permission" of consistent reads is guaranteed 
>>> that
>>>      * their view of the contents of the block device is complete and
>>>      * self-consistent, representing the contents of a disk at a specific
>>>      * point.
>>>      *
>>>      * For most block devices (including their backing files) this is true, 
>>> but
>>>      * the property cannot be maintained in a few situations like for
>>>      * intermediate nodes of a commit block job.
>>>      */
>>>     BLK_PERM_CONSISTENT_READ    = 0x01,
>>>
>>> Replacing an image with a different image means that the node represents
>>> the content of a different disk now, but it's probably still complete
>>> and self-consistent.
>>
>> At any point in time yes, but not over the time span of the change.  The
>> definition doesn’t say that the node represents the contents of a disk
>> at a specific point, but the view from the parent.
>>
>> I argue that that view is always over some period of time, so if you
>> suddenly switch out the whole disk, then it isn’t a self-consistent view.
> 
> I think your theory that it's over some period of time conflicts with
> the documentation that says "at a specific point".

I’d rather not get into a deeper discussion on what CONSISTENT_READ
means again... :-/

I always feel like if you really take only a single point in time, then
anything could be some hypothetical disk.

So to me, unsharing CONSISTENT_READ effectively just means “Don’t touch
this, you don’t want to”.

>> Alternatively, we could of course also just forego the permission system
>> here altogether and just check that there are no other parents at all.
>> (Which is effectively the same as unsharing CONSISTENT_READ.)
> 
> This would sidestep all of the artificial permission twiddling, which
> sounds good.
> 
> It would probably also needlessly restrict the allowed use cases,

Only in theory, though, because in practice basically everything useful
takes CONSISTENT_READ anyway.

> but
> then, who cares about nodes with multiple parents, one of which is a
> quorum node?
> 
> So I guess I would be fine with either checking that there are no
> parents or maybe even just dropping the check completely.

OK, I’ll check the parent list then.  (Except it must be exactly one
parent, namely Quorum.)

Max

Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]