Re: [PATCH for-5.0 v2 10/23] quorum: Implement .bdrv_recurse_can_replace


From: Max Reitz
Subject: Re: [PATCH for-5.0 v2 10/23] quorum: Implement .bdrv_recurse_can_replace()
Date: Thu, 6 Feb 2020 16:19:07 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.4.1

On 06.02.20 15:42, Kevin Wolf wrote:
> Am 06.02.2020 um 11:21 hat Max Reitz geschrieben:
>> On 05.02.20 16:55, Kevin Wolf wrote:
>>> Am 11.11.2019 um 17:02 hat Max Reitz geschrieben:
>>>> Signed-off-by: Max Reitz <address@hidden>
>>>> ---
>>>>  block/quorum.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>  1 file changed, 62 insertions(+)
>>>>
>>>> diff --git a/block/quorum.c b/block/quorum.c
>>>> index 3a824e77e3..8ee03e9baf 100644
>>>> --- a/block/quorum.c
>>>> +++ b/block/quorum.c
>>>> @@ -825,6 +825,67 @@ static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs,
>>>>      return false;
>>>>  }
>>>>  
>>>> +static bool quorum_recurse_can_replace(BlockDriverState *bs,
>>>> +                                       BlockDriverState *to_replace)
>>>> +{
>>>> +    BDRVQuorumState *s = bs->opaque;
>>>> +    int i;
>>>> +
>>>> +    for (i = 0; i < s->num_children; i++) {
>>>> +        /*
>>>> +         * We have no idea whether our children show the same data as
>>>> +         * this node (@bs).  It is actually highly likely that
>>>> +         * @to_replace does not, because replacing a broken child is
>>>> +         * one of the main use cases here.
>>>> +         *
>>>> +         * We do know that the new BDS will match @bs, so replacing
>>>> +         * any of our children by it will be safe.  It cannot change
>>>> +         * the data this quorum node presents to its parents.
>>>> +         *
>>>> +         * However, replacing @to_replace by @bs in any of our
>>>> +         * children's chains may change visible data somewhere in
>>>> +         * there.  We therefore cannot recurse down those chains with
>>>> +         * bdrv_recurse_can_replace().
>>>> +         * (More formally, bdrv_recurse_can_replace() requires that
>>>> +         * @to_replace will be replaced by something matching the @bs
>>>> +         * passed to it.  We cannot guarantee that.)
>>>> +         *
>>>> +         * Thus, we can only check whether any of our immediate
>>>> +         * children matches @to_replace.
>>>> +         *
>>>> +         * (In the future, we might add a function to recurse down a
>>>> +         * chain that checks that nothing there cares about a change
>>>> +         * in data from the respective child in question.  For
>>>> +         * example, most filters do not care when their child's data
>>>> +         * suddenly changes, as long as their parents do not care.)
>>>> +         */
>>>> +        if (s->children[i].child->bs == to_replace) {
>>>> +            Error *local_err = NULL;
>>>> +
>>>> +            /*
>>>> +             * We now have to ensure that there is no other parent
>>>> +             * that cares about replacing this child by a node with
>>>> +             * potentially different data.
>>>> +             */
>>>> +            s->children[i].to_be_replaced = true;
>>>> +            bdrv_child_refresh_perms(bs, s->children[i].child, &local_err);
>>>> +
>>>> +            /* Revert permissions */
>>>> +            s->children[i].to_be_replaced = false;
>>>> +            bdrv_child_refresh_perms(bs, s->children[i].child, &error_abort);
>>>
>>> Quite a hack. The two obvious problems are:
>>>
>>> 1. We can't guarantee that we can actually revert the permissions. I
>>>    think we ignore failure to loosen permissions meanwhile so that at
>>>    least the &error_abort doesn't trigger, but bs could still be in the
>>>    wrong state afterwards.
>>
>> I thought we guaranteed that loosening permissions never fails.
>>
>> (Well, you know.  It may “leak” permissions, but we’d never get an error
>> here so there’s nothing to handle anyway.)
> 
> This is what I meant. We ignore the failure (i.e. don't return an error),
> but the result still isn't completely correct ("leaked" permissions).
> 
>>>    It would be cleaner to use check+abort instead of actually setting
>>>    the new permission.
>>
>> Oh.  Yes.  Maybe.  It does require more code, though, because I’d rather
>> not use bdrv_check_update_perm() from here as-is.
> 
> I'm not saying you need to do it, just that it would be cleaner. :-)

It would.  Thanks for the suggestion, I obviously didn’t think of it.
(Or there’d be a comment on how this is not the best way in theory, but
in practice it’s good enough.)  I suppose I’ll see what I can do.
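
To illustrate the check+abort idea, here is a minimal sketch of a check that only inspects state and never changes it, so there is nothing to revert afterwards.  It is a toy model, not QEMU's permission code: ToyParent, the TOY_PERM_* flags and toy_check_perm() are all made-up names, and which permissions get unshared is an assumption for the sake of the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model; these are not QEMU's BLK_PERM_* flags. */
enum {
    TOY_PERM_CONSISTENT_READ = 1u << 0,
    TOY_PERM_WRITE           = 1u << 1,
};

/* One parent of a shared child node (hypothetical type). */
typedef struct ToyParent {
    uint32_t perm;    /* permissions this parent takes on the child */
    uint32_t shared;  /* permissions this parent lets others take */
} ToyParent;

/*
 * Check whether parent @idx could switch to (@new_perm, @new_shared)
 * without conflicting with any other parent of the same child.
 * The parents array is const: the check never modifies anything.
 */
static bool toy_check_perm(const ToyParent *parents, int n, int idx,
                           uint32_t new_perm, uint32_t new_shared)
{
    assert(idx >= 0 && idx < n);

    for (int i = 0; i < n; i++) {
        if (i == idx) {
            continue;
        }
        /* We would need something the other parent does not share */
        if (new_perm & ~parents[i].shared) {
            return false;
        }
        /* The other parent uses something we would stop sharing */
        if (parents[i].perm & ~new_shared) {
            return false;
        }
    }
    return true;
}
```

With two parents that both take TOY_PERM_CONSISTENT_READ, asking whether parent 0 may stop sharing that permission returns false, and the array is left exactly as it was.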

>>> 2. As aborting the permission change makes more obvious, we're checking
>>>    something that might not be true any more when we actually make the
>>>    change.
>>
>> True.  I tried to do it right by having a post-replace cleanup function,
>> but after a while that was just going nowhere, really.  So I just went
>> with what’s patch 13 here.
>>
>> But isn’t 13 enough, actually?  It checks can_replace right before
>> replacing in a drained section.  I can’t imagine the permissions to
>> change there.
> 
> Permissions are tied to file locks, so an external process can just grab
> the locks in between.

Ah, right, I didn’t think of that.

> But if I understand correctly, all we try here is
> to have an additional safeguard to prevent the user from doing stupid
> things. So I guess not being 100% is fine as long as it's documented in
> the code.

Yes.  I just think it actually would be 100 % in practice, so I wondered
whether it would need to be documented.

You’re right, though, it isn’t 100 %, so it should definitely be
documented.  Maybe something like

In theory, we would have to keep the permissions tightened until the
node is replaced.  In practice, that would require post-replacement
cleanup infrastructure, which we do not have, and which would be
unreasonably complex to implement.  Therefore, all we can do is require
anyone who wants to replace one node by some potentially unrelated other
node (i.e., the mirror job on completion) to invoke
bdrv_recurse_can_replace() immediately before and thus minimize the time
during which some condition may arise that might forbid the swap.

?
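
The calling pattern that comment describes could be sketched roughly as follows.  Again this is only a toy model under stated assumptions: ToyNode, ToyQuorum, toy_can_replace() and toy_replace_child() are hypothetical stand-ins rather than QEMU functions, and the "drained" counter merely marks where a drained section would be.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are QEMU types or functions. */
typedef struct ToyNode {
    const char *name;
    /* May flip at any time, e.g. when an external process takes a lock */
    bool other_parent_needs_data;
} ToyNode;

typedef struct ToyQuorum {
    ToyNode *children[3];
    int num_children;
    int drained;  /* nesting counter standing in for a drained section */
} ToyQuorum;

/* Only immediate children qualify, and only if nobody else objects. */
static bool toy_can_replace(const ToyQuorum *q, const ToyNode *to_replace)
{
    for (int i = 0; i < q->num_children; i++) {
        if (q->children[i] == to_replace) {
            return !to_replace->other_parent_needs_data;
        }
    }
    return false;
}

/* Returns 0 on success, -1 if the swap is not (or no longer) allowed. */
static int toy_replace_child(ToyQuorum *q, ToyNode *to_replace,
                             ToyNode *new_node)
{
    int ret = -1;

    q->drained++;

    /* Re-check right before the swap; an earlier result may be stale */
    if (toy_can_replace(q, to_replace)) {
        for (int i = 0; i < q->num_children; i++) {
            if (q->children[i] == to_replace) {
                q->children[i] = new_node;
                ret = 0;
                break;
            }
        }
    }

    q->drained--;
    assert(q->drained >= 0);
    return ret;
}
```

The check and the swap sit next to each other, so the window in which the condition can change is small, but it is not zero.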

Max
