From: John Snow
Subject: Re: [Qemu-block] [Qemu-devel] [PATCH v10 12/14] block: add transactional properties
Date: Fri, 6 Nov 2015 15:35:00 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0


On 11/06/2015 01:46 PM, John Snow wrote:
> 
> 
> On 11/06/2015 03:32 AM, Kevin Wolf wrote:
>> On 05.11.2015 at 19:52, John Snow wrote:
>>>
>>>
>>> On 11/05/2015 05:47 AM, Stefan Hajnoczi wrote:
>>>> On Tue, Nov 03, 2015 at 12:27:19PM -0500, John Snow wrote:
>>>>>
>>>>>
>>>>> On 11/03/2015 10:17 AM, Stefan Hajnoczi wrote:
>>>>>> On Fri, Oct 23, 2015 at 07:56:50PM -0400, John Snow wrote:
>>>>>>> @@ -1732,6 +1757,10 @@ static void block_dirty_bitmap_add_prepare(BlkActionState *common,
>>>>>>>      BlockDirtyBitmapState *state = DO_UPCAST(BlockDirtyBitmapState,
>>>>>>>                                               common, common);
>>>>>>>  
>>>>>>> +    if (action_check_cancel_mode(common, errp) < 0) {
>>>>>>> +        return;
>>>>>>> +    }
>>>>>>> +
>>>>>>>      action = common->action->block_dirty_bitmap_add;
>>>>>>>      /* AIO context taken and released within qmp_block_dirty_bitmap_add */
>>>>>>>      qmp_block_dirty_bitmap_add(action->node, action->name,
>>>>>>> @@ -1767,6 +1796,10 @@ static void block_dirty_bitmap_clear_prepare(BlkActionState *common,
>>>>>>>      BlockDirtyBitmapState *state = DO_UPCAST(BlockDirtyBitmapState,
>>>>>>>                                               common, common);
>>>>>>>      BlockDirtyBitmap *action;
>>>>>>>  
>>>>>>> +    if (action_check_cancel_mode(common, errp) < 0) {
>>>>>>> +        return;
>>>>>>> +    }
>>>>>>> +
>>>>>>>      action = common->action->block_dirty_bitmap_clear;
>>>>>>>      state->bitmap = block_dirty_bitmap_lookup(action->node,
>>>>>>>                                                action->name,
>>>>>>
>>>>>> Why do the bitmap add/clear actions not support err-cancel=all?
>>>>>>
>>>>>> I understand why other block jobs don't support it, but it's not clear
>>>>>> why these non-block job actions cannot.
>>>>>>
>>>>>
>>>>> Because they don't have a callback to invoke if the rest of the job fails.
>>>>>
>>>>> I could create a BlockJob for them complete with a callback to invoke,
>>>>> but basically it's just because there's no interface to unwind them, or
>>>>> an interface to join them with the transaction.
>>>>>
>>>>> They're small, synchronous non-job actions. Which makes them weird.
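
Concretely, the guard being discussed might look something like this (a
minimal sketch extrapolated from the hunks quoted above; the helper's body
isn't shown in this excerpt, so the txn_props field, the err_cancel member,
and the enum value are assumptions, not the series' actual spelling):

static int action_check_cancel_mode(BlkActionState *common, Error **errp)
{
    /* Reject the transaction-wide rollback policy for actions that
     * have no way to unwind themselves after success.  Field and
     * enum names here are assumed for illustration. */
    if (common->txn_props->err_cancel == ACTION_ERR_CANCEL_ALL) {
        error_setg(errp, "Action does not support err-cancel = all");
        return -1;
    }
    return 0;
}
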
>>>>
>>>> Funny, we've been looking at the same picture while seeing different
>>>> things:
>>>> https://en.wikipedia.org/wiki/Rabbit%E2%80%93duck_illusion
>>>>
>>>> I think I understand your idea: the transaction should include both
>>>> immediate actions as well as block jobs.
>>>>
>>>> My mental model was different: immediate actions commit/abort along with
>>>> the 'transaction' command.  Block jobs are separate and complete/cancel
>>>> together in a group.
>>>>
>>>> In practice I think the two end up being similar because we won't be
>>>> able to implement immediate action commit/abort together with
>>>> long-running block jobs because the immediate actions rely on
>>>> quiescing/pausing the guest for atomic commit/abort.
>>>>
>>>> So with your mental model the QMP client has to submit 2 'transaction'
>>>> commands: 1 for the immediate actions, 1 for the block jobs.
>>>>
>>>> In my mental model the QMP client submits 1 command but the immediate
>>>> actions and block jobs are two separate transaction scopes.  This means
>>>> if the block jobs fail, the client needs to be aware of the immediate
>>>> actions that have committed.  Because of this, it becomes just as much
>>>> client effort as submitting two separate 'transaction' commands in your
>>>> model.
>>>>
>>>> Can anyone see a practical difference?  I think I'm happy with John's
>>>> model.
>>>>
>>>> Stefan
>>>>
>>>
>>> We discussed this off-list, but for the sake of the archive:
>>>
>>> == How it is now ==
>>>
>>> Currently, transactions have two implicit phases: the first is the
>>> synchronous phase. If this phase completes successfully, we consider the
>>> transaction a success. The second phase is the asynchronous phase where
>>> jobs launched by the synchronous phase run to completion.
>>>
>>> All synchronous commands must complete for the transaction to "succeed."
>>> There are currently (pre-patch) no guarantees about asynchronous command
>>> completion. As long as all synchronous actions complete, asynchronous
>>> actions are free to succeed or fail individually.
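
In rough code terms, the two-phase flow described above looks something
like this (a simplified sketch, not the verbatim qemu code; building the
per-action state list and the clean() callbacks are elided):

void qmp_transaction(TransactionActionList *actions, Error **errp)
{
    /* Per-action state list; construction from 'actions' elided. */
    QSIMPLEQ_HEAD(, BlkActionState) states = QSIMPLEQ_HEAD_INITIALIZER(states);
    BlkActionState *state;
    Error *local_err = NULL;

    /* Phase 1 (synchronous): prepare every action; the first failure
     * aborts the whole group. */
    QSIMPLEQ_FOREACH(state, &states, entry) {
        state->ops->prepare(state, &local_err);
        if (local_err) {
            error_propagate(errp, local_err);
            goto abort;
        }
    }

    /* The transaction "succeeds" at this point.  Phase 2
     * (asynchronous): block jobs started during prepare keep running
     * on their own, free to succeed or fail individually. */
    QSIMPLEQ_FOREACH(state, &states, entry) {
        if (state->ops->commit) {
            state->ops->commit(state);
        }
    }
    return;

abort:
    QSIMPLEQ_FOREACH(state, &states, entry) {
        if (state->ops->abort) {
            state->ops->abort(state);
        }
    }
}
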
>>
>> You're overcomplicating this. All actions are currently synchronous and
>> what you consider asynchronous transaction actions aren't actually part
>> of the transaction at all. The action is "start block job X", not "run
>> block job X".
>>
> 
> "OK," except the entire goal of the series was to allow the "aren't
> actually part of the transaction at all" parts to live and die together
> based on the transaction they were launched from. This really implies
> they belong to a second asynchronous phase of the transaction.
> 
> Otherwise, why would totally unrelated jobs fail because a different one
> did?
> 
> I realize this is an /extension/ of the existing semantics rather than a
> "fix," and part of the confusion is that I and everyone else were looking
> at it differently. How could this happen, though? Let's look at our
> transaction documentation:
> 
> "Operations that are currently supported:" [...] "- drive-backup" [...]
> "If there is any failure
> performing any of the operations, all operations for the group are
> abandoned."
> 
> Where do we say "Except drive-backup, though, because technically the
> drive-backup action only starts the job, and we don't really consider
> this to be part of the transaction" ? We've never actually defined this
> behavior as part of our API as far as I can see.
> 
> Regardless: the net effect is still the same. We want jobs launched by
> transactions to (optionally!) cancel themselves on failure. The
> complications only arise because people want the exact semantic meaning
> to be precise for the QMP API.
> 
> The concept is simple, the language to describe it has been muddy.
> 
>>> == My Model ==
>>>
>>> The current behavior is my "err-cancel = none" scenario: we offer no
>>> guarantee about the success or failure of the transaction as a whole
>>> after the synchronous portion has completed.
>>>
>>> What I was proposing is "err-cancel = all," which to me means that _ALL_
>>> commands in this transaction are to succeed (synchronous or not) before
>>> _any_ actions are irrevocably committed. This means that for a
>>> hypothetical mixed synchronous-asynchronous transaction, even after
>>> the transaction succeeded (it passed the synchronous phase), if an
>>> asynchronous action later fails, all actions, synchronous and not,
>>> are rolled back -- a kind of retroactive failure of the transaction.
>>> This is clearly not possible in all cases, so commands that cannot
>>> support these semantics will refuse "err-cancel = all" during the
>>> synchronous phase.
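
In code terms, the property might boil down to something like the
following (names assumed for illustration; the thread spells the QMP
argument "err-cancel"):

/* Transaction-wide rollback policy, per the two modes described above. */
typedef enum ActionErrCancel {
    ACTION_ERR_CANCEL_NONE, /* current behavior: asynchronous actions
                             * succeed or fail individually */
    ACTION_ERR_CANCEL_ALL,  /* nothing commits irrevocably until every
                             * action, synchronous or not, succeeds */
} ActionErrCancel;

typedef struct TransactionProperties {
    ActionErrCancel err_cancel;
} TransactionProperties;
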
>>
>> Is this possible in any case? You're losing transaction semantics at
>> the latest when you drop the BQL that the monitor holds. At that point,
>> at least atomicity and isolation are no longer guaranteed.
>>
> 
> It might be possible in at least *some* cases. I currently don't even
> attempt it, though. This is why some transaction actions just refuse
> this parameter if it shows up.
> 
> It'd be pretty easy to undo a bitmap-add or a bitmap-clear, as long as I
> gave these actions a conditional success callback.
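
For example, a hypothetical abort callback for block-dirty-bitmap-add
could simply remove the bitmap that .prepare() created (a sketch; the
'prepared' flag is an assumed piece of per-action state):

static void block_dirty_bitmap_add_abort(BlkActionState *common)
{
    BlockDirtyBitmapState *state = DO_UPCAST(BlockDirtyBitmapState,
                                             common, common);
    BlockDirtyBitmapAdd *action = common->action->block_dirty_bitmap_add;

    /* Only unwind if .prepare() actually created the bitmap. */
    if (state->prepared) {
        qmp_block_dirty_bitmap_remove(action->node, action->name, NULL);
    }
}
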
> 
> The "undo" semantics of the jobs are somewhat different. I am not
> suggesting we can teleport back in time to before we tried to do the
> backup, but we can at least abort the backup and make it like you never
> issued the command -- which is important for incremental backups.
> 
> The rollback behavior for each action needs to be spelled out in our
> documentation ... as does a categorization of which actions complete
> before qmp_transaction returns and which may have lingering effects
> (like drive-backup).
> 
>> You can try to undo some parts of what you did later on, but if any
>> involved BDS was used in the meantime by anything other than the block
>> job, you don't have transactional behaviour any more.
>>
>> And isn't the management tool perfectly capable of cleaning up all the
>> block jobs without magic happening in qemu if one of them fails? Do we
>> actually need atomic failure later on? And if so, do we need atomic
>> failure only of block jobs started in the same transaction? Why?
>>
> 
> There's always a debate about who is responsible for cleaning things up
> on failure: QEMU or the management tool? Luckily, it's an optional
> parameter, so the tool can decide at run time what semantics it wants.
> 
> IMO it's a little late in this series to question whether we need this
> or not, but I'll oblige.
> 
> The original objection from Stefan to the incremental backup
> https://soundcloud.com/tothejazz/gyms-flappy-the-happy-seal
> transaction semantics was that if two incremental backup jobs launched by a

LOL. I mispasted a soundcloud link in here. Well... enjoy this stupid
song that made me laugh, which I was trying to link to someone else.
Sigh!!! (So embarrassed.)

> transaction had only partial success, the management tool would have to
> take on extra state and possibly issue some corrective actions to QEMU
> in order to undo the damage.
> 
> QEMU, however, could just unravel the failure fairly trivially and allow
> the backup commands to maintain a simple binary success/failure state.
> This was agreed to be the better, less complicated management scenario.
> 
> In the case that incremental backups partially complete, you'll have one
> bitmap deleted and a different bitmap merged back onto the BDS. At this
> point the management tool absolutely must not delete the one incremental
> backup that was made; it needs to leave it in place before re-issuing
> the incremental backup. It's then responsible for either squashing the
> two incremental backups it made, or otherwise managing the disparity in
> the number of actual backup files.
> 
> On QEMU's side, we don't delete incremental backup data until after the
> backup operation completes, so it's trivial to hold onto the data until
> we know everything's OK.
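
The grouped rollback could look roughly like the following: jobs started
by one transaction join a group, the first failure cancels the rest, and
each job's own completion path puts its bitmap back. 'JobGroup' and its
fields are illustrative only, not qemu API:

typedef struct JobGroup {
    GSList *jobs;   /* BlockJob members launched by one transaction */
    bool failed;
} JobGroup;

/* Called from each member job's completion callback with its result. */
static void job_group_on_completion(JobGroup *group, BlockJob *job, int ret)
{
    GSList *l;

    if (ret < 0 && !group->failed) {
        group->failed = true;
        for (l = group->jobs; l; l = l->next) {
            BlockJob *other = l->data;
            if (other != job) {
                /* Unwind the rest of the group; per-job cleanup
                 * (e.g. merging the frozen bitmap back) happens in
                 * each job's own completion callback. */
                block_job_cancel(other);
            }
        }
    }
}
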
> 
> We decided it was nicest for QEMU to just roll back all of the jobs in
> this case. If the management tool disagrees, it can just roll with the
> original semantics.
> 
> 
> I still believe strongly that this is the right way to go. It's
> flexible, it allows either party to manage the failure, the parameter is
> completely unambiguous, it provides for early failure if the expected
> semantics are not possible, and the code is already here and mostly
> reviewed... except for the API names.
> 
> --js
> 


