From: Jeff Cody
Subject: Re: [Qemu-devel] [PATCH v3 5/6] blockjob: refactor backup_start as backup_job_create
Date: Tue, 8 Nov 2016 13:30:39 -0500
User-agent: Mutt/1.5.24 (2015-08-30)

On Tue, Nov 08, 2016 at 10:24:50AM -0500, John Snow wrote:
> 
> 
> On 11/08/2016 04:11 AM, Kevin Wolf wrote:
> >Am 08.11.2016 um 06:41 hat John Snow geschrieben:
> >>On 11/03/2016 09:17 AM, Kevin Wolf wrote:
> >>>Am 02.11.2016 um 18:50 hat John Snow geschrieben:
> >>>>Refactor backup_start as backup_job_create, which only creates the job,
> >>>>but does not automatically start it. The old interface, 'backup_start',
> >>>>is not kept in favor of limiting the number of nearly-identical interfaces
> >>>>that would have to be edited to keep up with QAPI changes in the future.
> >>>>
> >>>>Callers that wish to synchronously start the backup_block_job can
> >>>>instead just call block_job_start immediately after calling
> >>>>backup_job_create.
> >>>>
> >>>>Transactions are updated to use the new interface, calling block_job_start
> >>>>only during the .commit phase, which helps prevent race conditions where
> >>>>jobs may finish before we even finish building the transaction. This may
> >>>>happen, for instance, during empty block backup jobs.
> >>>>
> >>>>Reported-by: Vladimir Sementsov-Ogievskiy <address@hidden>
> >>>>Signed-off-by: John Snow <address@hidden>
> >>>
> >>>>+static void drive_backup_commit(BlkActionState *common)
> >>>>+{
> >>>>+    DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common);
> >>>>+    if (state->job) {
> >>>>+        block_job_start(state->job);
> >>>>+    }
> >>>>}
> >>>
> >>>How could state->job ever be NULL?
> >>>
> >>
> >>Mechanical thinking. It can't. (I definitely didn't copy paste from
> >>the .abort routines. Definitely.)
> >>
> >>>Same question for abort, and for blockdev_backup_commit/abort.
> >>>
> >>
> >>Abort ... we may not have created the job successfully. Abort gets
> >>called whether or not we made it to or through the matching
> >>.prepare.
> >
> >Ah, yes, I always forget about this. It's so counterintuitive (and
> >bdrv_reopen() actually works differently, it only aborts entries that
> >have successfully been prepared).
> >
> >Is there a good reason why qmp_transaction() works this way, especially
> >since we have a separate .clean function?
> >
> >Kevin
> >
> 
> We just don't track which actions have succeeded or not, so we loop through
> all actions on each phase regardless.
> 
> I could add a little state enumeration (or boolean) to each action and I
> could adjust abort to only run on actions that either completed or failed,
> but in this case I think it still wouldn't change the text for .abort,
> because an action may fail before it got to creating the job, for instance.
> 

As far as this part goes, couldn't we just do it without any flags, by not
inserting the state into the snap_bdrv_states list unless it was successful
(assuming _prepare cleans up itself on failure)?  E.g.:
 
-        QSIMPLEQ_INSERT_TAIL(&snap_bdrv_states, state, entry);
 
         state->ops->prepare(state, &local_err);
         if (local_err) {
             error_propagate(errp, local_err);
+            g_free(state);
             goto delete_and_fail;
         }
+        QSIMPLEQ_INSERT_TAIL(&snap_bdrv_states, state, entry);
     }

> Unless you'd propose undoing .prepare IN .prepare in failure cases, but why
> write abort code twice? I don't mind it living in .abort, personally.
>

Doing it the above way would indeed require prepare functions to clean up
after themselves on failure.
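
To make the contract concrete, here is a minimal standalone sketch of that
model. It is not QEMU code: Action, action_prepare(), action_abort(),
action_commit() and run_transaction() are hypothetical names used only for
illustration. An action is counted as prepared only after its prepare step
succeeds, so abort never sees an action that failed or was never reached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a transaction action; not a QEMU type. */
typedef struct Action {
    bool fail_in_prepare;   /* knob to simulate a failing prepare step */
    bool prepared;          /* set once prepare has succeeded */
    bool aborted;           /* set when abort has run */
    bool committed;         /* set when commit has run */
} Action;

/* Returns 0 on success; on failure it leaves nothing behind to undo. */
static int action_prepare(Action *a)
{
    if (a->fail_in_prepare) {
        return -1;
    }
    a->prepared = true;
    return 0;
}

/* Only ever called for actions whose prepare step succeeded. */
static void action_abort(Action *a)
{
    assert(a->prepared);
    a->prepared = false;
    a->aborted = true;
}

static void action_commit(Action *a)
{
    a->committed = true;
}

/* Returns 0 if every action was committed, -1 if the transaction rolled back. */
int run_transaction(Action *actions, size_t n)
{
    size_t n_prepared = 0;

    for (size_t i = 0; i < n; i++) {
        if (action_prepare(&actions[i]) < 0) {
            /* Roll back only what was successfully prepared. */
            for (size_t j = 0; j < n_prepared; j++) {
                action_abort(&actions[j]);
            }
            return -1;
        }
        n_prepared++;   /* counted only after a successful prepare */
    }

    for (size_t i = 0; i < n; i++) {
        action_commit(&actions[i]);
    }
    return 0;
}
```

With this shape, an abort handler needs no "did we get this far?" checks,
at the cost of every prepare step having to unwind its own partial work.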

The bdrv_reopen() model does it this way, and I think it makes sense.  With
most APIs, on failure you wouldn't have a way of knowing what has or has not
been done, so it leaves everything in a clean state.  I think this is a good
model to follow.

It is also what most QEMU block interfaces currently do, iirc (.bdrv_open,
etc.) - if it fails, it is assumed that it frees all resources it allocated.
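
As a rough illustration of that convention from the callee's side, here is
the usual goto-based unwind in plain C. DemoState, demo_open() and
demo_close() are made-up names, and the fail_second parameter exists only to
simulate a failure halfway through; none of this is taken from a real
driver.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical driver state; not a QEMU structure. */
typedef struct DemoState {
    char *buf_a;
    char *buf_b;
} DemoState;

/*
 * Returns 0 with both buffers allocated, or a negative errno value with
 * nothing left allocated. A non-zero fail_second simulates the second
 * allocation failing.
 */
int demo_open(DemoState *s, int fail_second)
{
    int ret;

    s->buf_a = malloc(64);
    if (!s->buf_a) {
        ret = -ENOMEM;
        goto fail;
    }

    s->buf_b = fail_second ? NULL : malloc(64);
    if (!s->buf_b) {
        ret = -ENOMEM;
        goto fail;
    }

    return 0;

fail:
    /* Undo whatever was done so far; free(NULL) is a no-op. */
    free(s->buf_a);
    s->buf_a = NULL;
    s->buf_b = NULL;
    return ret;
}

/* Releases the resources of a successfully opened state. */
void demo_close(DemoState *s)
{
    free(s->buf_a);
    free(s->buf_b);
    s->buf_a = NULL;
    s->buf_b = NULL;
}
```

The caller only has to call demo_close() after a successful demo_open();
after a failure there is nothing for it to clean up.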

I guess it doesn't have to be done this way, and the complexity can just be
pushed into the _abort() function.  After all, with these transactional
models, there exists an abort function, which differentiates it from most
other APIs.  But the downside is that we have different ways of handling
essentially the same sort of transactional model in the block layer (between
bdrv_reopen and qmp_transaction), and it trips up reviewers / authors.

(I don't think changing how qmp_transaction handles this is something that
needs to be handled in this series - but it would be nice in the future
sometime).

Jeff


