Re: [Qemu-devel] [PATCH v2 05/10] block: add block job transactions


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [PATCH v2 05/10] block: add block job transactions
Date: Wed, 8 Jul 2015 14:36:26 +0100
User-agent: Mutt/1.5.23 (2014-03-12)

On Wed, Jul 08, 2015 at 09:59:24AM +0800, Fam Zheng wrote:
> On Tue, 07/07 13:59, Stefan Hajnoczi wrote:
> > On Tue, Jul 07, 2015 at 03:32:45PM +0800, Fam Zheng wrote:
> > > On Mon, 07/06 15:24, Stefan Hajnoczi wrote:
> > > > +/**
> > > > + * block_job_txn_add_job:
> > > > + * @txn: The transaction (may be NULL)
> > > > + * @job: Job to add to the transaction
> > > > + *
> > > > + * Add @job to the transaction.  The @job must not already be in a transaction.
> > > > + * The block job driver must call block_job_txn_prepare_to_complete() before
> > > 
> > > s/block_job_txn_prepare_to_complete/block_job_txn_job_done/
> > > 
> > > Reading this for a second time, I'm starting to feel it's more complicated
> > > than it needs to be.
> > > 
> > > I have another idea: in block_job_completed, check whether any other job
> > > has failed, and call this job driver's (imaginary) "abort()" callback
> > > accordingly; if all jobs have succeeded, call a "commit" callback during
> > > the last block_job_completed.
> > > 
> > > Does that make sense?
> > 
> > I think you've skipped the hard part: immediate cancellation.  If a job
> > is cancelled by the user or a job fails, then all other jobs are
> > cancelled immediately.
> > 
> > Immediate cancellation has the problem that jobs could be running in any
> > AioContext, so you need to handle concurrency.  That's where the
> > locking, juggling AioContexts, and interaction between blockjobs comes
> > in.
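
To make the concurrency point above concrete, here is a minimal sketch (not
from either series) of what cancelling the sibling jobs from the main loop
tends to involve.  The helper name and the jobs array are made up;
bdrv_get_aio_context(), aio_context_acquire()/aio_context_release() and
block_job_cancel_sync() are the existing primitives that have to be juggled:

#include "block/block.h"
#include "block/blockjob.h"

/* Hypothetical helper: cancel every other job in a group from the main
 * loop.  Each job's BlockDriverState may be attached to a different
 * AioContext (e.g. with dataplane), so that context must be acquired
 * before the job is touched and released afterwards. */
static void cancel_sibling_jobs(BlockJob *failed_job, BlockJob **jobs, int n_jobs)
{
    int i;

    for (i = 0; i < n_jobs; i++) {
        BlockJob *other = jobs[i];
        AioContext *ctx;

        if (other == failed_job) {
            continue;
        }
        ctx = bdrv_get_aio_context(other->bs);
        aio_context_acquire(ctx);       /* serialize with the job's IOThread */
        block_job_cancel_sync(other);   /* cancels and waits for completion */
        aio_context_release(ctx);
    }
}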
> 
> OK, let me try again:
> 
> The idea is to intercept job->cb so that we can handle job completion entirely
> in block_job_completed (which runs in the main loop), rather than in the
> coroutines, which can run in any AioContext.
> 
> 1) If a job is cancelled or fails, it reaches block_job_completed immediately,
> with block_job_is_cancelled() == true. In this case, we call
> block_job_cancel_sync on all the other jobs, then call the "abort()" callbacks
> to reclaim any bitmaps, then emit the QMP events. Some other jobs may have
> already completed before this point, but that's not a problem because we
> always defer the actual completion work (abort/commit and the QMP events) and
> handle it all together.
> 
> 2) If no job has failed or been cancelled, then in the last
> block_job_completed we call "commit()" to abdicate the bitmaps and emit the
> QMP events.
> 
> This would still require BlockJobTxn to track the block jobs in a group, but
> hopefully it could reduce the complexity of interactions between block jobs.
> 
> I can prototype it if this isn't missing anything obvious.
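
The flow described in 1) and 2) might look roughly like the sketch below.  It
is only an illustration of the idea, not code from either series:
BlockJobTxn's fields, the per-job abort/commit callbacks (the "imaginary" ones
mentioned above) and the helper name are all hypothetical, while
block_job_is_cancelled() and block_job_cancel_sync() are the existing APIs it
leans on, and everything runs in the main loop because job->cb has been
intercepted there.

#include "block/blockjob.h"

/* The "imaginary" per-driver callbacks from the discussion; in a real
 * implementation they would presumably live in BlockJobDriver. */
typedef void BlockJobTxnCallback(BlockJob *job);

/* Hypothetical transaction state; none of these fields exist in QEMU. */
typedef struct BlockJobTxn {
    BlockJob *jobs[16];                  /* jobs grouped in this transaction */
    BlockJobTxnCallback *abort_cb[16];   /* e.g. reclaim the dirty bitmap */
    BlockJobTxnCallback *commit_cb[16];  /* e.g. abdicate the dirty bitmap */
    bool done[16];                       /* which jobs have completed */
    int n_jobs;
    int pending;                         /* jobs whose completion has not run */
    bool aborting;                       /* any job failed or was cancelled */
    bool finalized;                      /* abort/commit already ran */
} BlockJobTxn;

/* Reached via the intercepted job->cb, i.e. always in the main loop. */
static void txn_job_completed(BlockJobTxn *txn, BlockJob *job, int ret)
{
    int i;

    for (i = 0; i < txn->n_jobs; i++) {
        if (txn->jobs[i] == job) {
            txn->done[i] = true;
        }
    }
    txn->pending--;

    if ((ret < 0 || block_job_is_cancelled(job)) && !txn->aborting) {
        txn->aborting = true;
        /* Cancel the jobs that are still running; their completions funnel
         * back into this function from block_job_cancel_sync() and
         * decrement txn->pending as well. */
        for (i = 0; i < txn->n_jobs; i++) {
            if (!txn->done[i]) {
                block_job_cancel_sync(txn->jobs[i]);
            }
        }
    }

    if (txn->pending > 0 || txn->finalized) {
        /* Defer abort/commit and the QMP events until the last completion. */
        return;
    }
    txn->finalized = true;

    for (i = 0; i < txn->n_jobs; i++) {
        if (txn->aborting) {
            txn->abort_cb[i](txn->jobs[i]);
        } else {
            txn->commit_cb[i](txn->jobs[i]);
        }
        /* BLOCK_JOB_COMPLETED / BLOCK_JOB_CANCELLED would be emitted here. */
    }
}

The finalized flag is only needed because block_job_cancel_sync() re-enters
this function: whichever call brings pending to zero does the group-wide
abort/commit, and every other frame backs off.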

Yes, please try it.  It's half-way between what John originally did and
what I did.  It might be the simplest solution.

Be careful with the final piece of code used to complete jobs from
block_job_defer_to_main_loop().  It runs from a BH in the main loop
after the coroutine has terminated.  In the fail/cancel case you might
need to protect against race conditions - especially if two jobs finish
in the same event loop iteration.

I didn't handle that since block_job_txn_job_done() is called while the
coroutine is still alive.
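
A driver-side completion callback in the style of block/backup.c's
backup_complete() could defend against that window along these lines.  This is
only a sketch of the idea: block_job_completed() and the
BlockJobDeferToMainLoopFn shape are real, but block_job_get_txn() and
txn_is_aborting() are hypothetical stand-ins for whatever the prototype ends
up with:

#include <errno.h>
#include <stdbool.h>
#include <glib.h>
#include "block/blockjob.h"

/* Hypothetical transaction API used by the sketch. */
typedef struct BlockJobTxn BlockJobTxn;
BlockJobTxn *block_job_get_txn(BlockJob *job);
bool txn_is_aborting(BlockJobTxn *txn);

typedef struct {
    int ret;
} CompleteData;

/* Runs from a BH in the main loop via block_job_defer_to_main_loop(), after
 * the job coroutine has terminated.  A sibling job's deferred completion may
 * already have aborted the transaction earlier in the same event loop
 * iteration, so a successful local ret cannot be trusted on its own. */
static void example_job_complete(BlockJob *job, void *opaque)
{
    CompleteData *data = opaque;
    int ret = data->ret;

    if (ret == 0 && txn_is_aborting(block_job_get_txn(job))) {
        ret = -ECANCELED;   /* fold into the transaction-wide abort path */
    }

    block_job_completed(job, ret);
    g_free(data);
}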


