Re: [Qemu-devel] [PATCH 05/10] block: add block job transactions


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [PATCH 05/10] block: add block job transactions
Date: Tue, 30 Jun 2015 17:20:25 +0100
User-agent: Mutt/1.5.23 (2014-03-12)

On Mon, Jun 29, 2015 at 06:38:13PM -0400, John Snow wrote:
> On 06/25/2015 08:12 AM, Stefan Hajnoczi wrote:
> > +/* The purpose of this is to keep txn alive until all jobs have been added */
> > +void block_job_txn_begin(BlockJobTxn *txn)
> > +{
> > +    block_job_txn_unref(txn);
> > +}
> > +
> 
> Maybe it's not entirely clear to the caller that "begin" may in fact
> delete the BlockJobTxn. Shall we update the caller's pointer to NULL in
> this case as a hint? Passing a **txn will imply that we are giving up
> our ownership of the object.

Good idea.
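
Something like this, as a rough untested sketch of your suggestion (reusing
block_job_txn_unref() from this patch):

/* Takes ownership of the caller's reference.  Clears the caller's pointer
 * so the txn cannot be used after it may already have been freed.
 */
void block_job_txn_begin(BlockJobTxn **txn)
{
    block_job_txn_unref(*txn);
    *txn = NULL;
}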

> > +/* Cancel all other jobs in case of abort, wake all waiting jobs in case of
> > + * successful completion.  Runs from main loop.
> > + */
> > +static void block_job_txn_complete(BlockJob *job, void *opaque)
> > +{
> > +    BlockJobTxn *txn = opaque;
> > +    BlockJob *other_job;
> > +    bool aborting = txn->aborting;
> > +
> > +    qemu_mutex_lock(&txn->lock);
> > +    txn->ref++; /* keep txn alive until the end of this loop */
> > +
> > +    QLIST_FOREACH(other_job, &txn->jobs, txn_list) {
> > +        AioContext *ctx;
> > +
> > +        qemu_mutex_unlock(&txn->lock);
> > +        ctx = bdrv_get_aio_context(other_job->bs);
> > +        aio_context_acquire(ctx);
> > +
> > +        /* Cancel all other jobs if aborting.  Don't cancel our own failed job
> > +         * since cancellation throws away the error value.
> > +         */
> > +        if (aborting && other_job != job) {
> > +            block_job_cancel(other_job);
> > +        } else {
> > +            block_job_enter(other_job);
> > +        }
> > +
> > +        aio_context_release(ctx);
> > +        qemu_mutex_lock(&txn->lock);
> > +    }
> > +
> > +    qemu_mutex_unlock(&txn->lock);
> > +    block_job_txn_unref(txn);
> > +}
> > +
> 
> Maybe we can name this one along the lines of
> block_job_txn_complete_impl or something clearly internal, so that we
> can name the public interface simply "block_job_txn_complete."
> 
> Maybe I'm just bike shedding, but calling only a "prepare to X" function
> without a matching "X" call in the exposed API seems odd.
> 
> I suppose it doesn't matter, because I can't think of anything nicer :)
> 
> > +void coroutine_fn block_job_txn_prepare_to_complete(BlockJobTxn *txn,
> > +                                                    BlockJob *job,
> > +                                                    int ret)

I'll rename the function to:
void coroutine_fn block_job_txn_job_done(BlockJobTxn *txn, BlockJob *job, int ret)

"complete" is already overloaded.  block_job_completed() is called by
jobs to terminate.  block_job_complete() is called by the monitor to
nudge a waiting job to finish.  They are two different things.  Maybe
using the term in BlockJobTxn makes things more confusing.

(It's possible to drop the txn argument since it can be fetched from
job->txn, but that makes the boundary between BlockJob and BlockJobTxn
harder to understand.)
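
For illustration (hypothetical call sites; job->txn is assumed to exist as
in this series), the two options would look like:

    block_job_txn_job_done(job->txn, job, ret);   /* explicit txn argument */
    block_job_txn_job_done(job, ret);             /* txn taken from job->txn */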

> > +{
> > +    if (!txn) {
> > +        return;
> > +    }
> > +
> > +    qemu_mutex_lock(&txn->lock);
> > +
> > +    /* This function is entered in 3 cases:
> > +     *
> > +     * 1. Successful job completion - wait for other jobs
> > +     * 2. First failed/cancelled job in txn - cancel other jobs and wait
> > +     * 3. Subsequent cancelled jobs - finish immediately, don't wait
> > +     */
> > +    trace_block_job_txn_prepare_to_complete_entry(txn, job, ret,
> > +                                                  block_job_is_cancelled(job),
> > +                                                  txn->aborting,
> > +                                                  txn->jobs_pending);
> > +
> > +    if (txn->aborting) { /* Case 3 */
> > +        assert(block_job_is_cancelled(job));
> > +        goto out; /* already cancelled, don't yield */
> > +    }
> > +
> 
> So the first failure forces all jobs not-yet-complete to cancel. Is
> there any chance for a race condition of two jobs completing almost
> simultaneously, where the first fails and the second completes, and the
> 2nd job makes it here before it gets canceled?
> 
> BOD: I really assume the answer is "no," but it's not immediately
> evident to me.

Good catch!  It is possible if the 2nd job is entered after the 1st job
yielded but before block_job_txn_complete() is called.  There would have
to be a callback pending for the 2nd job.

It can be fixed by adding a new case for jobs that call
block_job_txn_prepare_to_complete() after block_job_txn_complete() has
already been scheduled.  The job should simply wait for
block_job_txn_complete().
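
Roughly the shape I have in mind, as an untested sketch on top of the code
above ("txn->completing" is a new flag that would be set by whichever job
schedules block_job_txn_complete(); the name is provisional):

    if (txn->completing) { /* Case 4: completion already scheduled */
        /* Another job has already completed successfully and
         * block_job_txn_complete() is pending in the main loop.  Don't
         * treat this as a new completion; just wait until the callback
         * enters this job again.
         */
        qemu_mutex_unlock(&txn->lock);
        block_job_yield(job);
        return;
    }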

I'll add a test case for this scenario.
