[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [RESEND PATCH] hw/dma: fix crash caused by race condition
From: |
Stefan Hajnoczi |
Subject: |
Re: [RESEND PATCH] hw/dma: fix crash caused by race condition |
Date: |
Thu, 2 Jun 2022 06:29:12 +0100 |
On Thu, Jun 2, 2022, 02:04 Tong Zhang <ztong0001@gmail.com> wrote:
>
> Hi Stefan,
>
> On Wed, Jun 1, 2022 at 6:56 AM Stefan Hajnoczi <stefanha@gmail.com> wrote:
> >
> > > > This patch makes sense to me. Can you rephrase your concern?
> > >
> > > The locking is around dbs->io_func().
> > >
> > > aio_context_acquire(dbs->ctx);
> > > dbs->acb = dbs->io_func()
> > > aio_context_release(dbs->ctx);
> > >
> > >
> > > So where exactly would the lock that's now still held stop someone from
> > > modifying dbs->acb = NULL at the beginning of the function, which seems
> > > to be not protected by that lock?
> > >
> > > Maybe I'm missing some locking magic due to the lock being a recursive
> > > lock.
> >
> > Tong Zhang: Can you share a backtrace of all threads when the
> > assertion failure occurs?
> >
> Sorry I couldn't get the trace now -- but I can tell that we have some
> internal code uses
> this dma related code and will grab dbs->ctx lock in another thread
> and could overwrite dbs->acb.
>
> From my understanding, one of the reasons that the lock is required
> here is to protect dbs->acb,
> we could not reliably test io_func()'s return value after releasing
> the lock here.
>
> Since this code affects our internal code base and I did not reproduce
> on master branch,
> feel free to ignore it.
If this patch is unnecessary on qemu.git/master it raises the question
whether aio_context_acquire/release() should be removed from
dma_blk_cb(). It was added by:
commit 1919631e6b5562e474690853eca3c35610201e16
Author: Paolo Bonzini <pbonzini@redhat.com>
Date: Mon Feb 13 14:52:31 2017 +0100
block: explicitly acquire aiocontext in bottom halves that need it
Paolo: Is dma_blk_cb() called without the AioContext lock (by
virtio-scsi or any code path with IOThreads)? David pointed out that
if that's the case then the dbs->acb is accessed without the lock at
the start of dma_blk_cb().
Stefan