
Re: [Qemu-devel] [RFC PATCH] qcow2: Fix race in cache invalidation


From: Kevin Wolf
Subject: Re: [Qemu-devel] [RFC PATCH] qcow2: Fix race in cache invalidation
Date: Thu, 25 Sep 2014 14:39:44 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

On 25.09.2014 at 14:29, Alexey Kardashevskiy wrote:
> On 09/25/2014 08:20 PM, Kevin Wolf wrote:
> > On 25.09.2014 at 11:55, Alexey Kardashevskiy wrote:
> >> Right. Cool. So is below what was suggested? I am double-checking as it
> >> does not solve the original issue - the bottom half is called first and
> >> then nbd_trip() crashes in qcow2_co_flush_to_os().
> >>
> >> diff --git a/block.c b/block.c
> >> index d06dd51..1e6dfd1 100644
> >> --- a/block.c
> >> +++ b/block.c
> >> @@ -5037,20 +5037,22 @@ void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp)
> >>      if (local_err) {
> >>          error_propagate(errp, local_err);
> >>          return;
> >>      }
> >>
> >>      ret = refresh_total_sectors(bs, bs->total_sectors);
> >>      if (ret < 0) {
> >>          error_setg_errno(errp, -ret, "Could not refresh total sector count");
> >>          return;
> >>      }
> >> +
> >> +    bdrv_drain_all();
> >>  }
> > 
> > Try moving the bdrv_drain_all() call to the top of the function (at
> > least it must be called before bs->drv->bdrv_invalidate_cache).
> 
> 
> Ok, I did. Did not help.
> 
> 
> > 
> >> +static QEMUBH *migration_complete_bh;
> >> +static void process_incoming_migration_complete(void *opaque);
> >> +
> >>  static void process_incoming_migration_co(void *opaque)
> >>  {
> >>      QEMUFile *f = opaque;
> >> -    Error *local_err = NULL;
> >>      int ret;
> >>
> >>      ret = qemu_loadvm_state(f);
> >>      qemu_fclose(f);
> > 
> > Paolo suggested moving everything starting from here, but as far as I
> > can tell, leaving the next few lines here shouldn't hurt.
> 
> 
> Ouch. I was looking at the wrong qcow2_fclose() all this time :)
> Anyway, what you suggested did not help -
> bdrv_co_flush() calls qemu_coroutine_yield() while this BH is being
> executed and the situation is still the same.

Hm, do you have a backtrace? The idea with the BH was that it would be
executed _outside_ coroutine context and therefore wouldn't be able to
yield. If it's still executed in coroutine context, it would be
interesting to see who that caller is.
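
To illustrate the intent, here is a toy model in plain C. It is not QEMU
code: every toy_* name is made up for this sketch, and toy_in_coroutine
merely stands in for what qemu_in_coroutine() would report. The point is
only that the coroutine schedules the completion step instead of calling
it directly, so the callback runs later from the main loop, outside
coroutine context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; none of these names exist in QEMU. */
typedef void ToyBHFunc(void *opaque);

typedef struct ToyBH {
    ToyBHFunc *cb;
    void *opaque;
    bool scheduled;
} ToyBH;

static bool toy_in_coroutine;   /* stand-in for qemu_in_coroutine() */
static ToyBH *toy_pending;      /* this sketch tracks a single pending BH */

/* Mark the bottom half as pending; the callback is NOT run here. */
static void toy_bh_schedule(ToyBH *bh)
{
    bh->scheduled = true;
    toy_pending = bh;
}

/* One main loop iteration: runs the pending BH outside coroutine context. */
static void toy_main_loop_poll(void)
{
    assert(!toy_in_coroutine);
    if (toy_pending && toy_pending->scheduled) {
        ToyBH *bh = toy_pending;
        toy_pending = NULL;
        bh->scheduled = false;
        bh->cb(bh->opaque);
    }
}

/* Run entry() with the "in coroutine" flag set, then restore it. */
static void toy_coroutine_enter(void (*entry)(void *), void *opaque)
{
    bool prev = toy_in_coroutine;
    toy_in_coroutine = true;
    entry(opaque);
    toy_in_coroutine = prev;
}

/* -1: not run yet, 0: ran outside coroutine, 1: ran inside coroutine. */
static int toy_completion_ran_in_coroutine = -1;

/* Completion step; records the context it was called from. */
static void toy_complete(void *opaque)
{
    (void)opaque;
    toy_completion_ran_in_coroutine = toy_in_coroutine ? 1 : 0;
}

static ToyBH toy_complete_bh = { toy_complete, NULL, false };

/* Coroutine body: defers completion to a BH instead of calling it. */
static void toy_incoming_co(void *opaque)
{
    (void)opaque;
    toy_bh_schedule(&toy_complete_bh);
}

/* Drive the model once and report where the completion step ran. */
static int toy_run(void)
{
    toy_coroutine_enter(toy_incoming_co, NULL);
    toy_main_loop_poll();
    return toy_completion_ran_in_coroutine;
}
```

In this model toy_run() reports that the completion step ran outside the
coroutine. If the real BH behaves differently, something must be
invoking it from within a coroutine, which is what a backtrace would
show.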

Kevin


