Re: [Qemu-devel] [PATCH 3/3] replay: introduce block devices record/repl

qemu-devel

From:	Pavel Dovgalyuk
Subject:	Re: [Qemu-devel] [PATCH 3/3] replay: introduce block devices record/replay
Date:	Tue, 16 Feb 2016 09:25:08 +0300

> From: Kevin Wolf [mailto:address@hidden
> Am 15.02.2016 um 15:24 hat Pavel Dovgalyuk geschrieben:
> > > From: Kevin Wolf [mailto:address@hidden
> >
> > > > There could be asynchronous events that occur in non-cpu threads.
> > > > For now these events are shutdown request and block task execution.
> > > > They may "hide" following clock (or another one) events. That is why
> > > > we process them until synchronous event (like clock, instructions
> > > > execution, or checkpoint) is met.
> > > >
> > > >
> > > > > Anyway, what does "can't proceed" mean? The coroutine yields because
> > > > > > it's waiting for I/O, but it is never reentered? Or is it hanging
> > > > > > while trying to acquire a lock?
> > > >
> > > > I've solved this problem by slightly modifying the queue.
> > > > I haven't yet made BlockDriverState assignment to the request ids.
> > > > Therefore aio_poll was temporarily replaced with usleep.
> > > > Now execution starts and hangs at some random moment of OS loading.
> > > >
> > > > Here is the current version of blkreplay functions:
> > > >
> > > > static int coroutine_fn blkreplay_co_readv(BlockDriverState *bs,
> > > >     int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
> > > > {
> > > >     uint32_t reqid = request_id++;
> > > >     Request *req;
> > > >     req = block_request_insert(reqid, bs, qemu_coroutine_self());
> > > >     bdrv_co_readv(bs->file->bs, sector_num, nb_sectors, qiov);
> > > >
> > > >     if (replay_mode == REPLAY_MODE_RECORD) {
> > > >         replay_save_block_event(reqid);
> > > >     } else {
> > > >         assert(replay_mode == REPLAY_MODE_PLAY);
> > > >         qemu_coroutine_yield();
> > > >     }
> > > >     block_request_remove(req);
> > > >
> > > >     return 0;
> > > > }
> > > >
> > > > void replay_run_block_event(uint32_t id)
> > > > {
> > > >     Request *req;
> > > >     if (replay_mode == REPLAY_MODE_PLAY) {
> > > >         while (!(req = block_request_find(id))) {
> > > >             //aio_poll(bdrv_get_aio_context(req->bs), true);
> > > >             usleep(1);
> > > >         }
> > >
> > > How is this loop supposed to make any progress?
> >
> > This loop is not supposed to make any progress. It waits until the
> > block_request_insert call adds the request to the queue.
> 
> Yes. And who is supposed to add something to the queue? You are looping
> and doing nothing, so no other code gets to run in this thread. It's
> only aio_poll() that would run event handlers that could eventually add
> something to the list.

blkreplay_co_readv adds the request to the queue.

> 
> > > And I still don't understand why aio_poll() doesn't work and where it
> > > hangs.
> >
> > aio_poll hangs if the "req = block_request_insert(reqid, bs,
> > qemu_coroutine_self());" line is executed after bdrv_co_readv. When
> > bdrv_co_readv yields, replay_run_block_event has no information about
> > the pending request and cannot jump to its coroutine.
> > Maybe I should implement aio_poll execution there to make progress in
> > that case?
> 
> Up in the thread, I posted code that was a bit more complex than what
> you have, and which considered both cases (completion before
> replay/completion after replay). If you implement it like this, it might
> just work. And yes, it involved an aio_poll() loop in the replay
> function.
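
A minimal sketch of the two-case handling described above (completion
before replay, completion after replay). This is not the code posted
earlier in the thread; SimRequest and all sim_* names are hypothetical
and only model the idea that whichever side arrives second finishes the
request:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request record; the field names are illustrative only. */
typedef struct SimRequest {
    bool io_done;        /* the underlying read has completed */
    bool event_replayed; /* the replay log has reached this request's event */
    bool finished;       /* the request may return to its caller */
} SimRequest;

/* Finish the request only once both conditions hold, in either order. */
static void sim_try_finish(SimRequest *req)
{
    assert(req);
    if (req->io_done && req->event_replayed) {
        req->finished = true;
    }
}

/* Models I/O completion arriving (possibly before the replay event). */
void sim_io_complete(SimRequest *req)
{
    req->io_done = true;
    sim_try_finish(req);
}

/* Models the replay log reaching the event (possibly before completion). */
void sim_replay_event(SimRequest *req)
{
    req->event_replayed = true;
    sim_try_finish(req);
}
```

In the real code the replay side would presumably have to poll until
io_done becomes true instead of returning immediately; the sketch leaves
that waiting out.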

You are right, but I found a rather fundamental problem here.
There are two possible approaches to replaying coroutine events:

1. Waiting with aio_poll in replay_run_block_event.
   In this case replay cannot work because of a recursive mutex lock:
   aio_poll -> qemu_clock_get_ns -> <lock replay mutex> ->
     replay_run_block_event -> aio_poll -> qemu_clock_get_ns ->
     <lock replay mutex> -> <assert failed>
   I.e. recursive aio_poll calls lead to recursive replay calls and
   to a recursive lock of the replay mutex.

2. Adding block events to the queue until a checkpoint is met.
   This will not work with synchronous requests, because they will wait
   forever and the checkpoint will never happen. Correct me if I'm wrong
   about this case.
   If blkreplay_co_readv yields, can the vcpu thread unlock the BQL and
   wait for the checkpoint in the iothread?
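
The recursion in approach 1 can be modelled in isolation. The sketch
below is not QEMU code: every sim_* function is a hypothetical stand-in
for the function named in its comment, and the replay mutex is modelled
as a plain flag. It counts the nested lock attempt instead of asserting:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are real QEMU functions. */
static bool sim_replay_mutex_held;  /* models a non-recursive replay mutex */
static int sim_recursive_attempts;  /* counts would-be assertion failures */

static void sim_aio_poll(void);

/* Models replay_run_block_event: the request is not in the queue yet,
 * so it polls in the hope that the request's coroutine gets to run. */
static void sim_replay_run_block_event(void)
{
    sim_aio_poll();
}

/* Models qemu_clock_get_ns in replay mode: takes the replay mutex and
 * dispatches the pending block event while holding it. */
static void sim_clock_get_ns(void)
{
    if (sim_replay_mutex_held) {
        /* The real code would fail an assertion here; we only count. */
        sim_recursive_attempts++;
        return;
    }
    sim_replay_mutex_held = true;
    sim_replay_run_block_event();
    sim_replay_mutex_held = false;
}

/* Models aio_poll: polling reads the clock. */
static void sim_aio_poll(void)
{
    sim_clock_get_ns();
}

/* Runs one outer poll and returns the number of recursive lock attempts. */
int sim_count_recursive_lock_attempts(void)
{
    sim_recursive_attempts = 0;
    sim_aio_poll();
    assert(!sim_replay_mutex_held);  /* the outer call released the mutex */
    return sim_recursive_attempts;
}
```

Calling sim_count_recursive_lock_attempts() reports one nested lock
attempt for a single outer poll, which corresponds to the
<assert failed> step in the call chain above.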

Pavel Dovgalyuk
