qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 3/3] replay: introduce block devices record/repl


From: Pavel Dovgalyuk
Subject: Re: [Qemu-devel] [PATCH 3/3] replay: introduce block devices record/replay
Date: Tue, 16 Feb 2016 14:20:27 +0300

> From: Kevin Wolf [mailto:address@hidden
> Am 16.02.2016 um 07:25 hat Pavel Dovgalyuk geschrieben:
> > > From: Kevin Wolf [mailto:address@hidden
> > > Am 15.02.2016 um 15:24 hat Pavel Dovgalyuk geschrieben:
> > > > > From: Kevin Wolf [mailto:address@hidden
> > > > > > Here is the current version of blkreplay functions:
> > > > > >
> > > > > > static int coroutine_fn blkreplay_co_readv(BlockDriverState *bs,
> > > > > >     int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
> > > > > > {
> > > > > >     uint32_t reqid = request_id++;
> > > > > >     Request *req;
> > > > > >     req = block_request_insert(reqid, bs, qemu_coroutine_self());
> > > > > >     bdrv_co_readv(bs->file->bs, sector_num, nb_sectors, qiov);
> > > > > >
> > > > > >     if (replay_mode == REPLAY_MODE_RECORD) {
> > > > > >         replay_save_block_event(reqid);
> > > > > >     } else {
> > > > > >         assert(replay_mode == REPLAY_MODE_PLAY);
> > > > > >         qemu_coroutine_yield();
> > > > > >     }
> > > > > >     block_request_remove(req);
> > > > > >
> > > > > >     return 0;
> > > > > > }
> > > > > >
> > > > > > void replay_run_block_event(uint32_t id)
> > > > > > {
> > > > > >     Request *req;
> > > > > >     if (replay_mode == REPLAY_MODE_PLAY) {
> > > > > >         while (!(req = block_request_find(id))) {
> > > > > >             //aio_poll(bdrv_get_aio_context(req->bs), true);
> > > > > >             usleep(1);
> > > > > >         }
> > > > >
> > > > > How is this loop supposed to make any progress?
> > > >
> > > > This loop does not supposed to make any progress. It waits until 
> > > > block_request_insert
> > > > call is added to the queue.
> > >
> > > Yes. And who is supposed to add something to the queue? You are looping
> > > and doing nothing, so no other code gets to run in this thread. It's
> > > only aio_poll() that would run event handlers that could eventually add
> > > something to the list.
> >
> > blkreplay_co_readv adds request to the queue.
> 
> I don't see blkreplay_co_readv() called in this loop without aio_poll().

Is this mandatory?
I thought that version which you proposed has a race condition at the following 
lines
when AIO request is operated by a worker thread. Is this correct?

Coroutine                                                         Replay
bool *done = req_replayed_list_get(reqid) // NULL
                                                                  co =
req_completed_list_get(e.reqid); // NULL
req_completed_list_insert(reqid, qemu_coroutine_self());
qemu_coroutine_yield();
                                                                  bool done = 
false;
                                                                  
req_replayed_list_insert(reqid,
&done);
                                                                  while (!done) 
{ // nobody will
change done anymore
                                                                       
aio_poll();
                                                                  }


> > You are right, but I found kind of fundamental problem here.
> > There are two possible approaches for replaying coroutine events:
> >
> > 1. Waiting with aio_poll in replay_run_block_event.
> >    In this case replay cannot be made, because of recursive mutex lock:
> >    aio_poll -> qemu_clock_get_ns -> <lock replay mutex> ->
> >      replay_run_block_event -> aio_poll -> qemu_clock_get_ns -> <lock 
> > replay mutex> ->
> >      <assert failed>
> >    I.e. we have recursive aio_poll function calls that lead to recursive 
> > replay calls
> >    and to recursive mutex lock.
> 
> That's what you get for replaying stuff in qemu_clock_get_ns() when it
> should really be replayed in the hardware emulation.

I'm not sure that it is possible for RTC. Host clock should run even if VM is 
stopped.
If we emulate host clock, its behavior should be somehow synchronized with the 
host.

> This is a problem you would also have had with your original patch, as
> there are some places in the block layer that use aio_poll() internally.
> Using the blkreplay driver only made you find the bug earlier.

Not exactly. Host clock replay will use cached clock value if there is no such 
event in the log.
The problem I observe here is with asynchronous block events that should be 
processed
immediately as they read from log.

> But... can you detail how we get from aio_poll() to qemu_clock_get_ns()?
> I don't see it directly calling that function, so it might just be a
> callback to some guest device that is running now.

aio_poll (win32) has the following line:

        timeout = blocking && !have_select_revents
            ? qemu_timeout_ns_to_ms(aio_compute_timeout(ctx)) : 0;

POSIX version also uses that function. aio_compute_timeout() requests time from
all existing clocks by calling timerlistgroup_deadline_ns().


Pavel Dovgalyuk




reply via email to

[Prev in Thread] Current Thread [Next in Thread]