From: Kevin Wolf
Subject: Re: [Qemu-devel] [PATCH 3/3] replay: introduce block devices record/replay
Date: Wed, 24 Feb 2016 14:14:07 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

On 24.02.2016 at 12:59, Pavel Dovgalyuk wrote:
> > From: Kevin Wolf [mailto:address@hidden]
> > On 20.02.2016 at 08:11, Pavel Dovgalyuk wrote:
> > > > From: Pavel Dovgalyuk [mailto:address@hidden]
> > > > > From: Kevin Wolf [mailto:address@hidden]
> > > > > On 16.02.2016 at 12:20, Pavel Dovgalyuk wrote:
> > > > > > Coroutine
> > > > > > Replay
> > > > > > bool *done = req_replayed_list_get(reqid) // NULL
> > > > > >
> > > > > > co = req_completed_list_get(e.reqid); // NULL
> > > > >
> > > > > There was no yield, this context switch is impossible to happen. Same
> > > > > for the switch back.
> > > > >
> > > > > > req_completed_list_insert(reqid, qemu_coroutine_self());
> > > > > > qemu_coroutine_yield();
> > > > >
> > > > > This is the point at which a context switch happens. The only other
> > > > > point in my code is the qemu_coroutine_enter() in the other function.
> > > >
> > > > I've fixed the aio_poll problem by disabling the mutex lock for the
> > > > execution of replay_run_block_event(). Now the virtual machine
> > > > deterministically runs 4e8 instructions of Windows XP booting.
> > > > But then one non-deterministic event happens.
> > > > The callback after finishing a coroutine may be called from different
> > > > contexts.
> >
> > How does this happen? I'm not aware of callbacks being processed by any
> > thread other than the I/O thread for that specific block device (unless
> > you use dataplane, this is the main loop thread).
> >
> > > > The apic_update_irq() function behaves differently when called from
> > > > the vcpu and I/O threads.
> > > > In one case it sets CPU_INTERRUPT_POLL and in the other nothing happens.
> > >
> > > Kevin, do you have any ideas on how to fix this issue?
> > > This happens because coroutines may be assigned to different threads.
> > > Maybe there is some way of making this assignment more deterministic?
> >
> > Coroutines aren't randomly assigned to threads, but threads actively
> > enter coroutines. To my knowledge this happens only when starting a
> > request (either vcpu or I/O thread; consistent per device) or by a
> > callback when some event happens (only I/O thread). I can't see any
> > non-determinism here.
>
> The behavior of coroutines looks strange to me.
> Consider the code below (the co_readv function of the replay driver).
> In record mode it somehow changes the thread it is assigned to.
> The code at point A is executed in the CPU thread and the code at point B
> in some other thread.
> Could this happen because the coroutine yields somewhere and its execution
> is resumed by aio_poll, which is called from the iothread?
> In this case the event finishing callback cannot be executed
> deterministically (always in the CPU thread or always in the I/O thread).
>
> static int coroutine_fn blkreplay_co_readv(BlockDriverState *bs,
>     int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
> {
>     BDRVBlkreplayState *s = bs->opaque;
>     uint32_t reqid = request_id++;
>     Request *req;
>     // A
>     bdrv_co_readv(bs->file->bs, sector_num, nb_sectors, qiov);
>
>     if (replay_mode == REPLAY_MODE_RECORD) {
>         replay_save_block_event(reqid);
>     } else {
>         assert(replay_mode == REPLAY_MODE_PLAY);
>         if (reqid == current_request) {
>             current_finished = true;
>         } else {
>             req = block_request_insert(reqid, bs, qemu_coroutine_self());
>             qemu_coroutine_yield();
>             block_request_remove(req);
>         }
>     }
>     // B
>     return 0;
> }

Yes, I guess this can happen. As I described above, the coroutine can be
entered from a vcpu thread initially. After yielding for the first time,
it is resumed from the I/O thread. So if there are paths where the
coroutine never yields, the coroutine completes in the original vcpu
thread. (It's not the common case that bdrv_co_readv() doesn't yield,
but it happens e.g. with unallocated sectors in qcow2.)
If this is a problem for you, you need to force the coroutine into the
I/O thread. You can do that by scheduling a BH, then yield, and then let
the BH reenter the coroutine.
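
To illustrate the idea, here is a toy model in plain C. It is only a sketch and does not use the QEMU API: the coroutine is modelled as a small state machine, the "bottom half" is a one-slot queue that only the simulated I/O loop drains, and every name (Ctx, Co, bh_schedule, io_loop_poll, co_enter, run_request) is made up for this example. It shows that point B runs in the vcpu context when the nested read never yields, and that unconditionally scheduling a BH and yielding makes point B always run in the I/O context.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names exist in QEMU. */
typedef enum { CTX_VCPU, CTX_IO } Ctx;

static Ctx current_ctx;          /* which simulated thread is running now */

typedef struct Co {
    int step;                    /* resume point of the toy coroutine */
    int inner_read_yields;       /* does the nested read yield? */
    int force_io;                /* apply the "BH, then yield" trick? */
    Ctx ctx_at_a, ctx_at_b;      /* contexts observed at points A and B */
    int done;
} Co;

static Co *pending_bh;           /* one-slot bottom-half queue */

static void co_enter(Co *co);

/* Ask the I/O loop to re-enter the coroutine later. */
static void bh_schedule(Co *co)
{
    pending_bh = co;
}

/* Simulated I/O loop: runs pending BHs, each of which re-enters a coroutine. */
static void io_loop_poll(void)
{
    current_ctx = CTX_IO;
    while (pending_bh) {
        Co *co = pending_bh;
        pending_bh = NULL;
        co_enter(co);
    }
}

/* Run the coroutine until it yields (returns early) or finishes. */
static void co_enter(Co *co)
{
    switch (co->step) {
    case 0:
        co->ctx_at_a = current_ctx;      /* point A */
        if (co->inner_read_yields) {
            co->step = 1;
            bh_schedule(co);             /* models I/O completion in the I/O loop */
            return;                      /* yield */
        }
        /* fall through */
    case 1:
        if (co->force_io) {
            co->step = 2;
            bh_schedule(co);             /* schedule a BH ... */
            return;                      /* ... then yield */
        }
        /* fall through */
    case 2:
        co->ctx_at_b = current_ctx;      /* point B */
        co->done = 1;
    }
}

/* Start one request from the vcpu context; return the context seen at B. */
static Ctx run_request(int inner_read_yields, int force_io)
{
    Co co = { 0 };
    co.inner_read_yields = inner_read_yields;
    co.force_io = force_io;
    pending_bh = NULL;
    current_ctx = CTX_VCPU;
    co_enter(&co);
    io_loop_poll();
    assert(co.done);
    return co.ctx_at_b;
}
```

Calling run_request(0, 0) models the case of a read that completes without yielding, so point B is reached in the vcpu context; with force_io set, point B is reached from the I/O loop regardless of whether the read yielded.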
Kevin