qemu-devel

Re: [Qemu-devel] [PATCH 3/3] replay: introduce block devices record/repl


From: Kevin Wolf
Subject: Re: [Qemu-devel] [PATCH 3/3] replay: introduce block devices record/replay
Date: Mon, 22 Feb 2016 12:06:44 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

Am 20.02.2016 um 08:11 hat Pavel Dovgalyuk geschrieben:
> > From: Pavel Dovgalyuk [mailto:address@hidden
> > > From: Kevin Wolf [mailto:address@hidden
> > > Am 16.02.2016 um 12:20 hat Pavel Dovgalyuk geschrieben:
> > > > Coroutine                                                         Replay
> > > > bool *done = req_replayed_list_get(reqid) // NULL
> > > >                                   co = req_completed_list_get(e.reqid); // NULL
> > >
> > > There was no yield, so this context switch cannot happen. The same
> > > goes for the switch back.
> > >
> > > > req_completed_list_insert(reqid, qemu_coroutine_self());
> > > > qemu_coroutine_yield();
> > >
> > > This is the point at which a context switch happens. The only other
> > > point in my code is the qemu_coroutine_enter() in the other function.
> > 
> > I've fixed the aio_poll problem by disabling the mutex lock during
> > replay_run_block_event() execution. Now the virtual machine
> > deterministically runs 4e8 instructions of the Windows XP boot.

Are you sure that the lock was unnecessary? Solving deadlocks by
removing the lock is a rather adventurous method.

I assume that you're still replaying events from low-level functions
like qemu_clock_get_ns(). So even if you get rid of the hangs, the
result is probably not quite right.

I'm afraid this is going in a direction where my comments can't be more
constructive than a simple "you're doing it wrong".

> > But then one non-deterministic event happens.
> > The callback invoked after a coroutine finishes may be called from
> > different contexts.

How does this happen? I'm not aware of callbacks being processed by any
thread other than the I/O thread for that specific block device (unless
you use dataplane, this is the main loop thread).

> > The apic_update_irq() function behaves differently when called from the
> > vcpu thread and from the I/O thread: in one case it sets
> > CPU_INTERRUPT_POLL, in the other nothing happens.
> 
> Kevin, do you have any ideas on how to fix this issue?
> It happens because coroutines may be assigned to different threads.
> Maybe there is some way of making this assignment more deterministic?

Coroutines aren't randomly assigned to threads, but threads actively
enter coroutines. To my knowledge this happens only when starting a
request (either vcpu or I/O thread; consistent per device) or by a
callback when some event happens (only I/O thread). I can't see any
non-determinism here.

Kevin

> > Therefore execution becomes non-deterministic.
> > In the previous version of the patch I solved this problem by linking
> > block events to the execution checkpoints. The I/O thread has its own
> > checkpoints and the vcpu thread has its own. Therefore apic callbacks
> > are always called from the same thread in replay as in the recording
> > phase.
> 
> Pavel Dovgalyuk
> 


