Re: [Qemu-devel] lock-free monitor?


From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] lock-free monitor?
Date: Wed, 10 Feb 2016 15:33:42 +0000
User-agent: Mutt/1.5.24 (2015-08-30)

* Markus Armbruster (address@hidden) wrote:
> "Dr. David Alan Gilbert" <address@hidden> writes:
> 
> > * Markus Armbruster (address@hidden) wrote:
> >> "Dr. David Alan Gilbert" <address@hidden> writes:
> >> 
> >> > Hi,
> >> >   I wondered what it would take to be able to do a lock-free monitor;
> >> > i.e. one that could respond to (some) commands while the qemu big lock 
> >> > is held.
> >> 
> >> Requires a careful audit of the monitor code.
> >> 
> >> For instance, cur_mon needs to be made thread local for running multiple
> >> monitors concurrently.
> >
> > Hmm that's fun;  and cur_mon is used all over the place when 'fd' passing
> > is used - although that probably should be cleaned up somehow.
> 
> When cur_mon got created, we had no threads.  Even now, monitors only
> ever run under the BQL, so cur_mon's still fine.
> 
> Having a current monitor is simpler than passing one around, especially
> when you go through layers that don't know about it.  Such cases exist,
> and they made us abandon commit 376253e's hope to eliminate cur_mon.
> Unfortunately, I've since forgotten the details, so I can't point you to
> an example.

Yes, I think if we try to remove the uses of cur_mon outside the monitor
one at a time, that would move things in the right direction.
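For the thread-local part, something as small as this might be enough
(untested sketch - today cur_mon is just a plain global in monitor.c):

    /* monitor.c - sketch: make the current monitor pointer per-thread
     * so that concurrent monitor threads don't stamp on each other */
    __thread Monitor *cur_mon;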

> >> > My reason for asking is that there are cases in migration and colo
> >> > where a network block device has failed and is blocking, but it fails
> >> > at just the wrong time while the lock is held, and now, you don't have
> >> 
> >> Is it okay to block while holding the BQL?
> >
> > I'd hope the simple answer was no; unfortunately the more complex answer
> > is that we keep finding places that do.
> 
> Would you call these bugs?
> 
> Even if you do, we may want to recover from them.

They're bugs in the end result that need fixing; for example, a
crashing NBD server shouldn't lock you out of the monitor during a
migrate, and I don't think we currently have any other solution.

> >                                          The cases I'm aware of are
> > mostly bdrv_drain_all calls in migration/colo, where one of the block
> > devices (e.g. an NBD network device) fails.  Generally these are called
> > at the end of a migration cycle when we're just ensuring that everything
> > really is drained and are normally called with the lock held to ensure
> > that nothing else is generating new traffic as we're clearing everything 
> > else.
> 
> Using the BQL for that drives a cork into the I/O pipeline with a Really
> Big Hammer.  Can we do better?
> 
> The answer may be "yes, but don't hold your breath."  Then we may need
> means to better cope with the bugs.

Yeah, there are some places where the migration code holds the BQL and
I don't really understand all the things it's guarding against.

> >> > any way to unblock it since you can't do anything on the monitors.
> >> > If I could issue a command then I could have it end up doing a
> >> > shutdown(2) on the network connection and free things up.
> >> >
> >> > Maybe this would also help real-time people?
> >> >
> >> > I was thinking then, we could:
> >> >    a) Have a monitor that wasn't tied to the main loop at all - each
> >> > instance would be its own thread and have its own poll() (probably
> >> > using the new IO channels?)
> >> >    b) For most commands it would take the big lock before executing
> >> > the command and release it afterwards - so there would be no risk in
> >> > those commands.
> >> 
> >> Saves you from auditing their code, which would probably be impractical.
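For (b) I'd expect the monitor thread's dispatch to be about this
simple (untested sketch - the lock-free flag is made up, the handler
call mirrors what the monitor does today):

    /* sketch: per-command dispatch in a monitor thread - take the BQL
     * around every command unless it's explicitly marked lock-free */
    bool lock_free = cmd->flags & MONITOR_CMD_LOCK_FREE; /* hypothetical */

    if (!lock_free) {
        qemu_mutex_lock_iothread();
    }
    cmd->mhandler.cmd(mon, qdict);
    if (!lock_free) {
        qemu_mutex_unlock_iothread();
    }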
> >> 
> >> >    c) Some commands you could mark as 'lock free' - they would be
> >> > required not to take any locks or wait for *anything*, and for those
> >> > the monitor would not take the lock.
> >> 
> >> They also must not access shared state without suitable synchronization.
> >
> > Right; I'd expect most of the cases to either be reading simple
> > status information, or having to find whatever they need using RCU
> > list walks.
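For example, finding a wedged device and doing the shutdown(2) from
above might look roughly like this (sketch - the device list, type and
helper are invented, though QLIST_FOREACH_RCU is the real macro from
include/qemu/rcu_queue.h):

    #include <string.h>
    #include <sys/socket.h>
    #include "qemu/rcu.h"
    #include "qemu/rcu_queue.h"

    /* sketch: lock-free monitor command - walk an RCU-protected device
     * list and shutdown(2) the matching socket, so any thread stuck in
     * send()/recv() on it returns; no BQL taken anywhere here */
    static void do_netdev_shutdown(const char *name)
    {
        NetDev *dev;                          /* hypothetical type */

        rcu_read_lock();
        QLIST_FOREACH_RCU(dev, &net_devices, next) {
            if (!strcmp(dev->name, name)) {
                shutdown(dev->fd, SHUT_RDWR);
                break;
            }
        }
        rcu_read_unlock();
    }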
> >
> >> >    d) Perhaps have some way of restricting a particular monitor
> >> > session to only allowing non-blocking commands.
> >> 
> >> Why?  To ensure you always have an emergency monitor available?
> >
> > Yes, mostly to stop screwups like putting a potentially blocking
> > command down your emergency route.
> 
> Understand.
> 
> >> >    e) Then I'd have to have a way of finding the broken device in a
> >> > lock-free way (RCU list walk?) and issuing the shutdown(2).
> >> >
> >> > Bonus points to anyone who can statically check commands in (c).
> >> 
> >> Tall order.
> >
> > Yes :-)   A (compile-time selected) dynamic check might be possible,
> > where the normal global lock functions and RCU block functions check
> > whether they're in a monitor thread and assert.
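e.g. something like (sketch - the per-thread flag is invented and would
be set when a lock-free monitor thread starts):

    /* sketch: debug-only check that a lock-free monitor command never
     * ends up taking the BQL */
    __thread bool lock_free_monitor_thread;   /* hypothetical flag */

    void qemu_mutex_lock_iothread(void)
    {
        assert(!lock_free_monitor_thread);
        qemu_mutex_lock(&qemu_global_mutex);  /* existing body, simplified */
    }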
> >
> >> > Does this make sense to everyone else, or does anyone have any better
> >> > suggestions?
> >> 
> >> Sounds like a big, hairy task to me.
> >> 
> >> Any alternatives?
> >
> > I hope so; although isn't a lock-free monitor a generally good idea
> > that we should do anyway?
> 
> There's no shortage of good ideas, but our bandwidth to implement,
> review, document and test them has limits.
> 
> The general plan is to reduce the scope of the BQL gradually.
> Liberating the monitor from the BQL is one step on the road to complete
> BQL elimination.  The question is whether this step needs to be taken
> *now*.

COLO probably needs something; although in its case I'm going to
investigate whether some evil with iptables might be able to force a
recovery by causing the socket to close (after all, that case assumes
the destination is dead - but I don't want to wait for a full TCP
timeout).

Dave
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK


