qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] lock-free monitor?


From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] lock-free monitor?
Date: Wed, 10 Feb 2016 08:52:05 +0000
User-agent: Mutt/1.5.24 (2015-08-30)

* Markus Armbruster (address@hidden) wrote:
> "Dr. David Alan Gilbert" <address@hidden> writes:
> 
> > Hi,
> >   I wondered what it would take to be able to do a lock-free monitor;
> > i.e. one that could respond to (some) commands while the qemu big lock is 
> > held.
> 
> Requires a careful audit of the monitor code.
> 
> For instance, cur_mon needs to be made thread local for running multiple
> monitors concurrently.

Hmm that's fun;  and cur_mon is used all over the place when 'fd' passing
is used - although that probably should be cleaned up somehow.

> > My reason for asking is that there are cases in migration and colo
> > where a network block device has failed and is blocking, but it fails
> > at just the wrong time while the lock is held, and now, you don't have
> 
> Is it okay to block while holding the BQL?

I'd hope the simple answer was no; unfortunately the more complex answer
is that we keep finding places that do.  The cases I'm aware of are
mostly bdrv_drain_all calls in migration/colo, where one of the block
devices (e.g. an NBD network device) fails.  Generally these are called
at the end of a migration cycle when we're just ensuring that everything
really is drained and are normally called with the lock held to ensure
that nothing else is generating new traffic as we're clearing everything else.

> > any way to unblock it since you can't do anything on the monitors.
> > If I could issue a command then I could have it end up doing a shutdown(2)
> > on the network connection and free things up.
> >
> > Maybe this would also help real-time people?
> >
> > I was thinking then, we could:
> >    a) Have a monitor that wasn't tied to the main loop at all - each 
> > instance
> > would be it's own thread and have it's own poll() (probably using the new
> > IO channels?)
> >    b) for most commands it would take the big lock before execute the 
> > command
> > and release it afterwards - so there would be no risk in those commands.
> 
> Saves you the auditing their code, which would probably be impractical.
> 
> >    c) Some commands you could mark as 'lock free' - they would be required
> > not to take any locks or wait for *anything* and for those the monitor would
> > not take the lock.
> 
> They also must not access shared state without suitable synchronization.

Right; I'd expect most of the cases to either be reading simple status 
information,
or having to find whatever they need to do using rcu list walks.

> >    d) Perhaps have some way of restricting a particular monitor session to 
> > only
> > allowing non-blocking commands.
> 
> Why?  To ensure you always have an emergency monitor available?

Yes, mostly to stop screwups of putting a potentially blocking command down your
emergency route.

> >    e) Then I'd have to have a way of finding the broken device in a lockfree
> > way (RTU list walk?) and issuing the shutdown(2).
> >
> > Bonus points to anyone who can statically check commands in (c).
> 
> Tall order.

Yes :-)   A (compile time selected) dynamic check might be possible
where the normal global lock functions and rcu block functions check if they're
in a monitor thread and assert.

> > Does this make sense to everyone else, or does anyone have any better
> > suggestions?
> 
> Sounds like a big, hairy task to me.
> 
> Any alternatives?

I hope so; although is the idea of making a lock-free monitor a generally
good idea we should do anyway?

Dave
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK



reply via email to

[Prev in Thread] Current Thread [Next in Thread]