qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] lock-free monitor?


From: Markus Armbruster
Subject: Re: [Qemu-devel] lock-free monitor?
Date: Thu, 11 Feb 2016 09:33:11 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)

"Dr. David Alan Gilbert" <address@hidden> writes:

> * Markus Armbruster (address@hidden) wrote:
>> "Dr. David Alan Gilbert" <address@hidden> writes:
>> 
>> > * Markus Armbruster (address@hidden) wrote:
>> >> "Dr. David Alan Gilbert" <address@hidden> writes:
>> >> 
>> >> > Hi,
>> >> >   I wondered what it would take to be able to do a lock-free monitor;
>> >> > i.e. one that could respond to (some) commands while the qemu big lock 
>> >> > is held.
>> >> 
>> >> Requires a careful audit of the monitor code.
>> >> 
>> >> For instance, cur_mon needs to be made thread local for running multiple
>> >> monitors concurrently.
>> >
>> > Hmm that's fun;  and cur_mon is used all over the place when 'fd' passing
>> > is used - although that probably should be cleaned up somehow.
>> 
>> When cur_mon got created, we had no threads.  Even now, monitors only
>> ever run under the BQL, so cur_mon's still fine.
>> 
>> Having a current monitor is simpler than passing one around, especially
>> when you go through layers that don't know about it.  Such cases exist,
>> and they made us abandon commit 376253e's hope to eliminate cur_mon.
>> Unfortunately, I've since forgotten the details, so I can't point you to
>> an example.
>
> Yes,  I think maybe if we can try and remove the use of cur_mon one
> use at a time outside of the monitor it might move it in the right direction.

For historical reasons, some interfaces assume the current monitor,
while others take Monitor * parameters.  The typical cur_mon use outside
the monitor is such an argument.

Whether such a Monitor * interface actually works when passed something
other than cur_mon is anybody's guess.  Moving monitors out of the BQL
is unlikely to increase my confidence.

If a function can safely work on non-current monitors, a Monitor *
parameter is fine.  If not, it shouldn't take one, and just use cur_mon.

>> >> > My reason for asking is that there are cases in migration and colo
>> >> > where a network block device has failed and is blocking, but it fails
>> >> > at just the wrong time while the lock is held, and now, you don't have
>> >> 
>> >> Is it okay to block while holding the BQL?
>> >
>> > I'd hope the simple answer was no; unfortunately the more complex answer
>> > is that we keep finding places that do.
>> 
>> Would you call these bugs?
>> 
>> Even if you do, we may want to recover from them.
>
> They're a bug in the end result that needs fixing; so for example a 
> crashing NBD server shouldn't lock you out of the monitor during a migrate,
> and I don't think we current;y have other solutions.

It shouldn't lock you out of anything.  The monitor is just one such
thing.  It's special only because it can let the user (human or
management application) recover from certain bugs with monitor commands.

I'd see enabling that as a partial work-around for a class of bugs that
is hard to fix.  Work-around is better than nothing.

However, it can reduce the incentive to actually fix the bug.  No reason
to block work-arounds, but I'd like to see an earnest commitment to
actual bug fixing.

>> >                                          The cases I'm aware of are
>> > mostly bdrv_drain_all calls in migration/colo, where one of the block
>> > devices (e.g. an NBD network device) fails.  Generally these are called
>> > at the end of a migration cycle when we're just ensuring that everything
>> > really is drained and are normally called with the lock held to ensure
>> > that nothing else is generating new traffic as we're clearing everything 
>> > else.
>> 
>> Using the BQL for that drives a cork into the I/O pipeline with a Really
>> Big Hammer.  Can we do better?
>> 
>> The answer may be "yes, but don't hold your breath."  Then we may need
>> means to better cope with the bugs.
>
> Yeh there are some places that the migraiton code holds the BQL where
> I don't really understand all the things it's guarding against.

I guess actual bug fixing would start with figuring that out.

>> >> > any way to unblock it since you can't do anything on the monitors.
>> >> > If I could issue a command then I could have it end up doing a 
>> >> > shutdown(2)
>> >> > on the network connection and free things up.
>> >> >
>> >> > Maybe this would also help real-time people?
>> >> >
>> >> > I was thinking then, we could:
>> >> >    a) Have a monitor that wasn't tied to the main loop at all - each 
>> >> > instance
>> >> > would be it's own thread and have it's own poll() (probably using the 
>> >> > new
>> >> > IO channels?)
>> >> >    b) for most commands it would take the big lock before execute the 
>> >> > command
>> >> > and release it afterwards - so there would be no risk in those commands.
>> >> 
>> >> Saves you the auditing their code, which would probably be impractical.
>> >> 
>> >> >    c) Some commands you could mark as 'lock free' - they would be 
>> >> > required
>> >> > not to take any locks or wait for *anything* and for those the monitor 
>> >> > would
>> >> > not take the lock.
>> >> 
>> >> They also must not access shared state without suitable synchronization.
>> >
>> > Right; I'd expect most of the cases to either be reading simple status 
>> > information,
>> > or having to find whatever they need to do using rcu list walks.
>> >
>> >> >    d) Perhaps have some way of restricting a particular monitor session 
>> >> > to only
>> >> > allowing non-blocking commands.
>> >> 
>> >> Why?  To ensure you always have an emergency monitor available?
>> >
>> > Yes, mostly to stop screwups of putting a potentially blocking command 
>> > down your
>> > emergency route.
>> 
>> Understand.
>> 
>> >> >    e) Then I'd have to have a way of finding the broken device in a 
>> >> > lockfree
>> >> > way (RTU list walk?) and issuing the shutdown(2).
>> >> >
>> >> > Bonus points to anyone who can statically check commands in (c).
>> >> 
>> >> Tall order.
>> >
>> > Yes :-)   A (compile time selected) dynamic check might be possible
>> > where the normal global lock functions and rcu block functions check if 
>> > they're
>> > in a monitor thread and assert.
>> >
>> >> > Does this make sense to everyone else, or does anyone have any better
>> >> > suggestions?
>> >> 
>> >> Sounds like a big, hairy task to me.
>> >> 
>> >> Any alternatives?
>> >
>> > I hope so; although is the idea of making a lock-free monitor a generally
>> > good idea we should do anyway?
>> 
>> There's no shortage of good ideas, but our bandwidth to implement,
>> review, document and test them has limits.
>> 
>> The general plan is to reduce the scope of the BQL gradually.
>> Liberating the monitor from the BQL is one step on the road to complete
>> BQL elimination.  The question is whether this step needs to be taken
>> *now*.
>
> COLO probably needs something;  although in it's case I'm going to investigate
> if some evil with iptables might be able to force a recovery by causing the 
> socket
> to close (after all we're assuming that case involves the destination is dead 
> - but
> I don't want to wait for a full TCP timeout).

As always when feature X is considered sound, but there's doubt whether
it's worth doing *now*: somebody wanting it badly enough to implement it
is pretty compelling evidence it's worth doing for him.  Especially when
said somebody has plenty of other important things to do in QEMU.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]