qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC v2 2/8] monitor: allow monitor to create thread to


From: Marc-André Lureau
Subject: Re: [Qemu-devel] [RFC v2 2/8] monitor: allow monitor to create thread to poll
Date: Mon, 28 Aug 2017 12:11:38 +0200

Hi

On Mon, Aug 28, 2017 at 5:05 AM, Peter Xu <address@hidden> wrote:
> On Fri, Aug 25, 2017 at 04:07:34PM +0000, Marc-André Lureau wrote:
>> On Fri, Aug 25, 2017 at 5:33 PM Dr. David Alan Gilbert <address@hidden>
>> wrote:
>>
>> > * Marc-André Lureau (address@hidden) wrote:
>> > > Hi
>> > >
>> > > On Wed, Aug 23, 2017 at 8:52 AM Peter Xu <address@hidden> wrote:
>> > >
>> > > > Firstly, introduce Monitor.use_thread, and set it for monitors that are
>> > > > using non-mux typed backend chardev.  We only do this for monitors, so
>> > > > mux-typed chardevs are not suitable (when it connects to, e.g., serials
>> > > > and the monitor together).
>> > > >
>> > > > When use_thread is set, we create standalone thread to poll the monitor
>> > > > events, isolated from the main loop thread.  Here we still need to take
>> > > > the BQL before dispatching the tasks since some of the monitor commands
>> > > > are not allowed to execute without the protection of BQL.  Then this
>> > > > gives us the chance to avoid taking the BQL for some monitor commands
>> > in
>> > > > the future.
>> > > >
>> > > > * Why this change?
>> > > >
>> > > > We need these per-monitor threads to make sure we can have at least one
>> > > > monitor that will never stuck (that can receive further monitor
>> > > > commands).
>> > > >
>> > > > * So when will monitors stuck?  And, how do they stuck?
>> > > >
>> > > > After we have postcopy and remote page faults, it's simple to achieve a
>> > > > stuck in the monitor (which is also a stuck in main loop thread):
>> > > >
>> > > > (1) Monitor deadlock on BQL
>> > > >
>> > > > As we may know, when postcopy is running on destination VM, the vcpu
>> > > > threads can stuck merely any time as long as it tries to access an
>> > > > uncopied guest page.  Meanwhile, when the stuck happens, it is possible
>> > > > that the vcpu thread is holding the BQL.  If the page fault is not
>> > > > handled quickly, you'll find that monitors stop working, which is
>> > trying
>> > > > to take the BQL.
>> > > >
>> > > > If the page fault cannot be handled correctly (one case is a paused
>> > > > postcopy, when network is temporarily down), monitors will hang
>> > > > forever.  Without current patch, that means the main loop hanged.
>> > We'll
>> > > > never find a way to talk to VM again.
>> > > >
>> > >
>> > > Could the BQL be pushed down to the monitor commands level instead? That
>> > > way we wouldn't need a seperate thread to solve the hang on commands that
>> > > do not need BQL.
>> >
>> > If the main thread is stuck though I don't see how that helps you; you
>> > have to be able to run these commands on another thread.
>> >
>>
>> Why would the main thread be stuck? In (1) If the vcpu thread takes the BQL
>> and the command doesn't need it, it would work.  In (2),  info cpus
>> shouldn't keep the BQL (my qapi-async series would probably help here)
>
> (Thanks for joining the discussion)
>
> AFAIK the main thread can be stuck for many reasons.  I have seen one
> stack when the VGA code (IIUC) was trying to writting to guest graphic
> memory in main loop thread but luckily that guest page is still not
> copied yet from source.  As long as the main thread is stuck for any
> reason, no chance for monitor commands, even if the commands support
> async operations.

If that command becomes async (it probably should, any command doing
IO probaly should), then the main loop can keep running.

>
> So IMHO the only solution is doing these things in separate threads,
> rather than all in a single one.

I wouldn't say it's the only solution. I think the monitor can touch
many areas that haven't been written with multi-threading in mind. My
proposal is probably safer, although I don't know how hard it would be
to push the BQL down to QMP commands, and make async existing IO
commands. The benefits of this work are quite interesting imho,
because a stuck mainloop is basically a stuck qemu, and an additional
thread will not solve it...

-- 
Marc-André Lureau



reply via email to

[Prev in Thread] Current Thread [Next in Thread]