Re: [Qemu-devel] [RFC v2 0/8] monitor: allow per-monitor thread


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [RFC v2 0/8] monitor: allow per-monitor thread
Date: Fri, 8 Sep 2017 14:19:24 +0100

On Fri, Sep 8, 2017 at 12:49 PM, Markus Armbruster <address@hidden> wrote:
> "Dr. David Alan Gilbert" <address@hidden> writes:
>
>> * Markus Armbruster (address@hidden) wrote:
>>> "Dr. David Alan Gilbert" <address@hidden> writes:
>>>
>>> > * Markus Armbruster (address@hidden) wrote:
>>> >> "Daniel P. Berrange" <address@hidden> writes:
>>> >>
>>> >> > On Thu, Sep 07, 2017 at 02:59:28PM +0200, Markus Armbruster wrote:
>>> If we can't eliminate main loop hangs, any ideas on reducing their
>>> impact?
>>
>> Note there's two related things; main loop hangs and bql hangs; I'm not
>> sure that the two are always the same.
>>
>> Stefan mentioned some ways of doing asynchronous memory lookups/accesses
>> though I'm not sure they'd work in the postcopy case; but they'd need
>> work in lots of devices.
>> Some of the IO under the BQL might be fixable; IMHO in a lot of places
>> we don't really need the full BQL, we just need a 'you aren't going to
>> change the config' lock.
>
> This is all about reducing main loop hangs.  Another one is moving
> "slow" code out of the main loop, e.g. monitor commands.
>
> My question was aiming in a slightly different direction, however: given
> that the main loop can hang, is there anything we can do to mitigate
> known bad consequences of such hangs?

I don't think we can mitigate it completely but we can make it visible
and easier to study.

There have been discussions in the past about making the event loop
observable, in other words logging which handler functions are firing.
That way you can debug scenarios where the loop is spinning
("main-loop: WARNING: I/O thread spun for 1000 iterations\n") as well as
latency problems.  Collecting event handler latencies and looking at the
histogram would be interesting.  The outliers (e.g. 250+ microseconds)
are things that we should know about and consider refactoring.
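
To make the idea concrete, here is a rough, standalone sketch of what
per-handler latency collection could look like.  It is not QEMU code:
LatencyHist, lat_bucket(), timed_dispatch() and example_handler() are
hypothetical names made up for illustration, and the power-of-two
microsecond buckets are just one possible choice.  The 250 us warning
threshold in the usage note is the example figure from above.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define LAT_BUCKETS 32

/* Hypothetical handler signature; real event loop callbacks may differ. */
typedef void HandlerFn(void *opaque);

/* count[0]: < 1 us; count[i]: [2^(i-1), 2^i) us; last bucket is open-ended */
typedef struct {
    uint64_t count[LAT_BUCKETS];
} LatencyHist;

static int64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Map a duration in nanoseconds to a histogram bucket index. */
static unsigned lat_bucket(int64_t ns)
{
    uint64_t us = ns > 0 ? (uint64_t)ns / 1000 : 0;
    unsigned b = 0;

    while (us > 0 && b < LAT_BUCKETS - 1) {
        us >>= 1;
        b++;
    }
    return b;
}

/*
 * Run one handler, record how long it took, and log it if it ran for
 * warn_ns or longer.  Returns the measured duration in nanoseconds.
 */
static int64_t timed_dispatch(LatencyHist *hist, const char *name,
                              HandlerFn *fn, void *opaque, int64_t warn_ns)
{
    int64_t t0, dt;

    assert(hist && fn);
    t0 = now_ns();
    fn(opaque);
    dt = now_ns() - t0;

    hist->count[lat_bucket(dt)]++;
    if (dt >= warn_ns) {
        fprintf(stderr, "slow handler %s: %lld ns\n", name, (long long)dt);
    }
    return dt;
}

/* Print the non-empty buckets with their upper bound in microseconds. */
static void hist_print(const LatencyHist *hist, FILE *out)
{
    unsigned i;

    for (i = 0; i < LAT_BUCKETS; i++) {
        if (hist->count[i]) {
            fprintf(out, "< %llu us: %llu\n",
                    (unsigned long long)(1ULL << i),
                    (unsigned long long)hist->count[i]);
        }
    }
}

/* Trivial stand-in for a real handler: bumps a counter. */
static void example_handler(void *opaque)
{
    (*(int *)opaque)++;
}
```

A dispatch site would call something like
timed_dispatch(&hist, "example", example_handler, &n, 250 * 1000) in
place of invoking the handler directly, and hist_print(&hist, stderr)
could be hooked up to a debug command or run at exit.  The last bucket
collects everything that does not fit the earlier ones.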


