
Re: [Qemu-devel] [RFC v2 0/8] monitor: allow per-monitor thread


From: Markus Armbruster
Subject: Re: [Qemu-devel] [RFC v2 0/8] monitor: allow per-monitor thread
Date: Wed, 30 Aug 2017 09:06:20 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.2 (gnu/linux)

"Daniel P. Berrange" <address@hidden> writes:

> On Wed, Aug 23, 2017 at 02:51:03PM +0800, Peter Xu wrote:
>> v2:
>> - fixed "make check" error that patchew reported
>> - moved the thread_join upper in monitor_data_destroy(), before
>>   resources are released
>> - added one new patch (current patch 3) that fixes a nasty race
>>   condition with IOWatchPoll.  Please see the commit message for
>>   more information.
>> - added a g_main_context_wakeup() to make sure the separate loop
>>   thread can be kicked always when we want to destroy the per-monitor
>>   threads.
>> - added one new patch (current patch 8) to introduce migration mgmt
>>   lock for migrate_incoming.
>> 
>> This is extended work for migration postcopy recovery.  This series
>> is tested together with the following series to make sure it solves
>> the monitor hang problem that we encountered during postcopy
>> recovery:
>> 
>>   [RFC 00/29] Migration: postcopy failure recovery
>>   [RFC 0/6] migration: re-use migrate_incoming for postcopy recovery
>> 
>> The root problem is that monitor commands are all handled in the
>> main loop thread now, no matter how many monitors we specify.  If
>> the main loop thread hangs for any reason, all monitors will be
>> stuck.  The reverse holds as well: if any monitor hangs, it will
>> hang the main loop, and with it the rest of the monitors (if there
>> are any).

Yes.

>> That affects postcopy recovery, since the recovery requires user
>> input on the destination side.  If the monitors hang, the
>> destination VM dies and we lose any hope of even a final recovery.
>> 
>> So we need to make sure that at least one of the monitors stays
>> alive.
>> 
>> The whole idea of this series is that instead of handling monitor
>> commands all in the main loop thread, we do it separately in
>> per-monitor threads.  Then, even if the main loop thread hangs at
>> any point for any reason, the per-monitor threads can still survive.

This takes care of "monitor hangs because other parts of the main loop
(including other monitors) hang".  It doesn't take care of "monitor
hangs because the current monitor command hangs".

>>                                                Further, we add a
>> hint in QMP/HMP to show whether a command can be executed without
>> the BQL; if so, we avoid taking the BQL when running that command.
>> This greatly reduces contention on the BQL.  Currently the only user
>> of the new parameter (for now I call it "without-bql") is the
>> "migrate-incoming" command, which is the only command that can
>> rescue a paused postcopy migration.

This takes care of one way commands can hang.  There are other ways;
NFS server going AWOL is a classic.  I don't know whether any other way
applies to migrate-incoming.

>> However, even with this series, it does not mean that per-monitor
>> threads will never hang.  One example is that we can still run
>> "info cpus" in a per-monitor thread during a paused postcopy (in
>> that state, page faults are never handled, and "info cpus" will
>> never return since it tries to sync every vCPU).  So to make sure a
>> monitor does not hang, the per-monitor thread alone is not enough;
>> the user needs to be careful about how to use it as well.
>> 
>> For postcopy recovery, we may need a dedicated monitor channel for
>> recovery.  In other words, a destination VM that supports postcopy
>> recovery would possibly need:
>> 
>>   -qmp MAIN_CHANNEL -qmp RECOVERY_CHANNEL

Where RECOVERY_CHANNEL isn't necessarily just for postcopy, but for any
"emergency" QMP access.  If you use it only for commands that cannot
hang (i.e. terminate in bounded time), then you'll always be able to get
commands accepted there in bounded time.
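For illustration, such a setup could look like this on the command
line (socket paths made up; any chardev spec that -qmp accepts will
do):

  -qmp unix:/run/qemu-main.sock,server,nowait \
  -qmp unix:/run/qemu-recovery.sock,server,nowait

The management application would keep the second connection for
"emergency" commands only, so it remains responsive even when the
first one is stuck behind a hung command.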

> I think this is a really horrible thing to expose to management
> applications.  They should not need to be aware of the fact that QEMU
> is buggy and thus requires that certain commands be run on different
> monitors to work around the bug.

These are (serious) design limitations, not bugs in the narrow sense of
the word.

However, I quite agree that the need for clients to know whether a
monitor command can hang is impractical for the general case.  What
might be practical is a QMP monitor mode that accepts only known
hang-free commands.  Hang-free could be introspectable.
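For instance (purely hypothetical, no such field exists in the schema
today), query-qmp-schema could advertise a per-command flag:

  { "name": "migrate-incoming", "meta-type": "command",
    "arg-type": "...", "ret-type": "...", "hang-free": true }

A restricted monitor mode, or a careful client, could then reject or
avoid every command that lacks the flag.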

In case you consider that ugly: it's best to explore the design space
first, and recoil from "ugly" second.

> I'd much prefer to see the problem described handled transparently
> inside QEMU.  One approach is to have a dedicated thread in QEMU
> responsible for all monitor I/O.  This thread should never actually
> execute monitor commands though; it would simply parse the command
> request and put the data onto a queue of pending commands, thus it
> could never hang.  The command queue could be processed by the main
> thread, or by another thread that is interested.  E.g. the migration
> thread could process any queued commands related to migration
> directly.

The monitor itself can't hang then, but the thread(s) dequeuing parsed
commands can.

To maintain commands' synchronous semantics, their replies need to be
sent in order, which of course reintroduces the hangs.
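A made-up exchange to illustrate (think of a command stuck on a dead
NFS server, as above):

  -> { "execute": "query-block" }
  -> { "execute": "migrate-incoming",
       "arguments": { "uri": "tcp:0:4444" } }

Even if "migrate-incoming" is dequeued and executed by another thread,
its reply cannot be sent before the reply to "query-block", so a stuck
"query-block" blocks it all the same.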

Let's take a step back from the implementation, and talk about
*behavior* instead.

You prefer to have "the problem described handled transparently inside
QEMU".  I read that as "QEMU must ensure the QMP monitor is available at
all times".  "Available" means it accepts commands in bounded time.
Some commands will always finish in bounded time once accepted, others
may not, and whether they do may depend on the commands currently in
flight.

Commands that can always start and always terminate in bounded time are
no problem.

All the other commands have to become "job-starting": the QMP command
kicks off a "job", which runs concurrently with the QMP monitor for some
(possibly unbounded) time, then finishes.  Jobs can be examined (say to
monitor progress, if the job supports that) and controlled (say to
cancel, if the job supports that).

A few commands are already job-starting: migrate, the block job family,
dump-guest-memory with detach=true.  Whether they're already hang-free I
can't say; they could do risky work in their synchronous part.
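For reference, the job-starting pattern on the wire looks roughly like
this, with dump-guest-memory as the example (file name made up):

  -> { "execute": "dump-guest-memory",
       "arguments": { "paging": false,
                      "protocol": "file:/tmp/vm.dump",
                      "detach": true } }
  <- { "return": {} }
  ... the dump runs in the background ...
  <- { "event": "DUMP_COMPLETED",
       "data": { "result": { ... } },
       "timestamp": { ... } }

The command itself returns in bounded time; completion is reported
asynchronously by the event.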

Many commands that can hang are not job-starting.

Changing a command from "do the job" to "merely start the job" is a
compatibility break.

We could make the change opt-in to preserve compatibility.  But is
preserving a compatible QMP monitor that is prone to hang worthwhile?

If no, we may choose to use the resulting compatibility break to also
switch the packaging of jobs from the current "synchronous command +
broadcast message when done" to some variation of asynchronous command.
But that should be discussed in a separate thread, and only after we
know how we plan to ensure monitor availability.


