[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [RFC PATCH 0/1] ACPI: Fix missing CPU hotplug/hotunplug events with
From: |
Igor Mammedov |
Subject: |
Re: [RFC PATCH 0/1] ACPI: Fix missing CPU hotplug/hotunplug events with > 255 vCPUs |
Date: |
Mon, 9 Dec 2024 15:36:06 +0100 |
On Tue, 3 Dec 2024 16:56:35 -0800
Eric Mackay <eric.mackay@oracle.com> wrote:
> ACPI hotplug with 255 or less vCPUs can use the legacy CPU hotplug interface,
> which does
> not support hotunplug. If it's available, hotunplug will use the modern CPU
> hotplug interface.
> This creates a situation where hotplug and hotunplug are using different
> interfaces, but
> the end result is still functional. CPUs can be hotplugged and hotunplugged
> at will.
only one kind of interface should be in use. And for quite a while firmware
shipped
with QEMU is using QEMU provided ACPI tables, which in turn switch interface to
modern hotplug when guest OS loads ACPI tables, see:
if (opts.has_legacy_cphp) {
method = aml_method("_INI", 0, AML_SERIALIZED);
or firmware does the switch (OVMF usecase) before guest OS.
if any hotplug action happened before guest switched hotplug favour,
it will trigger SCI, which guest OS should process it (incl. switching
to modern interface and sending necessary device check/remove events)
once ACPI tables are processed.
> With > 255 vCPUs, both hotplug and hotunplug will use the modern ACPI
> interface.
> There is no priority or rules of mutual exclusion defined in this interface,
> and the behavior in the guest is implementation-defined.
>
> Unfortunately, it is possible to have both a hotplug and hotunplug event
> pending
> for the same vCPU. When the guest processes its pending events, it may see the
> hotplug but ignore the hotunplug.
>
> The most recent event is likely to reflect the desired state of the system, so
> ignoring the hotunplug event in this scenario is unacceptable.
QEMU has delivered both events as it was intended (and in order plug then
unplug).
I'd say that ignoring events is a guest bug.
With unplug, potentially if fixing guest is not an option
one can try retry device_del to trigger unplug event again.
PS:
I vaguely recall linux kernel wiping GPE state on boot during
initialization, but that is guest problem and should be fixed
on guest side.
(unexpectedly enough, Windows handles GPE state as expected)
> Repro: This can be seen in practice when a q35 VM is started in qemu, with
> maxcpus=260 or something above 255, and a few statically allocated CPUs.
> Example smp line: "-smp 4,sockets=2,dies=1,cores=65,threads=2,maxcpus=260"
> Then use libvirt to add vCPUs up to the maximum before continuing the VM.
> After
> the guest starts up, delete one of the vCPUs. 'lscpu' in the guest should
> still
> show 260.
above should work especially with unplug event happening after guest OS has
booted.
Lets simplify reproducer to QEMU only (i.e. getting rid of libvirt part) and see
where it leads us.
> The proposed solution is to enforce mutual exclusion between the hotplug and
> hotunplug
> bits in the modern ACPI interface. Setting a new pending event will clear a
> previously
> pending event of the opposite type, thus preserving only the most recently
> requested
> state.
Once VM is in running state, clearing flags from QEMU side is not safe.
Effectively it would introduce a race with guest code.
Also I'm not sure that flags are the issue here,
Capturing some CPUHP traces on QEMU side and logs from guest side
might help to clarify what's going on.
> Eric Mackay (1):
> ACPI: Enforce mutual exclusion betwen CPU insertion and removal events
>
> hw/acpi/cpu.c | 36 ++++++++++++++++++++++++++++++++----
> include/hw/acpi/cpu.h | 4 ++++
> 2 files changed, 36 insertions(+), 4 deletions(-)
>