qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC PATCH] i386: Add _PXM method to ACPI CPU objects


From: Vasilis Liaskovitis
Subject: Re: [Qemu-devel] [RFC PATCH] i386: Add _PXM method to ACPI CPU objects
Date: Fri, 8 Nov 2013 12:22:12 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

Hi,

On Thu, Nov 07, 2013 at 03:03:42PM +0200, Michael S. Tsirkin wrote:
> On Thu, Nov 07, 2013 at 01:41:59PM +0100, Vasilis Liaskovitis wrote:
> > This patch adds a _PXM method to ACPI CPU objects for the pc machine. The 
> > _PXM
> > value is derived from the passed in guest info, same way as CPU SRAT 
> > entries.
> > 
> > The motivation for this patch is a CPU hot-unplug/hot-plug bug observed when
> > using a 3.11 linux guest kernel on a multi-NUMA node qemu/kvm VM. The linux
> > guest kernel parses the SRAT CPU entries at boot time and stores them in the
> > array __apicid_to_node. When a CPU is hot-removed, the linux guest kernel
> > resets the removed CPU's __apicid_to_node entry to NO_NUMA_NODE (kernel 
> > commit
> > c4c60524). When the removed cpu is hot-added again, the linux kernel looks 
> > up
> > the hot-added cpu object's _PXM method instead of somehow re-discovering the
> > SRAT entry info. With current qemu/seabios, the _PXM method is not found, 
> > and
> > the CPU is thus hot-plugged in the default NUMA node 0. (The problem does 
> > not
> > show up on initial hotplug of a cpu; the PXM method is still not found in 
> > this
> > case, but the kernel still has the correct proximity value from the CPU's 
> > SRAT
> > entry stored in __apicid_to_node)
> > 
> > ACPI spec mentions that the _PXM method is the correct way to determine
> > proximity information at hot-add time.
> 
> Where does it say this?
> I found this:
> If the Local APIC ID / Local SAPIC ID / Local x2APIC ID of a dynamically
> added processor is not present in the System Resource Affinity Table
> (SRAT), a _PXM object must exist for the processor’s device or one of
> its ancestors in the ACPI Namespace.
> 
> Does this mean that linux is buggy, and should be fixed up to look up
> the apic ID in SRAT?

The quote above suggests that if SRAT is absent, _PXM should be present.
Seabios/qemu provide SRAT entries, and  no _PXM. The fact that the kernel
resets the parse SRAT info on hot-remove time looks like a kernel problem.

But As Toshi Kani mentioned in the original thread, here is a quote from ACPI
5.0, stating _PXM and only _PXM should be used at hot-plug time:

===
17.2.1 System Resource Affinity Table Definition

This optional System Resource Affinity Table (SRAT) provides the boot
time description of the processor and memory ranges belonging to a
system locality. OSPM will consume the SRAT only at boot time. OSPM
should use _PXM for any devices that are hot-added into the system after
boot up.
====

So in this sense, the kernel is correct (kernel only uses _PXM at hot-plug time)
, and qemu/Seabios should have _PXM methods for hot operations.

> 
> > So far, qemu/seabios do not provide this
> > method for CPUs. So regardless of kernel behaviour, it is a good idea to add
> > this _PXM method. Since ACPI table generation has recently been moved from
> > seabios to qemu, we do this in qemu.
> > 
> > Note that the above hot-remove/hot-add scenario has been tested on an older
> > qemu + non-upstreamed patches for cpu hot-removal support, and not on qemu
> > master (since cpu-del support is still not on master). The only testing done
> > with qemu/seabios master and this patch, are successful boots of multi-node
> > linux and windows8 guests.
> > 
> > For the initial discussion on seabios and linux-acpi lists see
> > http://www.spinics.net/lists/linux-acpi/msg47058.html
> > 
> > Signed-off-by: Vasilis Liaskovitis <address@hidden>
> > Reviewed-by: Thilo Fromm <address@hidden>
> 
> Even if this is a linux bug, I have no issue with working around
> it in qemu.
> 
> But I think proper testing needs to be done with rebased upport for cpu-del.

Ok, I can try to rebase cpu-del support for testing. If there are cpu-del bits
already somewhere (Igor?) and not merged yet, please point me to them.

> 
> > ---
> >  hw/i386/acpi-build.c  |    2 ++
> >  hw/i386/ssdt-proc.dsl |    2 ++
> >  2 files changed, 4 insertions(+)
> > 
> > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > index 6cfa044..9373f5e 100644
> > --- a/hw/i386/acpi-build.c
> > +++ b/hw/i386/acpi-build.c
> > @@ -603,6 +603,7 @@ static inline char acpi_get_hex(uint32_t val)
> >  #define ACPI_PROC_OFFSET_CPUHEX (*ssdt_proc_name - *ssdt_proc_start + 2)
> >  #define ACPI_PROC_OFFSET_CPUID1 (*ssdt_proc_name - *ssdt_proc_start + 4)
> >  #define ACPI_PROC_OFFSET_CPUID2 (*ssdt_proc_id - *ssdt_proc_start)
> > +#define ACPI_PROC_OFFSET_CPUPXM (*ssdt_proc_pxm - *ssdt_proc_start)
> >  #define ACPI_PROC_SIZEOF (*ssdt_proc_end - *ssdt_proc_start)
> >  #define ACPI_PROC_AML (ssdp_proc_aml + *ssdt_proc_start)
> >  
> > @@ -724,6 +725,7 @@ build_ssdt(GArray *table_data, GArray *linker,
> >              proc[ACPI_PROC_OFFSET_CPUHEX+1] = acpi_get_hex(i);
> >              proc[ACPI_PROC_OFFSET_CPUID1] = i;
> >              proc[ACPI_PROC_OFFSET_CPUID2] = i;
> > +            proc[ACPI_PROC_OFFSET_CPUPXM] = guest_info->node_cpu[i];
> >          }
> >  
> >          /* build this code:
> > diff --git a/hw/i386/ssdt-proc.dsl b/hw/i386/ssdt-proc.dsl
> > index 8229bfd..7eef8b2 100644
> > --- a/hw/i386/ssdt-proc.dsl
> > +++ b/hw/i386/ssdt-proc.dsl
> > @@ -47,6 +47,8 @@ DefinitionBlock ("ssdt-proc.aml", "SSDT", 0x01, "BXPC", 
> > "BXSSDT", 0x1)
> >   * also updating the C code.
> >   */
> >          Name(_HID, "ACPI0007")
> > +        ACPI_EXTRACT_NAME_BYTE_CONST ssdt_proc_pxm
> > +        Name(_PXM, 0xAA)
> 
> The ACPI spec says this should be a DWORD value:
> 
> Return Value:
> An Integer (DWORD) containing a proximity domain identifier.

ok, I 'll change this.

thanks,

- Vasilis



reply via email to

[Prev in Thread] Current Thread [Next in Thread]