[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-ppc] [Qemu-devel] [PATCH v2 3/5] numa: make sure that all cpus
From: |
Eduardo Habkost |
Subject: |
Re: [Qemu-ppc] [Qemu-devel] [PATCH v2 3/5] numa: make sure that all cpus in has has_node_id set if numa is enabled |
Date: |
Mon, 29 May 2017 10:22:41 -0300 |
User-agent: |
Mutt/1.8.0 (2017-02-23) |
On Mon, May 29, 2017 at 01:45:36PM +0200, Igor Mammedov wrote:
> On Fri, 26 May 2017 13:06:30 -0300
> Eduardo Habkost <address@hidden> wrote:
>
> > On Tue, May 23, 2017 at 04:38:48PM +0200, Igor Mammedov wrote:
> > > It fixes/add missing _PXM object for non mapped CPU (x86)
> > > and missing fdt node (virt-arm).
> > >
> > > It ensures that possible_cpus contains complete mapping if
> > > numa is enabled by the time machine_init() is executed.
> > >
> > > As result non completely mapped CPUs:
> > > 1) appear in ACPI/fdt blobs
> > > 2) QMP query-hotpluggable-cpus command shows bound nodes for such CPUs
> > > 3) allows to drop checks for has_node_id in numa only code,
> > > reducing number of invariants incomplete mapping could produce
> > > 4) moves fixup/implicit node init from runtime numa_cpu_pre_plug()
> > > (when CPU object is created) to machine_numa_finish_init() which
> > > helps to fix [1, 2] and make possible_cpus complete source
> > > of numa mapping available even before CPUs are created.
> > >
> > > Signed-off-by: Igor Mammedov <address@hidden>
> >
> > Code that will be affected, but will keep the same behavior:
> >
> > * pre_plug handler when node-id is omitted on device_add: it
> > already sets node_id=0 if !has_node_id
> > * However, pre_plug behavior is being changed when node-id is
> > set. See below.
> > * build_srat() (both x86 and arm): already sets node_id = 0 if
> > !has_node_id
> > * bochs_bios_init(): already uses g_new0(), so already had
> > omitted entries set to 0.
> >
> > Affected code that will change behavior with this patch:
> >
> > * pre_plug handler when node-id is set:
> > * See my reply to patch 1/5. This patch restores the old
> > behavior that rejected node-id != 0 on omitted CPUs.
> I'll look into it.
>
> > * ACPI _PXM initialization
> > * arm/virt.c: numa-node-id will now be set for all CPUs
> > * QMP/HMP query-hotpluggable-cpus output
> > * Desirable change, as now the QMP command will reflect what
> > the guest is really seeing
> >
> > Due to the _PXM and FDT changes, I have the same objection I have
> > for patch 4/5: if we are going to break user expectations when
> > using incomplete NUMA mappings, let's do that only once (when we
> > start rejecting incomplete NUMA mappings).
>
> FDT/PXM change is bugfix to firmware that I insist on.
> Firmware should properly describe all CPUs instead of repeating
> partial CPU mapping nonsense.
>
> Wrt _PXM change see
> (27111931 acpi: provide _PXM method for CPU devices if QEMU is started numa
> enabled)
> provided we keep 0 node fallback by dropping 4/5
> linux guest will behave the same as without _PXM.
If it is a bug fix, then I agree with it. Also, this doesn't
change the topology of the virtual hardware, so it shouldn't
break user expectations anyway.
>
> And again there wasn't any compat code around that
> and guest saw node-id change when it's been restarted.
>
> Wrt FDT change, it's the same and I also talked with
> Drew about it, his reply was that for ARM target
> they don't care much about NUMA as not all pieces are
> there to make pinning work on host side yet.
>
> PS:
> There wasn't any promise made to keep firmware side
> bug compatible or versioned so users should not have
> expectations about it in the first place.
You are right about this.
>
> Pls don't push for bug compat stuff in the firmware
> as it will rise false expectations (at least until
> we decide to care about it by default).
I won't. Patch 4/5 is different because it makes firmware
describe different hardware. But this one is firmware-only, so
you are right.
>
> (I'd use firmware compat stuff only in cases where
> it breaks guest horribly, i.e. crash/fail boot)
>
>
> > A few extra comments about the has_node_id checks below:
> >
> > > ---
> > > hw/arm/virt-acpi-build.c | 4 +---
> > > hw/core/machine.c | 16 ++++++++++------
> > > hw/i386/acpi-build.c | 3 +--
> > > hw/i386/pc.c | 4 +---
> > > numa.c | 7 +------
> > > 5 files changed, 14 insertions(+), 20 deletions(-)
> > >
> > > diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> > > index e585206..977a794 100644
> > > --- a/hw/arm/virt-acpi-build.c
> > > +++ b/hw/arm/virt-acpi-build.c
> > > @@ -496,12 +496,10 @@ build_srat(GArray *table_data, BIOSLinker *linker,
> > > VirtMachineState *vms)
> > > srat->reserved1 = cpu_to_le32(1);
> > >
> > > for (i = 0; i < cpu_list->len; ++i) {
> > > - int node_id = cpu_list->cpus[i].props.has_node_id ?
> > > - cpu_list->cpus[i].props.node_id : 0;
> > > core = acpi_data_push(table_data, sizeof(*core));
> > > core->type = ACPI_SRAT_PROCESSOR_GICC;
> > > core->length = sizeof(*core);
> > > - core->proximity = cpu_to_le32(node_id);
> > > + core->proximity = cpu_to_le32(cpu_list->cpus[i].props.node_id);
> > > core->acpi_processor_uid = cpu_to_le32(i);
> > > core->flags = cpu_to_le32(1);
> > > }
> > > diff --git a/hw/core/machine.c b/hw/core/machine.c
> > > index aaf3cff..964b75d 100644
> > > --- a/hw/core/machine.c
> > > +++ b/hw/core/machine.c
> > > @@ -720,19 +720,23 @@ static void machine_numa_finish_init(MachineState
> > > *machine)
> > > const CPUArchId *cpu_slot = &possible_cpus->cpus[i];
> > >
> > > if (!cpu_slot->props.has_node_id) {
> > > - if (default_mapping) {
> > > - /* fetch default mapping from board and enable it */
> > > - CpuInstanceProperties props = cpu_slot->props;
> > > - props.has_node_id = true;
> > > - machine_set_cpu_numa_node(machine, &props, &error_fatal);
> > > - } else {
> > > + /* fetch default mapping from board and enable it */
> > > + CpuInstanceProperties props = cpu_slot->props;
> > > +
> > > + if (!default_mapping) {
> > > /* record slots with not set mapping,
> > > * TODO: make it hard error in future */
> > > char *cpu_str = cpu_slot_to_string(cpu_slot);
> > > g_string_append_printf(s, "%sCPU %d [%s]",
> > > s->len ? ", " : "", i, cpu_str);
> > > g_free(cpu_str);
> > > +
> > > + /* non mapped cpus used to fallback to node 0 */
> > > + props.node_id = 0;
> > > }
> > > +
> > > + props.has_node_id = true;
> > > + machine_set_cpu_numa_node(machine, &props, &error_fatal);
> > > }
> > > }
> > > if (s->len) {
> > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > > index afcadac..873880d 100644
> > > --- a/hw/i386/acpi-build.c
> > > +++ b/hw/i386/acpi-build.c
> > > @@ -2335,8 +2335,7 @@ build_srat(GArray *table_data, BIOSLinker *linker,
> > > MachineState *machine)
> > > srat->reserved1 = cpu_to_le32(1);
> > >
> > > for (i = 0; i < apic_ids->len; i++) {
> > > - int node_id = apic_ids->cpus[i].props.has_node_id ?
> > > - apic_ids->cpus[i].props.node_id : 0;
> > > + int node_id = apic_ids->cpus[i].props.node_id;
> >
> > What about an assert(props.has_node_id) here?
> I considered it but dropped idea, as it's mostly useless
> since build_srat() is called only when numa is enabled.
I just suggested it because I would be more confident that the
new code that ensures has_node_id is set for all slots is working
and won't break if we refactor it. But not a big deal.
>
> and sticking asserts in every place just in case doesn't
> seem to me like good idea as it would lead to absurd
> asserts after every line.
>
> Considering the new code behaves the same in case !has_node_id
> I don't see any reason to put assert here, the same goes
> for 'Ditto" comment in the next hunk.
>
> But if you insist I can' add assert().
I won't mind if you decide to not add it.
>
[...]
> > > }
> > > for (i = 0; i < nb_numa_nodes; i++) {
> > > numa_fw_cfg[pcms->apic_id_limit + 1 + i] =
> > > diff --git a/numa.c b/numa.c
> > > index e30702e..4ef6dea 100644
> > > --- a/numa.c
> > > +++ b/numa.c
> > > @@ -513,17 +513,12 @@ void numa_cpu_pre_plug(const CPUArchId *slot,
> > > DeviceState *dev, Error **errp)
> > > int node_id = object_property_get_int(OBJECT(dev), "node-id",
> > > &error_abort);
> > >
> > > if (node_id == CPU_UNSET_NUMA_NODE_ID) {
> > > - /* by default CPUState::numa_node was 0 if it wasn't set
> > > explicitly
> > > - * TODO: make it error when incomplete numa mapping support is
> > > removed
> > > - */
> > > - node_id = 0;
> > > -
> > > /* due to bug in libvirt, it doesn't pass node-id from props on
> > > * device_add as expected, so we have to fix it up here */
> > > if (slot->props.has_node_id) {
> > > node_id = slot->props.node_id;
> > > + object_property_set_int(OBJECT(dev), node_id, "node-id",
> > > errp);
> > > }
> > > - object_property_set_int(OBJECT(dev), node_id, "node-id", errp);
> >
> > Why not removing the has_node_id check completely, here?
> because numa_cpu_pre_plug() is called unconditionally
> As alternative, I can replace has_node_id check with assert
> and do following from callers:
>
> if (nb_numa_nodes) {
> numa_cpu_pre_plug()
> }
>
>
> > In case this patch get respun, I suggest also removing the
> > has_node_id check at: build_cpus_aml() _PXM code.
> that would be wrong, build_cpus_aml() is called for both
> numa and non numa use-cases, there shouldn't be _PXM
> object if numa is not enabled, hence the check.
>
>
Oh, I forgot about the non-NUMA cases. You're right.
--
Eduardo
- Re: [Qemu-ppc] [Qemu-devel] [PATCH v2 1/5] numa: consolidate cpu_preplug fixups/checks for pc/arm/spapr, (continued)
[Qemu-ppc] [PATCH v2 3/5] numa: make sure that all cpus in has has_node_id set if numa is enabled, Igor Mammedov, 2017/05/23
[Qemu-ppc] [PATCH v2 2/5] numa: move default mapping init to machine, Igor Mammedov, 2017/05/23
[Qemu-ppc] [PATCH v2 5/5] numa: move numa_node from CPUState into target specific classes, Igor Mammedov, 2017/05/23