Re: [PATCH v2 2/2] spapr: Add a new level of NUMA for GPUs
From: Greg Kurz
Subject: Re: [PATCH v2 2/2] spapr: Add a new level of NUMA for GPUs
Date: Thu, 21 May 2020 11:00:28 +0200

On Thu, 21 May 2020 15:13:45 +1000
David Gibson <address@hidden> wrote:
> On Thu, May 21, 2020 at 01:36:16AM +0200, Greg Kurz wrote:
> > On Mon, 18 May 2020 16:44:18 -0500
> > Reza Arbab <address@hidden> wrote:
> >
> > > NUMA nodes corresponding to GPU memory currently have the same
> > > affinity/distance as normal memory nodes. Add a third NUMA associativity
> > > reference point enabling us to give GPU nodes more distance.
> > >
> > > This is guest visible information, which shouldn't change under a
> > > running guest across migration between different qemu versions, so make
> > > the change effective only in new (pseries > 5.0) machine types.
> > >
> > > Before, `numactl -H` output in a guest with 4 GPUs (nodes 2-5):
> > >
> > > node distances:
> > > node 0 1 2 3 4 5
> > > 0: 10 40 40 40 40 40
> > > 1: 40 10 40 40 40 40
> > > 2: 40 40 10 40 40 40
> > > 3: 40 40 40 10 40 40
> > > 4: 40 40 40 40 10 40
> > > 5: 40 40 40 40 40 10
> > >
> > > After:
> > >
> > > node distances:
> > > node 0 1 2 3 4 5
> > > 0: 10 40 80 80 80 80
> > > 1: 40 10 80 80 80 80
> > > 2: 80 80 10 80 80 80
> > > 3: 80 80 80 10 80 80
> > > 4: 80 80 80 80 10 80
> > > 5: 80 80 80 80 80 10
> > >
> > > These are the same distances as on the host, mirroring the change made
> > > to host firmware in skiboot commit f845a648b8cb ("numa/associativity:
> > > Add a new level of NUMA for GPU's").
> > >
> > > Signed-off-by: Reza Arbab <address@hidden>
> > > ---
> > > hw/ppc/spapr.c | 11 +++++++++--
> > > hw/ppc/spapr_pci_nvlink2.c | 2 +-
> > > 2 files changed, 10 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > index 88b4a1f17716..1d9193d5ee49 100644
> > > --- a/hw/ppc/spapr.c
> > > +++ b/hw/ppc/spapr.c
> > > @@ -893,7 +893,11 @@ static void spapr_dt_rtas(SpaprMachineState *spapr, void *fdt)
> > > int rtas;
> > > GString *hypertas = g_string_sized_new(256);
> > > GString *qemu_hypertas = g_string_sized_new(256);
> > > - uint32_t refpoints[] = { cpu_to_be32(0x4), cpu_to_be32(0x4) };
> > > + uint32_t refpoints[] = {
> > > + cpu_to_be32(0x4),
> > > + cpu_to_be32(0x4),
> > > + cpu_to_be32(0x2),
> > > + };
> > > uint32_t nr_refpoints;
> > > uint64_t max_device_addr = MACHINE(spapr)->device_memory->base +
> > > memory_region_size(&MACHINE(spapr)->device_memory->mr);
> > > @@ -4544,7 +4548,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
> > > smc->linux_pci_probe = true;
> > > smc->smp_threads_vsmt = true;
> > > smc->nr_xirqs = SPAPR_NR_XIRQS;
> > > - smc->nr_assoc_refpoints = 2;
> > > + smc->nr_assoc_refpoints = 3;
> > > xfc->match_nvt = spapr_match_nvt;
> > > }
> > >
> > > @@ -4611,8 +4615,11 @@ DEFINE_SPAPR_MACHINE(5_1, "5.1", true);
> > > */
> > > static void spapr_machine_5_0_class_options(MachineClass *mc)
> > > {
> > > + SpaprMachineClass *smc = SPAPR_MACHINE_CLASS(mc);
> > > +
> > > spapr_machine_5_1_class_options(mc);
> > > compat_props_add(mc->compat_props, hw_compat_5_0, hw_compat_5_0_len);
> > > + smc->nr_assoc_refpoints = 2;
> > > }
> > >
> > > DEFINE_SPAPR_MACHINE(5_0, "5.0", false);
> > > diff --git a/hw/ppc/spapr_pci_nvlink2.c b/hw/ppc/spapr_pci_nvlink2.c
> > > index 8332d5694e46..247fd48731e2 100644
> > > --- a/hw/ppc/spapr_pci_nvlink2.c
> > > +++ b/hw/ppc/spapr_pci_nvlink2.c
> > > @@ -362,7 +362,7 @@ void spapr_phb_nvgpu_ram_populate_dt(SpaprPhbState *sphb, void *fdt)
> > > uint32_t associativity[] = {
> > > cpu_to_be32(0x4),
> > > SPAPR_GPU_NUMA_ID,
> > > - SPAPR_GPU_NUMA_ID,
> > > + cpu_to_be32(nvslot->numa_id),
> >
> > This is a guest visible change. It should theoretically be controlled
> > with a compat property of the PHB (look for "static GlobalProperty" in
> > spapr.c). But since this code is only used for GPU passthrough and we
> > don't support migration of such devices, I guess it's okay. Maybe just
> > mention it in the changelog.
>
> Yeah, we might get away with it, but it should be too hard to get this
I guess you mean "it shouldn't be too hard"?
> right, so let's do it.
>
> >
> > > SPAPR_GPU_NUMA_ID,
> > > cpu_to_be32(nvslot->numa_id)
> > > };
> >
>
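
For reference, a minimal sketch of how a guest could derive the distance tables quoted above from the reference points and the associativity arrays. It is only an illustration modelled on the doubling rule Linux uses for form 1 affinity, not code from the patch. The assoc[] table is an assumption: values are shown in host byte order, regular memory nodes are assumed to use { 4, 0, 0, 0, nid }, and SPAPR_GPU_NUMA_ID is assumed to be 1. The names assoc, node_distance and check_against_patch_tables are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define LOCAL_DISTANCE 10
#define NR_NODES 6

/* Reference points from the patch, in host byte order for readability. */
static const uint32_t refpoints[] = { 4, 4, 2 };

/*
 * Assumed associativity arrays (element 0 is the length).
 * Nodes 0-1: regular memory, assumed to be { 4, 0, 0, 0, nid }.
 * Nodes 2-5: GPU memory after the patch, i.e.
 * { 4, SPAPR_GPU_NUMA_ID, numa_id, SPAPR_GPU_NUMA_ID, numa_id },
 * with SPAPR_GPU_NUMA_ID assumed to be 1.
 */
static const uint32_t assoc[NR_NODES][5] = {
    { 4, 0, 0, 0, 0 },
    { 4, 0, 0, 0, 1 },
    { 4, 1, 2, 1, 2 },
    { 4, 1, 3, 1, 3 },
    { 4, 1, 4, 1, 4 },
    { 4, 1, 5, 1, 5 },
};

/*
 * Start at LOCAL_DISTANCE and double once for every reference point at
 * which the two nodes' associativity values differ; stop at the first
 * reference point where they match.
 */
static int node_distance(int a, int b, int nr_refpoints)
{
    int distance = LOCAL_DISTANCE;

    for (int i = 0; i < nr_refpoints; i++) {
        uint32_t idx = refpoints[i];

        if (assoc[a][idx] == assoc[b][idx]) {
            break;
        }
        distance *= 2;
    }
    return distance;
}

/* Compare the model against a few entries of the tables in the changelog. */
static void check_against_patch_tables(void)
{
    assert(node_distance(0, 0, 3) == 10); /* local */
    assert(node_distance(0, 1, 3) == 40); /* memory <-> memory */
    assert(node_distance(0, 2, 3) == 80); /* memory <-> GPU */
    assert(node_distance(2, 3, 3) == 80); /* GPU <-> GPU */
    assert(node_distance(0, 2, 2) == 40); /* two refpoints only */
}
```

Under these assumptions, three reference points reproduce the "After" table. With only two (nr_assoc_refpoints = 2, as for pseries-5.0) every remote distance in this model stays at 40, even with the new GPU associativity array.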