qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH qemu] spapr_pci: Add numa node id


From: David Gibson
Subject: Re: [Qemu-devel] [PATCH qemu] spapr_pci: Add numa node id
Date: Thu, 22 Sep 2016 14:48:26 +1000
User-agent: Mutt/1.7.0 (2016-08-17)

On Wed, Sep 14, 2016 at 07:39:10PM +1000, Alexey Kardashevskiy wrote:
> On 14/09/16 09:29, Michael Roth wrote:
> > Quoting Alexey Kardashevskiy (2016-07-27 03:03:38)
> >> This adds a numa id property to a PHB to allow linking passed PCI device
> >> to CPU/memory. It is up to the management stack to do CPU/memory pinning
> >> to the node with the actual PCI device.
> > 
> > It looks like x86 relies on PCIBus->numa_node() method (via
> > pci_bus_numa_node()) to encode similar PCIBus affinities
> > into ACPI tables, and currently exposes it via
> > -device pci-[-express]-expander-bus,numa_node=X.
> 
> Well, until we allow DMA windows per PCI bus (not per PHB as it is now),
> this won't make much sense for us (unless I am missing something
> here).

I can't actually see any sane way we could have NUMA associativity of
PCI at any finer granularity than the host bridge level.  I suspect
the only reason it works that way on x86 is due to some of the weird
stuff PC does to make what are effectively different host bridges
appear as different buses in a single logical domain.

> 
> > Maybe we should implement it using this existing
> > PCIBus->numa_node() interface?
> > 
> > We'd still have to expose numa_node as a spapr-pci-host-bridge
> > device option though. Not sure if there's a more common way
> > to expose it that might be easier for libvirt to discover. As it
> > stands we'd need to add spapr-pci-host-bridge to a libvirt
> > whitelist that currently only covers pci-expander-bus.
> > 
> > Cc'ing Shiva who was looking into the libvirt side.

I think we should change the actual name of the option to "numa_node"
to match the option used on the pxb, though.

> > 
> > One comment below:
> > 
> >>
> >> Signed-off-by: Alexey Kardashevskiy <address@hidden>
> >> ---
> >>  hw/ppc/spapr_pci.c          | 13 +++++++++++++
> >>  include/hw/pci-host/spapr.h |  2 ++
> >>  2 files changed, 15 insertions(+)
> >>
> >> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> >> index 949c44f..af5394a 100644
> >> --- a/hw/ppc/spapr_pci.c
> >> +++ b/hw/ppc/spapr_pci.c
> >> @@ -47,6 +47,7 @@
> >>  #include "sysemu/device_tree.h"
> >>  #include "sysemu/kvm.h"
> >>  #include "sysemu/hostmem.h"
> >> +#include "sysemu/numa.h"
> >>
> >>  #include "hw/vfio/vfio.h"
> >>
> >> @@ -1544,6 +1545,7 @@ static Property spapr_phb_properties[] = {
> >>      DEFINE_PROP_BOOL("ddw", sPAPRPHBState, ddw_enabled, true),
> >>      DEFINE_PROP_UINT64("pgsz", sPAPRPHBState, page_size_mask,
> >>                         (1ULL << 12) | (1ULL << 16)),
> >> +    DEFINE_PROP_UINT32("node", sPAPRPHBState, numa_node, -1),
> >>      DEFINE_PROP_END_OF_LIST(),
> >>  };
> >>
> >> @@ -1805,6 +1807,11 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
> >>          cpu_to_be32(1),
> >>          cpu_to_be32(RTAS_IBM_RESET_PE_DMA_WINDOW)
> >>      };
> >> +    uint32_t associativity[] = {cpu_to_be32(0x4),
> >> +                                cpu_to_be32(0x0),
> >> +                                cpu_to_be32(0x0),
> >> +                                cpu_to_be32(0x0),
> >> +                                cpu_to_be32(phb->numa_node)};
> >>      sPAPRTCETable *tcet;
> >>      PCIBus *bus = PCI_HOST_BRIDGE(phb)->bus;
> >>      sPAPRFDT s_fdt;
> >> @@ -1837,6 +1844,12 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
> >>                           &ddw_extensions, sizeof(ddw_extensions)));
> >>      }
> >>
> >> +    /* Advertise NUMA via ibm,associativity */
> >> +    if (nb_numa_nodes > 1) {
> >> +        _FDT(fdt_setprop(fdt, bus_off, "ibm,associativity", associativity,
> >> +                         sizeof(associativity)));
> >> +    }
> > 
> > LoPAPR 15.3 seems to suggest that ibm,associativity-reference-points is
> > required alongside ibm,associativity for each DT node it appears in,
> > and since we hardcode "Form 1" affinity it should be done similarly as
> > the entry we place in the top-level DT node.
> 
> 
> Hm, okay, I'll add it. There is a question to Bharata - why do we have 4s
> in spapr_create_fdt_skel()'s refpoints? Just a random pick?
> 
> vec5[5]==0x80 means we are doing "Form1" and these 4s are far/near distances?
> 
> 
> >> +
> >>      /* Build the interrupt-map, this must matches what is done
> >>       * in pci_spapr_map_irq
> >>       */
> >> diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
> >> index 5adc603..53c4b2d 100644
> >> --- a/include/hw/pci-host/spapr.h
> >> +++ b/include/hw/pci-host/spapr.h
> >> @@ -75,6 +75,8 @@ struct sPAPRPHBState {
> >>      bool ddw_enabled;
> >>      uint64_t page_size_mask;
> >>      uint64_t dma64_win_addr;
> >> +
> >> +    uint32_t numa_node;
> >>  };
> >>
> >>  #define SPAPR_PCI_MAX_INDEX          255
> >>
> > 
> 
> 

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]