qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH] hw/acpi-build: build SRAT memory affinity struc


From: Igor Mammedov
Subject: Re: [Qemu-devel] [PATCH] hw/acpi-build: build SRAT memory affinity structures for NVDIMM
Date: Mon, 26 Feb 2018 15:08:35 +0100

On Wed, 21 Feb 2018 06:51:11 -0800
Dan Williams <address@hidden> wrote:

> On Wed, Feb 21, 2018 at 5:55 AM, Igor Mammedov <address@hidden> wrote:
> > On Tue, 20 Feb 2018 17:17:58 -0800
> > Dan Williams <address@hidden> wrote:
> >  
> >> On Tue, Feb 20, 2018 at 6:10 AM, Igor Mammedov <address@hidden> wrote:  
> >> > On Sat, 17 Feb 2018 14:31:35 +0800
> >> > Haozhong Zhang <address@hidden> wrote:
> >> >  
> >> >> ACPI 6.2A Table 5-129 "SPA Range Structure" requires the proximity
> >> >> domain of a NVDIMM SPA range must match with corresponding entry in
> >> >> SRAT table.
> >> >>
> >> >> The address ranges of vNVDIMM in QEMU are allocated from the
> >> >> hot-pluggable address space, which is entirely covered by one SRAT
> >> >> memory affinity structure. However, users can set the vNVDIMM
> >> >> proximity domain in NFIT SPA range structure by the 'node' property of
> >> >> '-device nvdimm' to a value different than the one in the above SRAT
> >> >> memory affinity structure.
> >> >>
> >> >> In order to solve such proximity domain mismatch, this patch build one
> >> >> SRAT memory affinity structure for each NVDIMM device with the
> >> >> proximity domain used in NFIT. The remaining hot-pluggable address
> >> >> space is covered by one or multiple SRAT memory affinity structures
> >> >> with the proximity domain of the last node as before.
> >> >>
> >> >> Signed-off-by: Haozhong Zhang <address@hidden>  
> >> > If we consider hotpluggable system, correctly implemented OS should
> >> > be able pull proximity from Device::_PXM and override any value from 
> >> > SRAT.
> >> > Do we really have a problem here (anything that breaks if we would use 
> >> > _PXM)?
> >> > Maybe we should add _PXM object to nvdimm device nodes instead of 
> >> > massaging SRAT?  
> >>
> >> Unfortunately _PXM is an awkward fit. Currently the proximity domain
> >> is attached to the SPA range structure. The SPA range may be
> >> associated with multiple DIMM devices and those individual NVDIMMs may
> >> have conflicting _PXM properties.  
> > There shouldn't be any conflict here as  NVDIMM device's _PXM method,
> > should override in runtime any proximity specified by parent scope.
> > (as parent scope I'd also count boot time NFIT/SRAT tables).
> >
> > To make it more clear we could clear valid proximity domain flag in SPA
> > like this:
> >
> > diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
> > index 59d6e42..131bca5 100644
> > --- a/hw/acpi/nvdimm.c
> > +++ b/hw/acpi/nvdimm.c
> > @@ -260,9 +260,7 @@ nvdimm_build_structure_spa(GArray *structures, 
> > DeviceState *dev)
> >       */
> >      nfit_spa->flags = cpu_to_le16(1 /* Control region is strictly for
> >                                         management during hot add/online
> > -                                       operation */ |
> > -                                  2 /* Data in Proximity Domain field is
> > -                                       valid*/);
> > +                                       operation */);
> >
> >      /* NUMA node. */
> >      nfit_spa->proximity_domain = cpu_to_le32(node);
> >  
> >> Even if that was unified across
> >> DIMMs it is ambiguous whether a DIMM-device _PXM would relate to the
> >> device's control interface, or the assembled persistent memory SPA
> >> range.  
> > I'm not sure what you mean under 'device's control interface',
> > could you clarify where the ambiguity comes from?  
> 
> There are multiple SPA range types. In addition to the typical
> Persistent Memory SPA range there are also Control Region SPA ranges
> for MMIO registers on the DIMM for Block Apertures and other purposes.
> 
> >
> > I read spec as: _PXM applies to address range covered by NVDIMM
> > device it belongs to.  
> 
> No, an NVDIMM may contribute to multiple SPA ranges and those ranges
> may span sockets.
Isn't NVDIMM device plugged into a single socket which belongs to
a single numa node?
If it's so then shouldn't SPAs referencing it also have the same
proximity domain?


> > As for assembled SPA, I'd assume that it applies to interleaved set
> > and all NVDIMMs with it should be on the same node. It's somewhat
> > irrelevant question though as QEMU so far implements only
> >   1:1:1/SPA:Region Mapping:NVDIMM Device/
> > mapping.
> >
> > My main concern with using static configuration tables for proximity
> > mapping, we'd miss on hotplug side of equation. However if we start
> > from dynamic side first, we could later complement it with static
> > tables if there really were need for it.  
> 
> Especially when you consider the new HMAT table that wants to have
> proximity domains for describing performance characteristics of an
> address range relative to an initiator, the _PXM method on an
> individual NVDIMM device is a poor fit for describing a wider set.
> 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]