qemu-ppc
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3 2/3] spapr: Add NVDIMM device support


From: Shivaprasad G Bhat
Subject: Re: [PATCH v3 2/3] spapr: Add NVDIMM device support
Date: Thu, 12 Dec 2019 14:22:56 +0530
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.2



On 12/11/2019 01:35 PM, Igor Mammedov wrote:
On Wed, 11 Dec 2019 09:44:11 +0530
Shivaprasad G Bhat <address@hidden> wrote:

On 12/06/2019 07:22 AM, David Gibson wrote:
On Wed, Nov 27, 2019 at 09:50:54AM +0530, Bharata B Rao wrote:
On Fri, Nov 22, 2019 at 10:42 AM David Gibson
<address@hidden> wrote:
Ok.  A number of queries about this.

1) The PAPR spec for ibm,dynamic-memory-v2 says that the first word in
each entry is the number of LMBs, but for NVDIMMs you use the
not-necessarily-equal scm_block_size instead.  Does the NVDIMM
amendment for PAPR really specify to use different block sizes for
these cases?  (In which case that's a really stupid spec decision, but
that wouldn't surprise me at this point).
SCM block sizes can be different from LMB sizes, but here we enforce
that SCM device size (excluding metadata) to multiple of LMB size so
that we don't end up memory range that is not aligned to LMB size.
Right, but it still doesn't make sense to use scm_block_size when you
create the dynamic-memory-v2 property.
Right, I should use LMB size here as I will be creating holes here to
disallow DIMMs
to claim those LMBs marking them INVALID as Bharata Suggested before.

   As far as the thing
interpreting that goes, it *must* be LMB size, not SCM block size.  If
those are required to be the same at this point, you should use an
assert().
SCM block size should be a multiple for LMB size, need not be equal.
I'll add an assert
for that, checking if equal. There is no benefit I see as of now having
higher
SCM block size as the bind/unbind are already done before the bind hcall.

2) Similarly, the ibm,dynamic-memory-v2 description says that the
memory block described by the entry has a whole batch of contiguous
DRCs starting at the DRC index given and continuing for #LMBs DRCs.
For NVDIMMs it appears that you just have one DRC for the whole
NVDIMM.  Is that right?
One NVDIMM has one DRC, In our case, we need to mark the LMBs
corresponding to that address range in ibm,dynamic-memory-v2 as
reserved and invalid.
Ok, that fits very weirdly with the DRC allocation for the rest of
pluggable memory, but I suppose that's PAPR for you.

Having these in together is very inscrutable though, and relies on a
heap of non-obvious constraints about placement of DIMMs and NVDIMMs
relative to each other.  I really wonder if it would be better to have
a completely different address range for the NVDIMMs.
The backend object for both DIMM and NVDIMM are memory-backend-*
and they use the address from the same space. Separating it would mean
using/introducing different backend object. I dont think we have a
choice here.
What address-space(s) are are talking about here exactly?
 From my point of view memory-backend-* provides RAM block at
some HVA, which shouldn't not have anything to do with how NVDIMM
partitions and maps it to GPA.

Ah, you are right! I got confused with the HVA.

Nonetheless, I don't see a need for having vNVDIMM in different
guest physical address range as the existing code has support for marking
memory ranges distinctly for DIMM/NVDIMM.

On another note, the x86 too does it the same way. There is no separate
range defined there.



3) You're not setting *any* extra flags on the entry.  How is the
guest supposed to know which are NVDIMM entries and which are regular
DIMM entries?  AFAICT in this version the NVDIMM slots are
indistinguishable from the unassigned hotplug memory (which makes the
difference in LMB and DRC numbering even more troubling).
For NVDIMM case, this patch should populate the LMB set in
ibm,dynamic-memory-v2 something like below:
              elem = spapr_get_drconf_cell(size /lmb_size, addr,
                                           0, -1,
SPAPR_LMB_FLAGS_RESERVED | SPAPR_LMB_FLAGS_DRC_INVALID);

This will ensure that the NVDIMM range will never be considered as
valid memory range for memory hotplug.
Hrm.  Ok so we already have code that does that for any gaps between
DIMMs.  I don't think there's actually anything that that code will do
differently than the code you have for NVDIMMs, so you could just skip
over the NVDIMMs here and it should do the right thing.

The *interpretation* of those entries will become different: for space
into which a regular DIMM is later inserted, we'll assume the DRC
index given is a base and there are more DRCs following it, but for
NVDIMMs we'll assume the same DRC throughout.  This is nuts, but IIUC
that's what PAPR says and we can't do much about it.
My current patch is buggy as Bharata pointed out. The NVDIMM DRCs
are not to be populated here, but mark the LMB DRCs as RESERVED and INVALID
so that no malicious attempts to online those LMBs at those NVDIMM address
ranges are attempted.

4) AFAICT these are _present_ NVDIMMs, so why is
SPAPR_LMB_FLAGS_ASSIGNED not set for them?  (and why is the node
forced to -1, regardless of di->node).
           QSIMPLEQ_INSERT_TAIL(&drconf_queue, elem, entry);
           nr_entries++;
           cur_addr = addr + size;
@@ -1261,6 +1273,85 @@ static void spapr_dt_hypervisor(SpaprMachineState 
*spapr, void *fdt)
       }
   }

+static void spapr_create_nvdimm_dr_connectors(SpaprMachineState *spapr)
+{
+    MachineState *machine = MACHINE(spapr);
+    int i;
+
+    for (i = 0; i < machine->ram_slots; i++) {
+        spapr_dr_connector_new(OBJECT(spapr), TYPE_SPAPR_DRC_PMEM, i);
What happens if you try to plug an NVDIMM to one of these slots, but a
regular DIMM has already taken it?
NVDIMM hotplug won't get that occupied slot.
Ok.




reply via email to

[Prev in Thread] Current Thread [Next in Thread]