Re: [Qemu-devel] Questions about vNVDIMM on qemu/KVM
From: Pankaj Gupta
Subject: Re: [Qemu-devel] Questions about vNVDIMM on qemu/KVM
Date: Mon, 4 Jun 2018 04:03:06 -0400 (EDT)
Hi,
>
> > I'm investigating the status of vNVDIMM on qemu/KVM,
> > and I have some questions about it. I'd be glad if anyone could answer them.
> >
> > In my understanding, qemu/KVM has a feature to expose NFIT to the guest,
> > and it is still being updated with platform capabilities by this patch set:
> > https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg04756.html
> >
> > And libvirt also supports this feature with <memory model='nvdimm'>
> > https://libvirt.org/formatdomain.html#elementsMemory
> >
> >
> > However, virtio-pmem is in development now, and it is better for
> > architectures that cannot detect NVDIMM regions via ACPI (like s390x).
> > In addition, it is also necessary to flush guest contents on a vNVDIMM
> > that is backed by a file.
> >
> >
> > Q1) Does ACPI.NFIT bus of qemu/kvm remain with virtio-pmem?
No.
> > How do each roles become it if both NFIT and virtio-pmem will be
> > available?
There are two main use cases:

1] DAX memory region pass-through to guest:
-------------------------------------------
As this region is present on an actual physical NVDIMM device and exposed
to the guest, the ACPI/NFIT way is used. If all the persistent memory is
used only this way, we don't need 'virtio-pmem'.
2] Emulated DAX memory region in host passed to guest:
--------------------------------------------------------
If this type of region is exposed to the guest, it is preferable to use
'virtio-pmem'. This is regular host memory mmaped into the guest address
space to emulate persistent memory. Guest writes land in the host page
cache and are not guaranteed to reach the backing disk without an
explicit flush/sync call. 'virtio-pmem' solves the problem of flushing
guest writes present in the host page cache.

Host filesystems that use journaling (ext4, xfs) call 'fsync'
automatically at regular intervals, but that still does not give 100%
assurance that all writes persist until an explicit flush is done from
the guest. So we need an additional fsync to flush guest writes to the
backing disk. We use this approach to avoid using the guest page cache
and to keep page cache management for all guests on the host side.
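The persistence gap described above can be illustrated with a minimal,
self-contained sketch (the backing-file path below is hypothetical): a
write into a mmaped host file initially sits only in the host page cache,
and an explicit msync/fsync pair, which is what a virtio-pmem flush
request would translate to on the host, is needed before the data is
assured to be on the backing disk.

```python
# Sketch of the host-side persistence problem virtio-pmem addresses.
# The file stands in for the host backing file of an emulated DAX region.
import mmap
import os

path = "/tmp/pmem-backend.img"  # hypothetical backing file

fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
os.ftruncate(fd, 4096)

buf = mmap.mmap(fd, 4096)
buf[:5] = b"guest"   # a "guest write": lands in the host page cache only

# Without the next two calls, a host crash could lose this write even
# though the guest believes it hit persistent memory.
buf.flush()          # msync: write back the dirty mapped pages
os.fsync(fd)         # fsync: flush file data and metadata to the disk

buf.close()
os.close(fd)
```

This is only an illustration of the semantics; in the real design the
guest issues a flush request over the virtio-pmem device and the host
performs the fsync on its behalf.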
If both ACPI NFIT and virtio-pmem are present, each will have its own
corresponding memory region, defined as memory type "Persistent Shared
Memory" in the case of virtio-pmem and "Persistent Memory" for the ACPI
NVDIMM. This is to differentiate the two memory types.
> > If my understanding is correct, both NFIT and virtio-pmem is used to
> > detect vNVDIMM regions, but only one seems to be necessary....
> >
> > Otherwise, is the NFIT bus just kept for compatibility,
> > and virtio-pmem is the promising way?
> >
> >
> > Q2) What bus is(will be?) created for virtio-pmem?
> > I could confirm the bus of NFIT is created with <memory
> > model='nvdimm'>,
For virtio-pmem it is also an nvdimm bus.
> > and I heard other bus will be created for virtio-pmem, but I could not
> > find what bus is created concretely.
> > ---
> > # ndctl list -B
> > {
> > "provider":"ACPI.NFIT",
> > "dev":"ndbus0"
> > }
> > ---
> >
> > I think it affects what operations the user will be able to perform,
> > and what notification is necessary for vNVDIMM.
> > ACPI defines some operations like namespace control, and notification
> > for NVDIMM health status and others.
> > (I suppose that other status notification might be necessary for
> > vNVDIMM,
> > but I'm not sure yet...)
For virtio-pmem, we are not providing advanced features like namespaces
and the various other features that ACPI/NVDIMM hardware provides. This
is just to keep the paravirt device simple.

Moreover, I have not yet looked at the ndctl side of things, so I am not
100% sure how ndctl will handle 'virtio-pmem'.
Adding 'Dan' to the loop; he can add his thoughts.
Thanks,
Pankaj
> >
> > If my understanding is wrong, please correct me.
> >
> > Thanks,
> > ---
> > Yasunori Goto