Re: [Qemu-devel] [PATCH v3 00/32] implement vNVDIMM
From: Igor Mammedov
Subject: Re: [Qemu-devel] [PATCH v3 00/32] implement vNVDIMM
Date: Mon, 12 Oct 2015 10:20:07 +0200
On Mon, 12 Oct 2015 11:06:20 +0800
Xiao Guangrong <address@hidden> wrote:
>
>
> On 10/12/2015 10:59 AM, Bharata B Rao wrote:
> > Xiao,
> >
> > Are these patches present in any git tree so that they can be easily tried
> > out?
> >
>
> Sorry, currently no git tree outside my workspace is available :(
Is it possible for you to put a working tree on GitHub?
>
> BTW, this patchset is based on top of commit b37686f7e in the QEMU tree:
> commit b37686f7e84b22cfaf7fd01ac5133f2617cc3027
> Merge: 8be6e62 98cf48f
> Author: Peter Maydell <address@hidden>
> Date: Fri Oct 9 12:18:13 2015 +0100
>
> Merge remote-tracking branch 'remotes/stefanha/tags/tracing-pull-request' into staging
>
> Thanks.
>
> > Regards,
> > Bharata.
> >
> > On Sun, Oct 11, 2015 at 9:22 AM, Xiao Guangrong
> > <address@hidden> wrote:
> >> Changelog in v3:
> >> This version changes a great deal. Thanks to Igor, Stefan, Paolo, Eduardo,
> >> and Michael for their valuable comments; the patchset is finally in better
> >> shape.
> >> - changes from Igor's comments:
> >> 1) abstract the dimm device type from pc-dimm and create the nvdimm
> >> device based on dimm; it then uses a memory backend device as the
> >> nvdimm's memory, which makes NUMA support easy to implement.
> >> 2) let the file-backend device support any kind of filesystem, not only
> >> hugetlbfs, and let it work on a file, not only on a directory. This is
> >> achieved by extending 'mem-path': if it is a directory, the current
> >> behavior is kept; if it is a file, memory is allocated directly from
> >> it.
> >> 3) we figured out an unused memory hole below 4G, 0xFF00000 ~
> >> 0xFFF00000. This range is large enough for NVDIMM ACPI, and a hole
> >> below 4G is used because building a 64-bit ACPI SSDT/DSDT table would
> >> break Windows XP.
> >> BTW, setting only SSDT.rev = 2 does not work, since the integer width
> >> depends only on DSDT.rev, per 19.6.28 DefinitionBlock (Declare
> >> Definition Block) in the ACPI spec:
> >> | Note: For compatibility with ACPI versions before ACPI 2.0, the bit
> >> | width of Integer objects is dependent on the ComplianceRevision of the
> >> | DSDT. If the ComplianceRevision is less than 2, all integers are
> >> | restricted to 32 bits. Otherwise, full 64-bit integers are used. The
> >> | version of the DSDT sets the global integer width for all integers,
> >> | including integers in SSDTs.
> >> 4) use the lowest ACPI spec version to document AML terms.
> >> 5) use "nvdimm" as the nvdimm device name instead of "pc-nvdimm"
> >>
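To illustrate the integer-width rule from the spec note quoted above, here is a minimal Python sketch. It is not QEMU code: the function names `aml_integer_width` and `truncate_to_width` are made up for this example, and the masking step is only an assumption about how a 32-bit interpreter would treat a wider value. The point is that the DSDT revision is the only input; the SSDT revision does not appear at all.

```python
def aml_integer_width(dsdt_revision: int) -> int:
    """Return the AML integer width in bits implied by the DSDT revision.

    Per the quoted ACPI note: a ComplianceRevision below 2 restricts all
    integers to 32 bits, otherwise 64-bit integers are used. The SSDT
    revision is deliberately not a parameter.
    """
    return 32 if dsdt_revision < 2 else 64


def truncate_to_width(value: int, dsdt_revision: int) -> int:
    """Mask a value to the global integer width (illustrative assumption)."""
    width = aml_integer_width(dsdt_revision)
    return value & ((1 << width) - 1)
```

With a revision 1 DSDT, an address at or above 4G cannot be represented in an AML integer, which is why a region below 4G is convenient here.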
> >> - changes from Stefan's comments:
> >> 1) do not do endian adjustment in place, since _DSM memory is visible to
> >> the guest
> >> 2) use the target platform's page size instead of a fixed PAGE_SIZE
> >> definition
> >> 3) lots of code style improvements and typo fixes.
> >> 4) live migration fix
> >> - changes from Paolo's comments:
> >> 1) improve the name of the memory region
> >>
> >> - other changes:
> >> 1) return the exact buffer size for the _DSM method instead of the page
> >> size.
> >> 2) introduce a mutex in NVDIMM ACPI, as the _DSM memory is shared by all
> >> nvdimm devices.
> >> 3) NUMA support
> >> 4) implement _FIT method
> >> 5) rename "configdata" to "reserve-label-data"
> >> 6) simplify _DSM arg3 determination
> >> 7) update the main changelog to reflect v3.
> >>
> >> Changelog in v2:
> >> - Use little endian for the DSM method, thanks to Stefan's suggestion
> >>
> >> - introduce a new parameter, @configdata; if it is false, QEMU will
> >> build a static, read-only namespace in memory and use it to serve
> >> DSM GET_CONFIG_SIZE/GET_CONFIG_DATA requests. In this case, no
> >> reserved region is needed at the end of the @file, which is good for
> >> users who want to pass the whole nvdimm device and make its data
> >> completely visible to the guest
> >>
> >> - divide the source code into separate files and add maintainer info
> >>
> >> BTW, PCOMMIT virtualization on the KVM side is a work in progress and will
> >> hopefully be posted next week
> >>
> >> ====== Background ======
> >> NVDIMM (Non-Volatile Dual In-line Memory Module) is going to be supported
> >> on Intel's platform. NVDIMMs are discovered via ACPI and configured by the
> >> _DSM method of the NVDIMM device in ACPI. Some supporting documents can be
> >> found at:
> >> ACPI 6: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
> >> NVDIMM Namespace: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
> >> DSM Interface Example: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
> >> Driver Writer's Guide: http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf
> >>
> >> Currently, the NVDIMM driver has been merged into the upstream Linux kernel,
> >> and this patchset tries to enable it for virtualization
> >>
> >> ====== Design ======
> >> NVDIMM supports two access modes. One is PMEM, which maps the NVDIMM into
> >> the CPU's address space so the CPU can access it directly as normal memory.
> >> The other is BLK, which is used as a block device to reduce the amount of
> >> CPU address space occupied
> >>
> >> BLK mode accesses the NVDIMM via a Command Register window and a Data
> >> Register window. BLK virtualization has a high overhead, since each sector
> >> access causes at least two VM-EXITs. So we currently implement only vPMEM
> >> in this patchset
> >>
> >> --- vPMEM design ---
> >> We introduce a new device named "nvdimm", which uses a memory backend
> >> device as NVDIMM memory. The file in the file-backend device can be a
> >> regular file or a block device. Any file can be used for testing or
> >> emulation; in the real world, however, the files passed to the guest are:
> >> - a regular file in a DAX-enabled filesystem created on an NVDIMM device
> >> on the host
> >> - a raw PMEM device on the host, e.g. /dev/pmem0
> >> Memory accesses to addresses created by mmap on these kinds of files
> >> reach the NVDIMM device on the host directly.
> >>
> >> --- vConfigure data area design ---
> >> Each NVDIMM device has a configuration data area, which is used to store
> >> label namespace data. To emulate this area, we divide the file into two
> >> parts:
> >> - the first part is [0, size - 128K), which is used as PMEM
> >> - the last 128K of the file, which is used as the Label Data Area
> >> This way the label namespace data persists across power loss or system
> >> failure.
> >>
> >> We also support passing the whole file to the guest without reserving any
> >> region for the label data area. This is controlled by the
> >> "reserve-label-data" parameter: if it is false, QEMU builds a static,
> >> read-only namespace in memory, and that namespace covers the whole file
> >> size. The parameter is false by default.
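The file layout described above can be written down as a small Python sketch. This is an illustration only, not code from the patchset: `split_backing_file` and `LABEL_AREA_SIZE` are hypothetical names, and the error for a too-small file is my assumption rather than documented behavior. Ranges are returned as (start, end) with an exclusive end.

```python
LABEL_AREA_SIZE = 128 * 1024  # 128K label data area at the end of the file


def split_backing_file(file_size: int, reserve_label_data: bool):
    """Return (pmem_range, label_range) for a backing file of file_size bytes.

    With reserve_label_data false, the whole file is PMEM and there is no
    on-file label area (label_range is None). Otherwise the last 128K is
    the label data area and everything before it is PMEM.
    """
    if not reserve_label_data:
        return (0, file_size), None
    if file_size <= LABEL_AREA_SIZE:
        # Assumption: a file that cannot hold any PMEM is rejected.
        raise ValueError("file too small to hold a label data area")
    pmem_end = file_size - LABEL_AREA_SIZE
    return (0, pmem_end), (pmem_end, file_size)
```

For a 10G file with the label area reserved, the guest-visible PMEM is 10G minus 128K; without it, the guest sees the full 10G.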
> >>
> >> --- _DSM method design ---
> >> _DSM in ACPI is used to configure NVDIMM. Currently we only allow access to
> >> label namespace data, i.e. Get Namespace Label Size (Function Index 4),
> >> Get Namespace Label Data (Function Index 5) and Set Namespace Label Data
> >> (Function Index 6)
> >>
> >> _DSM uses two pages to transfer data between ACPI and QEMU. The first page
> >> is RAM-based; it holds the input of the _DSM method, and QEMU reuses it to
> >> store the output. The other page is MMIO-based; ACPI writes data to this
> >> page to transfer control to QEMU
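As a rough sketch of the function filtering described above, in Python: the three function indexes come from the cover letter, while the names `DsmFunction` and `is_label_function` are invented here and do not correspond to identifiers in the patches. Function index 0 (the "implemented functions" query suggested by the title of patch 27) is left out because the text does not describe it.

```python
from enum import IntEnum


class DsmFunction(IntEnum):
    """Label-related _DSM function indexes listed in the cover letter."""
    GET_NAMESPACE_LABEL_SIZE = 4
    GET_NAMESPACE_LABEL_DATA = 5
    SET_NAMESPACE_LABEL_DATA = 6


# Plain integer values, so membership tests work on raw function indexes.
_LABEL_FUNCTIONS = {f.value for f in DsmFunction}


def is_label_function(index: int) -> bool:
    """True if the function index is one of the label-data functions."""
    return index in _LABEL_FUNCTIONS
```

A handler built along these lines would reject any other index before touching the shared RAM page.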
> >>
> >> ====== Test ======
> >> In host
> >> 1) create the memory backing file, e.g. # dd if=/dev/zero of=/tmp/nvdimm bs=1G
> >> count=10
> >> 2) append "-object memory-backend-file,share,id=mem1,
> >> mem-path=/tmp/nvdimm -device nvdimm,memdev=mem1,reserve-label-data,
> >> id=nv1" to the QEMU command line
> >>
> >> In guest, download the latest upstream kernel (4.2 merge window) and enable
> >> ACPI_NFIT, LIBNVDIMM and BLK_DEV_PMEM.
> >> 1) insmod drivers/nvdimm/libnvdimm.ko
> >> 2) insmod drivers/acpi/nfit.ko
> >> 3) insmod drivers/nvdimm/nd_btt.ko
> >> 4) insmod drivers/nvdimm/nd_pmem.ko
> >> You can see the whole nvdimm device used as a single namespace, and
> >> /dev/pmem0 appears. You can do anything on /dev/pmem0, including DAX access.
> >>
> >> Currently the Linux NVDIMM driver does not support namespace operations on
> >> this kind of PMEM; apply the changes below to support dynamic namespaces:
> >>
> >> @@ -798,7 +823,8 @@ static int acpi_nfit_register_dimms(struct acpi_nfit_desc *a
> >> continue;
> >> }
> >>
> >> - if (nfit_mem->bdw && nfit_mem->memdev_pmem)
> >> + //if (nfit_mem->bdw && nfit_mem->memdev_pmem)
> >> + if (nfit_mem->memdev_pmem)
> >> flags |= NDD_ALIASING;
> >>
> >> You can append another NVDIMM device in the guest and do:
> >> # cd /sys/bus/nd/devices/
> >> # cd namespace1.0/
> >> # echo `uuidgen` > uuid
> >> # echo `expr 1024 \* 1024 \* 128` > size
> >> then reload nd_pmem.ko
> >>
> >> You can see /dev/pmem1 appears
> >>
> >> ====== TODO ======
> >> NVDIMM hotplug support
> >>
> >> Xiao Guangrong (32):
> >> acpi: add aml_derefof
> >> acpi: add aml_sizeof
> >> acpi: add aml_create_field
> >> acpi: add aml_mutex, aml_acquire, aml_release
> >> acpi: add aml_concatenate
> >> acpi: add aml_object_type
> >> util: introduce qemu_file_get_page_size()
> >> exec: allow memory to be allocated from any kind of path
> >> exec: allow file_ram_alloc to work on file
> >> hostmem-file: clean up memory allocation
> >> hostmem-file: use whole file size if possible
> >> pc-dimm: remove DEFAULT_PC_DIMMSIZE
> >> pc-dimm: make pc_existing_dimms_capacity static and rename it
> >> pc-dimm: drop the prefix of pc-dimm
> >> stubs: rename qmp_pc_dimm_device_list.c
> >> pc-dimm: rename pc-dimm.c and pc-dimm.h
> >> dimm: abstract dimm device from pc-dimm
> >> dimm: get mapped memory region from DIMMDeviceClass->get_memory_region
> >> dimm: keep the state of the whole backend memory
> >> dimm: introduce realize callback
> >> nvdimm: implement NVDIMM device abstract
> >> nvdimm: init the address region used by NVDIMM ACPI
> >> nvdimm: build ACPI NFIT table
> >> nvdimm: init the address region used by DSM method
> >> nvdimm: build ACPI nvdimm devices
> >> nvdimm: save arg3 for NVDIMM device _DSM method
> >> nvdimm: support DSM_CMD_IMPLEMENTED function
> >> nvdimm: support DSM_CMD_NAMESPACE_LABEL_SIZE function
> >> nvdimm: support DSM_CMD_GET_NAMESPACE_LABEL_DATA
> >> nvdimm: support DSM_CMD_SET_NAMESPACE_LABEL_DATA
> >> nvdimm: allow using whole backend memory as pmem
> >> nvdimm: add maintain info
> >>
> >> MAINTAINERS | 6 +
> >> backends/hostmem-file.c | 58 +-
> >> default-configs/i386-softmmu.mak | 2 +
> >> default-configs/x86_64-softmmu.mak | 2 +
> >> exec.c | 113 ++-
> >> hmp.c | 2 +-
> >> hw/Makefile.objs | 2 +-
> >> hw/acpi/aml-build.c | 83 ++
> >> hw/acpi/ich9.c | 8 +-
> >> hw/acpi/memory_hotplug.c | 26 +-
> >> hw/acpi/piix4.c | 8 +-
> >> hw/i386/acpi-build.c | 4 +
> >> hw/i386/pc.c | 37 +-
> >> hw/mem/Makefile.objs | 3 +
> >> hw/mem/{pc-dimm.c => dimm.c} | 240 ++---
> >> hw/mem/nvdimm/acpi.c | 961 +++++++++++++++++++++
> >> hw/mem/nvdimm/internal.h | 41 +
> >> hw/mem/nvdimm/namespace.c | 309 +++++++
> >> hw/mem/nvdimm/nvdimm.c | 136 +++
> >> hw/mem/pc-dimm.c | 506 +----------
> >> hw/ppc/spapr.c | 20 +-
> >> include/hw/acpi/aml-build.h | 8 +
> >> include/hw/i386/pc.h | 4 +-
> >> include/hw/mem/{pc-dimm.h => dimm.h} | 68 +-
> >> include/hw/mem/nvdimm.h | 58 ++
> >> include/hw/mem/pc-dimm.h | 105 +--
> >> include/hw/ppc/spapr.h | 2 +-
> >> include/qemu/osdep.h | 1 +
> >> numa.c | 4 +-
> >> qapi-schema.json | 8 +-
> >> qmp.c | 4 +-
> >> stubs/Makefile.objs | 2 +-
> >> ...c_dimm_device_list.c => qmp_dimm_device_list.c} | 4 +-
> >> target-ppc/kvm.c | 21 +-
> >> trace-events | 8 +-
> >> util/oslib-posix.c | 16 +
> >> util/oslib-win32.c | 5 +
> >> 37 files changed, 2023 insertions(+), 862 deletions(-)
> >> rename hw/mem/{pc-dimm.c => dimm.c} (65%)
> >> create mode 100644 hw/mem/nvdimm/acpi.c
> >> create mode 100644 hw/mem/nvdimm/internal.h
> >> create mode 100644 hw/mem/nvdimm/namespace.c
> >> create mode 100644 hw/mem/nvdimm/nvdimm.c
> >> rewrite hw/mem/pc-dimm.c (91%)
> >> rename include/hw/mem/{pc-dimm.h => dimm.h} (50%)
> >> create mode 100644 include/hw/mem/nvdimm.h
> >> rewrite include/hw/mem/pc-dimm.h (97%)
> >> rename stubs/{qmp_pc_dimm_device_list.c => qmp_dimm_device_list.c} (56%)
> >>
> >> --
> >> 1.8.3.1
> >>