RE: [PATCH] hostmem-file: add the 'hmem' option
From: Luo, Zhigang
Subject: RE: [PATCH] hostmem-file: add the 'hmem' option
Date: Fri, 13 Dec 2024 16:54:59 +0000
[AMD Official Use Only - AMD Internal Distribution Only]
Hi Jonathan,
Could you please provide your comments?
Thanks,
Zhigang
> -----Original Message-----
> From: David Hildenbrand <david@redhat.com>
> Sent: Tuesday, December 10, 2024 5:02 PM
> To: Luo, Zhigang <Zhigang.Luo@amd.com>; qemu-devel@nongnu.org
> Cc: kraxel@redhat.com; Igor Mammedov <imammedo@redhat.com>; Jonathan
> Cameron <Jonathan.Cameron@huawei.com>
> Subject: Re: [PATCH] hostmem-file: add the 'hmem' option
>
> On 10.12.24 22:51, Luo, Zhigang wrote:
> > [AMD Official Use Only - AMD Internal Distribution Only]
> >
> >> -----Original Message-----
> >> From: David Hildenbrand <david@redhat.com>
> >> Sent: Tuesday, December 10, 2024 2:55 PM
> >> To: Luo, Zhigang <Zhigang.Luo@amd.com>; qemu-devel@nongnu.org
> >> Cc: kraxel@redhat.com; Igor Mammedov <imammedo@redhat.com>
> >> Subject: Re: [PATCH] hostmem-file: add the 'hmem' option
> >>
> >> On 10.12.24 20:32, Luo, Zhigang wrote:
> >>> [AMD Official Use Only - AMD Internal Distribution Only]
> >>>
> >>> Hi David,
> >>>
> >>
> >> Hi,
> >>
> >>>>>
> >>>>> Thanks for your comments.
> >>>>> Let me give you some background for this patch.
> >>>>> I am currently engaged in a project that requires passing
> >>>>> EFI_MEMORY_SP (Special Purpose Memory) type memory from the host to a
> >>>>> virtual machine within QEMU. This memory needs to be of the
> >>>>> EFI_MEMORY_SP type in the virtual machine as well.
> >>>>> This particular memory type is essential for the functionality of my
> >>>>> project.
> >>>>
> >>>> Which exact guest memory will be backed by this memory? All guest-memory?
> >>> [Luo, Zhigang] not all guest memory. Only the memory reserved for a
> >>> specific device.
> >>
> >> Can you show me an example QEMU cmdline, and how you would pass that
> >> hostmem-file object to the device?
> >>
> > [Luo, Zhigang] the following is an example. m1 is the reserved memory for
> > PCI device "0000:03:00.0". Both the memory and the PCI device are assigned
> > to the same NUMA node.
> >
> > -object memory-backend-ram,size=8G,id=m0 \
> > -object memory-backend-file,size=16G,id=m1,mem-path=/dev/dax0.0,prealloc=on,align=1G,hmem=on \
> > -numa node,nodeid=0,memdev=m0 -numa node,nodeid=1,memdev=m1 \
>
> Okay, so you expose this memory as a second numa node, and want the guest to
> identify the second numa node as SP to not use it during boot.
>
> Let me CC Jonathan, I am pretty sure he has an idea what to do here.
>
> > -device pxb-pcie,id=pcie.1,numa_node=1,bus_nr=2,bus=pcie.0 \
> > -device ioh3420,id=pcie_port1,bus=pcie.1,chassis=1 \
> > -device vfio-pci,host=0000:03:00.0,id=hostdev0,bus=pcie_port1
> >
> >>>
> >>>>
> >>>> And, what is the guest OS going to do with this memory?
> >>> [Luo, Zhigang] the device driver in guest will use this reserved memory.
> >>
> >> Okay, so just like CXL memory.
> >>
> >>>
> >>>>
> >>>> Usually, this SP memory (dax, cxl, ...) is not used as boot memory.
> >>>> Like on a bare metal system, one would expect that only CXL memory
> >>>> will be marked as special and set aside for the cxl driver, such
> >>>> that the OS can boot on ordinary DIMMs and cxl can online it later etc.
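> >>>>
> >>>> (As an illustration of how that usually looks on bare metal: such
> >>>> EFI_MEMORY_SP ranges show up as "Soft Reserved" in /proc/iomem and get
> >>>> claimed by the dax hmem driver, and an admin can later online them
> >>>> explicitly, e.g. with something like
> >>>>
> >>>>     daxctl reconfigure-device --mode=system-ram dax0.0
> >>>>
> >>>> so that the memory stays out of the boot flow until it is explicitly
> >>>> requested.)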
> >>>>
> >>>> So maybe you would want to expose this memory using CXL-mem device
> >>>> to the VM? Or a DIMM?
> >>>>
> >>>> I assume the alternative is to tell the VM on the Linux kernel
> >>>> cmdline to set EFI_MEMORY_SP on this memory. I recall that there is
> >>>> a way to achieve that.
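> >>>>
> >>>> (I think that was the efi_fake_mem= parameter; something along the
> >>>> lines of
> >>>>
> >>>>     efi_fake_mem=16G@0x100000000:0x40000
> >>>>
> >>>> with 0x40000 being the EFI_MEMORY_SP attribute and the base address and
> >>>> size filled in by hand. Not sure whether recent kernels still support
> >>>> that option, though.)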
> >>>>
> >>> [Luo, Zhigang] I know this option, but it requires the end user to know
> >>> where the memory is located on the guest side (start address, size).
> >>
> >> Right.
> >>
> >>>
> >>>
> >>>>> In Linux, the SPM memory will be claimed by the hmem-dax driver by
> >>>>> default. With this patch I can use the following config to pass the
> >>>>> SPM memory to the guest VM:
> >>>>>
> >>>>> -object memory-backend-file,size=30G,id=m1,mem-path=/dev/dax0.0,prealloc=on,align=1G,hmem=on
> >>>>>
> >>>>> I was thinking of changing the option name from "hmem" to "spm" to
> >>>>> avoid confusion.
> >>>>
> >>>> Likely it should be specified elsewhere that you want specific
> >>>> guest RAM ranges to be EFI_MEMORY_SP. For a DIMM, it could be a
> >>>> property, similarly maybe for CXL-mem devices (no expert on that).
> >>>>
> >>>> For boot memory / machine memory it could be a machine property.
> >>>> But I'll first have to learn which ranges you actually want to
> >>>> expose that way, and what the VM will do with that information.
> >>> [Luo, Zhigang] we want to expose the SPM memory reserved for a specific
> >>> device. And we will pass the SPM memory and the device to the guest. Then
> >>> the device driver can use the SPM memory on the guest side.
> >>
> >> Then the device driver should likely have a way to configure that,
> >> not the memory backend.
> >>
> >> After all, the device driver will map it somehow into guest physical
> >> address space (how?).
> >>
> > [Luo, Zhigang] from the guest's view, it's still system memory, but marked
> > as SPM. So, QEMU will map the memory into the guest physical address space.
> > The device driver just claims and uses the SPM memory on the guest side.
> >
> >>>
> >>>>
> >>>>>
> >>>>> Do you have any suggestions on how to achieve this more reasonably?
> >>>>
> >>>> The problem with qemu_ram_foreach_block() is that you would also
> >>>> indicate DIMMs, virtio-mem, ... and even RAMBlocks that are not
> >>>> even used for backing anything in the VM as EFI_MEMORY_SP, which is
> >>>> wrong.
> >>> [Luo, Zhigang] qemu_ram_foreach_block() will list all memory blocks, but
> >>> in pc_update_hmem_memory(), only the memory blocks with the "hmem" flag
> >>> will be updated to SPM memory.
> >>
> >> Yes, but imagine a user passing such a memory backend to a
> >> DIMM/virtio-mem/boot memory etc. It will have very undesired side effects.
> >>
> > [Luo, Zhigang] the user should know what he/she is doing when he/she sets
> > the flag for the memory region.
>
> No, we must not allow creating insane configurations that don't make any
> sense.
>
> Sufficient to add:
>
> -object memory-backend-file,size=16G,id=unused,mem-path=whatever,hmem=on
>
> to the cmdline to cause a mess.
>
>
> Maybe it should be a "numa" node configuration like
>
> -numa node,nodeid=1,memdev=m1,sp=on
>
> But I recall that we discussed something related with Jonathan, so I'm
> hoping we can get his input.
>
> --
> Cheers,
>
> David / dhildenb