Subject: Re: [PATCH v2 1/3] hw/riscv/virt: Add memory hotplugging and virtio-md-pci support
From: Björn Töpel
Date: Mon, 27 May 2024 14:03:28 +0200
David Hildenbrand <david@redhat.com> writes:
> On 24.05.24 15:14, Daniel Henrique Barboza wrote:
>>
>>
>> On 5/21/24 07:56, Björn Töpel wrote:
>>> From: Björn Töpel <bjorn@rivosinc.com>
>>>
>>> Virtio-based memory devices (virtio-mem/virtio-pmem) allows for
>>> dynamic resizing of virtual machine memory, and requires proper
>>> hotplugging (add/remove) support to work.
>>>
>>> Add device memory support for RISC-V "virt" machine, and enable
>>> virtio-md-pci with the corresponding missing hotplugging callbacks.
>>>
>>> Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
>>> ---
>>> hw/riscv/Kconfig | 2 +
>>> hw/riscv/virt.c | 83 +++++++++++++++++++++++++++++++++++++++++-
>>> hw/virtio/virtio-mem.c | 5 ++-
>>> 3 files changed, 87 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/hw/riscv/Kconfig b/hw/riscv/Kconfig
>>> index a2030e3a6ff0..08f82dbb681a 100644
>>> --- a/hw/riscv/Kconfig
>>> +++ b/hw/riscv/Kconfig
>>> @@ -56,6 +56,8 @@ config RISCV_VIRT
>>> select PLATFORM_BUS
>>> select ACPI
>>> select ACPI_PCI
>>> + select VIRTIO_MEM_SUPPORTED
>>> + select VIRTIO_PMEM_SUPPORTED
>>>
>>> config SHAKTI_C
>>> bool
>>> diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
>>> index 4fdb66052587..443902f919d2 100644
>>> --- a/hw/riscv/virt.c
>>> +++ b/hw/riscv/virt.c
>>> @@ -53,6 +53,8 @@
>>> #include "hw/pci-host/gpex.h"
>>> #include "hw/display/ramfb.h"
>>> #include "hw/acpi/aml-build.h"
>>> +#include "hw/mem/memory-device.h"
>>> +#include "hw/virtio/virtio-mem-pci.h"
>>> #include "qapi/qapi-visit-common.h"
>>> #include "hw/virtio/virtio-iommu.h"
>>>
>>> @@ -1407,6 +1409,7 @@ static void virt_machine_init(MachineState *machine)
>>> DeviceState *mmio_irqchip, *virtio_irqchip, *pcie_irqchip;
>>> int i, base_hartid, hart_count;
>>> int socket_count = riscv_socket_count(machine);
>>> + hwaddr device_memory_base, device_memory_size;
>>>
>>> /* Check socket count limit */
>>> if (VIRT_SOCKETS_MAX < socket_count) {
>>> @@ -1420,6 +1423,12 @@ static void virt_machine_init(MachineState *machine)
>>> exit(1);
>>> }
>>>
>>> + if (machine->ram_slots > ACPI_MAX_RAM_SLOTS) {
>>> + error_report("unsupported amount of memory slots: %"PRIu64,
>>> + machine->ram_slots);
>>> + exit(EXIT_FAILURE);
>>> + }
>>> +
>>> /* Initialize sockets */
>>> mmio_irqchip = virtio_irqchip = pcie_irqchip = NULL;
>>> for (i = 0; i < socket_count; i++) {
>>> @@ -1553,6 +1562,37 @@ static void virt_machine_init(MachineState *machine)
>>> memory_region_add_subregion(system_memory, memmap[VIRT_MROM].base,
>>> mask_rom);
>>>
>>> + /* device memory */
>>> + device_memory_base = ROUND_UP(s->memmap[VIRT_DRAM].base + machine->ram_size,
>>> + GiB);
>>> + device_memory_size = machine->maxram_size - machine->ram_size;
>>> + if (device_memory_size > 0) {
>>> + /*
>>> + * Each DIMM is aligned based on the backend's alignment value.
>>> + * Assume max 1G hugepage alignment per slot.
>>> + */
>>> + device_memory_size += machine->ram_slots * GiB;
>>
>> We don't need to align to 1 GiB. This calc can use 2 MiB instead (or 4 MiB
>> if we're running 32 bits).
>>
>>> +
>>> + if (riscv_is_32bit(&s->soc[0])) {
>>> + hwaddr memtop = device_memory_base + ROUND_UP(device_memory_size,
>>> + GiB);
>>
>> Same here - alignment is 2/4 MiB.
>>
>>> +
>>> + if (memtop > UINT32_MAX) {
>>> + error_report("memory exceeds 32-bit limit by %lu bytes",
>>> + memtop - UINT32_MAX);
>>> + exit(EXIT_FAILURE);
>>> + }
>>> + }
>>> +
>>> + if (device_memory_base + device_memory_size < device_memory_size) {
>>> + error_report("unsupported amount of device memory");
>>> + exit(EXIT_FAILURE);
>>> + }
>>
>> Took another look and found this a bit strange. These are all unsigned
>> vars, so if (unsigned a + unsigned b < unsigned b) will always be 'false'.
>> The compiler is probably cropping this out.
>
> No. Unsigned integer overflow is defined behavior, and this is a common
> check to detect such overflow. It's consistent with what we do for other
> architectures.
>
>>
>> The calc we need to do is to ensure that the extra ram_slots * alignment
>> will fit into the VIRT_DRAM block, i.e. maxram_size + (ram_slots *
>> alignment) < memmap[VIRT_DRAM].size.
>>
>>
>> TBH I'm starting to have second thoughts about letting users hotplug
>> whatever they want. It seems cleaner to just force the 2/4 MiB alignment
>> in pre_plug() and be done with it, no need to allocate ram_slots *
>> alignment and do all these extra checks.
>
> It's worth noting that if user space decides to specify addresses
> manually, it can mess up everything already. There are other events that
> can result in fragmentation of the memory device area (repeated
> hot(un)plug of differing DIMMs).
>
> Assume you have 1 GiB range and hotplug a 512 MiB DIMM at offset 256
> MiB. You won't be able to hotplug another 512 MiB DIMM even though we
> reserved space.
>
> My take so far is: if the user wants to do such stuff, they should size the
> area (maxmem) much larger or deal with the consequences (not being able
> to hotplug memory).
>
> It usually does not happen in practice ...
Daniel/David, again thanks for spending time on the patches.
The reason I picked 1 GiB per slot was because the alignment is also
dependent on the backend AFAIU. Rationale in commit 085f8e88ba73 ("pc:
count in 1Gb hugepage alignment when sizing hotplug-memory container").
What I'm reading from you guys is: just depend on whatever maxmem says
(modulo 2/4 MiB alignment), and leave it at that. I agree that that's much
easier to reason about.
Correct?
>> As I sent in an earlier email, users must already comply with the alignment
>> of the host memory when plugging pc-dimms, so I'm not sure our
>> value proposition with all this extra code is worth it - the alignment will
>> most likely be forced by the host memory backend, so we might as well force
>> it ourselves in pre_plug().
>
> At least on x86-64, the 2 MiB alignment requirement is handled
> automatically. QEMU_VMALLOC_ALIGN implicitly enforces that.
>
> Maybe RISCV also wants to set QEMU_VMALLOC_ALIGN?
I'll look into this for the next rev!
Björn