[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[PULL 37/63] smbios: make memory device size configurable per Machine
From: |
Michael S. Tsirkin |
Subject: |
[PULL 37/63] smbios: make memory device size configurable per Machine |
Date: |
Sun, 21 Jul 2024 20:18:06 -0400 |
From: Igor Mammedov <imammedo@redhat.com>
Currently QEMU describes initial[1] RAM* in SMBIOS as a series of
virtual DIMMs (capped at 16Gb max) using type 17 structure entries.
Which is fine for the most cases. However when starting guest
with terabytes of RAM this leads to too many memory device
structures, which eventually upsets linux kernel as it reserves
only 64K for these entries and when that border is crossed out
it runs out of reserved memory.
Instead of partitioning initial RAM on 16Gb DIMMs, use maximum
possible chunk size that SMBIOS spec allows[2]. Which lets
encode RAM in lower 31 bits of 32bit field (which amounts upto
2047Tb per DIMM).
As result initial RAM will generate only one type 17 structure
until host/guest reach ability to use more RAM in the future.
Compat changes:
We can't unconditionally change chunk size as it will break
QEMU<->guest ABI (and migration). Thus introduce a new machine
class field that would let older versioned machines to use
legacy 16Gb chunks, while new(er) machine type[s] use maximum
possible chunk size.
PS:
While it might seem to be risky to rise max entry size this large
(much beyond of what current physical RAM modules support),
I'd not expect it causing much issues, modulo uncovering bugs
in software running within guest. And those should be fixed
on guest side to handle SMBIOS spec properly, especially if
guest is expected to support so huge RAM configs.
In worst case, QEMU can reduce chunk size later if we would
care enough about introducing a workaround for some 'unfixable'
guest OS, either by fixing up the next machine type or
giving users a CLI option to customize it.
1) Initial RAM - is RAM configured with help '-m SIZE' CLI option/
implicitly defined by machine. It doesn't include memory
configured with help of '-device' option[s] (pcdimm,nvdimm,...)
2) SMBIOS 3.1.0 7.18.5 Memory Device — Extended Size
PS:
* tested on 8Tb host with RHEL6 guest, which seems to parse
type 17 SMBIOS table entries correctly (according to 'dmidecode').
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20240715122417.4059293-1-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
include/hw/boards.h | 4 ++++
hw/arm/virt.c | 1 +
hw/core/machine.c | 6 ++++++
hw/i386/pc_piix.c | 1 +
hw/i386/pc_q35.c | 1 +
hw/smbios/smbios.c | 11 ++++++-----
6 files changed, 19 insertions(+), 5 deletions(-)
diff --git a/include/hw/boards.h b/include/hw/boards.h
index ef6f18f2c1..48ff6d8b93 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -237,6 +237,9 @@ typedef struct {
* purposes only.
* Applies only to default memory backend, i.e., explicit memory backend
* wasn't used.
+ * @smbios_memory_device_size:
+ * Default size of memory device,
+ * SMBIOS 3.1.0 "7.18 Memory Device (Type 17)"
*/
struct MachineClass {
/*< private >*/
@@ -304,6 +307,7 @@ struct MachineClass {
const CPUArchIdList *(*possible_cpu_arch_ids)(MachineState *machine);
int64_t (*get_default_cpu_node_id)(const MachineState *ms, int idx);
ram_addr_t (*fixup_ram_size)(ram_addr_t size);
+ uint64_t smbios_memory_device_size;
};
/**
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index b0c68d66a3..719e83e6a1 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -3308,6 +3308,7 @@ DEFINE_VIRT_MACHINE_AS_LATEST(9, 1)
static void virt_machine_9_0_options(MachineClass *mc)
{
virt_machine_9_1_options(mc);
+ mc->smbios_memory_device_size = 16 * GiB;
compat_props_add(mc->compat_props, hw_compat_9_0, hw_compat_9_0_len);
}
DEFINE_VIRT_MACHINE(9, 0)
diff --git a/hw/core/machine.c b/hw/core/machine.c
index bc38cad7f2..ac30544e7f 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -1004,6 +1004,12 @@ static void machine_class_init(ObjectClass *oc, void
*data)
/* Default 128 MB as guest ram size */
mc->default_ram_size = 128 * MiB;
mc->rom_file_has_mr = true;
+ /*
+ * SMBIOS 3.1.0 7.18.5 Memory Device — Extended Size
+ * use max possible value that could be encoded into
+ * 'Extended Size' field (2047Tb).
+ */
+ mc->smbios_memory_device_size = 2047 * TiB;
/* numa node memory size aligned on 8MB by default.
* On Linux, each node's border has to be 8MB aligned
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 9445b07b4f..d9e69243b4 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -495,6 +495,7 @@ static void pc_i440fx_machine_9_0_options(MachineClass *m)
pc_i440fx_machine_9_1_options(m);
m->alias = NULL;
m->is_default = false;
+ m->smbios_memory_device_size = 16 * GiB;
compat_props_add(m->compat_props, hw_compat_9_0, hw_compat_9_0_len);
compat_props_add(m->compat_props, pc_compat_9_0, pc_compat_9_0_len);
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 71d3c6d122..9d108b194e 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -374,6 +374,7 @@ static void pc_q35_machine_9_0_options(MachineClass *m)
PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
pc_q35_machine_9_1_options(m);
m->alias = NULL;
+ m->smbios_memory_device_size = 16 * GiB;
compat_props_add(m->compat_props, hw_compat_9_0, hw_compat_9_0_len);
compat_props_add(m->compat_props, pc_compat_9_0, pc_compat_9_0_len);
pcmc->isa_bios_alias = false;
diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c
index 3b7703489d..a394514264 100644
--- a/hw/smbios/smbios.c
+++ b/hw/smbios/smbios.c
@@ -1093,6 +1093,7 @@ static bool smbios_get_tables_ep(MachineState *ms,
Error **errp)
{
unsigned i, dimm_cnt, offset;
+ MachineClass *mc = MACHINE_GET_CLASS(ms);
ERRP_GUARD();
assert(ep_type == SMBIOS_ENTRY_POINT_TYPE_32 ||
@@ -1123,12 +1124,12 @@ static bool smbios_get_tables_ep(MachineState *ms,
smbios_build_type_9_table(errp);
smbios_build_type_11_table();
-#define MAX_DIMM_SZ (16 * GiB)
-#define GET_DIMM_SZ ((i < dimm_cnt - 1) ? MAX_DIMM_SZ \
- : ((current_machine->ram_size - 1) %
MAX_DIMM_SZ) + 1)
+#define GET_DIMM_SZ ((i < dimm_cnt - 1) ? mc->smbios_memory_device_size \
+ : ((current_machine->ram_size - 1) % mc->smbios_memory_device_size) + 1)
- dimm_cnt = QEMU_ALIGN_UP(current_machine->ram_size, MAX_DIMM_SZ) /
- MAX_DIMM_SZ;
+ dimm_cnt = QEMU_ALIGN_UP(current_machine->ram_size,
+ mc->smbios_memory_device_size) /
+ mc->smbios_memory_device_size;
/*
* The offset determines if we need to keep additional space between
--
MST
- [PULL 29/63] hw/pci: Do not add ROM BAR for SR-IOV VF, (continued)
- [PULL 29/63] hw/pci: Do not add ROM BAR for SR-IOV VF, Michael S. Tsirkin, 2024/07/21
- [PULL 28/63] contrib/vhost-user-blk: fix overflowing expression, Michael S. Tsirkin, 2024/07/21
- [PULL 26/63] vhost,vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits, Michael S. Tsirkin, 2024/07/21
- [PULL 31/63] pcie_sriov: Ensure PF and VF are mutually exclusive, Michael S. Tsirkin, 2024/07/21
- [PULL 30/63] hw/pci: Fix SR-IOV VF number calculation, Michael S. Tsirkin, 2024/07/21
- [PULL 32/63] pcie_sriov: Check PCI Express for SR-IOV PF, Michael S. Tsirkin, 2024/07/21
- [PULL 34/63] virtio-pci: Implement SR-IOV PF, Michael S. Tsirkin, 2024/07/21
- [PULL 33/63] pcie_sriov: Allow user to create SR-IOV device, Michael S. Tsirkin, 2024/07/21
- [PULL 35/63] virtio-net: Implement SR-IOV VF, Michael S. Tsirkin, 2024/07/21
- [PULL 36/63] docs: Document composable SR-IOV device, Michael S. Tsirkin, 2024/07/21
- [PULL 37/63] smbios: make memory device size configurable per Machine,
Michael S. Tsirkin <=
- [PULL 38/63] accel/kvm: Extract common KVM vCPU {creation,parking} code, Michael S. Tsirkin, 2024/07/21
- [PULL 39/63] hw/acpi: Move CPU ctrl-dev MMIO region len macro to common header file, Michael S. Tsirkin, 2024/07/21
- [PULL 40/63] hw/acpi: Update ACPI GED framework to support vCPU Hotplug, Michael S. Tsirkin, 2024/07/21
- [PULL 50/63] virtio-iommu: Add trace point on virtio_iommu_detach_endpoint_from_domain, Michael S. Tsirkin, 2024/07/21
- [PULL 41/63] hw/acpi: Update GED _EVT method AML with CPU scan, Michael S. Tsirkin, 2024/07/21
- [PULL 47/63] virtio-iommu: Free [host_]resv_ranges on unset_iommu_devices, Michael S. Tsirkin, 2024/07/21
- [PULL 44/63] gdbstub: Add helper function to unregister GDB register space, Michael S. Tsirkin, 2024/07/21
- [PULL 46/63] virtio-iommu: Remove probe_done, Michael S. Tsirkin, 2024/07/21
- [PULL 49/63] hw/vfio/common: Add vfio_listener_region_del_iommu trace event, Michael S. Tsirkin, 2024/07/21
- [PULL 55/63] tests/acpi: update expected DSDT blob for aarch64 and microvm, Michael S. Tsirkin, 2024/07/21