Re: [Qemu-devel] [PULL 08/10] NUMA: Enable adding NUMA node implicitly
From: Thadeu Lima de Souza Cascardo
Subject: Re: [Qemu-devel] [PULL 08/10] NUMA: Enable adding NUMA node implicitly
Date: Thu, 16 Nov 2017 08:22:12 -0200
User-agent: NeoMutt/20170609 (1.8.3)
On Wed, Nov 15, 2017 at 08:18:50PM +0200, Michael S. Tsirkin wrote:
> From: Dou Liyang <address@hidden>
>
> Linux and Windows need an ACPI SRAT table to make memory hotplug work
> properly; however, QEMU currently doesn't create an SRAT table if -numa
> options aren't present on the CLI.
>
> This breaks both Linux and Windows guests in certain conditions:
> * Windows: won't enable memory hotplug without an SRAT table at all
> * Linux: if QEMU is started with initial memory all below 4GB and no SRAT
>   table present, the guest kernel will use nommu DMA ops, which breaks
>   32-bit hw drivers when memory is hotplugged and the guest tries to use it
>   with those drivers.
>
> Fix the above issues by automatically creating a NUMA node when QEMU is
> started with memory hotplug enabled but without '-numa' options on the CLI.
> (PS: the NUMA node is auto-created only for new machine types, so as not to
> break migration.)
>
> This provides an SRAT table to guests without explicit -numa options on the
> CLI and allows:
> * Windows: to enable memory hotplug
> * Linux: to switch to SWIOTLB DMA ops, bouncing DMA transfers to 32-bit
>   allocated buffers that legacy drivers/hw can handle.
>
> [Rewritten by Igor]
Thanks for copying me on this.
Acked-by: Thadeu Lima de Souza Cascardo <address@hidden>
>
> Reported-by: Thadeu Lima de Souza Cascardo <address@hidden>
> Suggested-by: Igor Mammedov <address@hidden>
> Signed-off-by: Dou Liyang <address@hidden>
> Cc: Paolo Bonzini <address@hidden>
> Cc: Richard Henderson <address@hidden>
> Cc: Eduardo Habkost <address@hidden>
> Cc: "Michael S. Tsirkin" <address@hidden>
> Cc: Marcel Apfelbaum <address@hidden>
> Cc: Igor Mammedov <address@hidden>
> Cc: David Hildenbrand <address@hidden>
> Cc: Thomas Huth <address@hidden>
> Cc: Alistair Francis <address@hidden>
> Cc: Takao Indoh <address@hidden>
> Cc: Izumi Taku <address@hidden>
> Reviewed-by: Igor Mammedov <address@hidden>
> Reviewed-by: Michael S. Tsirkin <address@hidden>
> Signed-off-by: Michael S. Tsirkin <address@hidden>
> ---
> include/hw/boards.h | 1 +
> hw/i386/pc.c | 1 +
> hw/i386/pc_piix.c | 1 +
> hw/i386/pc_q35.c | 1 +
> numa.c | 21 ++++++++++++++++++++-
> vl.c | 3 +--
> 6 files changed, 25 insertions(+), 3 deletions(-)
>
> diff --git a/include/hw/boards.h b/include/hw/boards.h
> index 62f160e..156b16f 100644
> --- a/include/hw/boards.h
> +++ b/include/hw/boards.h
> @@ -197,6 +197,7 @@ struct MachineClass {
> bool ignore_memory_transaction_failures;
> int numa_mem_align_shift;
> const char **valid_cpu_types;
> + bool auto_enable_numa_with_memhp;
> void (*numa_auto_assign_ram)(MachineClass *mc, NodeInfo *nodes,
> int nb_nodes, ram_addr_t size);
>
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index fafe5ba..c3afe5b 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -2347,6 +2347,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
> mc->cpu_index_to_instance_props = pc_cpu_index_to_props;
> mc->get_default_cpu_node_id = pc_get_default_cpu_node_id;
> mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids;
> + mc->auto_enable_numa_with_memhp = true;
> mc->has_hotpluggable_cpus = true;
> mc->default_boot_order = "cad";
> mc->hot_add_cpu = pc_hot_add_cpu;
> diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
> index f79d5cb..5e47528 100644
> --- a/hw/i386/pc_piix.c
> +++ b/hw/i386/pc_piix.c
> @@ -446,6 +446,7 @@ static void pc_i440fx_2_10_machine_options(MachineClass *m)
> m->is_default = 0;
> m->alias = NULL;
> SET_MACHINE_COMPAT(m, PC_COMPAT_2_10);
> + m->auto_enable_numa_with_memhp = false;
> }
>
> DEFINE_I440FX_MACHINE(v2_10, "pc-i440fx-2.10", NULL,
> diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
> index da3ea60..d606004 100644
> --- a/hw/i386/pc_q35.c
> +++ b/hw/i386/pc_q35.c
> @@ -318,6 +318,7 @@ static void pc_q35_2_10_machine_options(MachineClass *m)
> m->alias = NULL;
> SET_MACHINE_COMPAT(m, PC_COMPAT_2_10);
> m->numa_auto_assign_ram = numa_legacy_auto_assign_ram;
> + m->auto_enable_numa_with_memhp = false;
> }
>
> DEFINE_Q35_MACHINE(v2_10, "pc-q35-2.10", NULL,
> diff --git a/numa.c b/numa.c
> index 8d78d95..7151b24 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -216,6 +216,7 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
> }
> numa_info[nodenr].present = true;
> max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1);
> + nb_numa_nodes++;
> }
>
> static void parse_numa_distance(NumaDistOptions *dist, Error **errp)
> @@ -282,7 +283,6 @@ static int parse_numa(void *opaque, QemuOpts *opts, Error **errp)
> if (err) {
> goto end;
> }
> - nb_numa_nodes++;
> break;
> case NUMA_OPTIONS_TYPE_DIST:
> parse_numa_distance(&object->u.dist, &err);
> @@ -433,6 +433,25 @@ void parse_numa_opts(MachineState *ms)
> exit(1);
> }
>
> + /*
> + * If memory hotplug is enabled (slots > 0) but without '-numa'
> + * options explicitly on CLI, guests will break.
> + *
> + * Windows: won't enable memory hotplug without SRAT table at all
> + *
> + * Linux: if QEMU is started with initial memory all below 4Gb
> + * and no SRAT table present, guest kernel will use nommu DMA ops,
> + * which breaks 32bit hw drivers when memory is hotplugged and
> + * guest tries to use it with those drivers.
> + *
> + * Enable NUMA implicitly by adding a new NUMA node automatically.
> + */
> + if (ms->ram_slots > 0 && nb_numa_nodes == 0 &&
> + mc->auto_enable_numa_with_memhp) {
> + NumaNodeOptions node = { };
> + parse_numa_node(ms, &node, NULL);
> + }
> +
> assert(max_numa_nodeid <= MAX_NODES);
>
> /* No support for sparse NUMA node IDs yet: */
> diff --git a/vl.c b/vl.c
> index 7372424..1ad1c04 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -4690,8 +4690,6 @@ int main(int argc, char **argv, char **envp)
> default_drive(default_floppy, snapshot, IF_FLOPPY, 0, FD_OPTS);
> default_drive(default_sdcard, snapshot, IF_SD, 0, SD_OPTS);
>
> - parse_numa_opts(current_machine);
> -
> if (qemu_opts_foreach(qemu_find_opts("mon"),
> mon_init_func, NULL, NULL)) {
> exit(1);
> @@ -4741,6 +4739,7 @@ int main(int argc, char **argv, char **envp)
> current_machine->boot_order = boot_order;
> current_machine->cpu_model = cpu_model;
>
> + parse_numa_opts(current_machine);
>
> /* parse features once if machine provides default cpu_type */
> if (machine_class->default_cpu_type) {
> --
> MST
>