Re: [PATCH RFC v3 3/3] mhsld: implement MHSLD device
From: Jonathan Cameron
Subject: Re: [PATCH RFC v3 3/3] mhsld: implement MHSLD device
Date: Thu, 12 Dec 2024 17:40:16 +0000
On Fri, 18 Oct 2024 12:12:52 -0400
Gregory Price <gourry@gourry.net> wrote:
> From: Svetly Todorov <svetly.todorov@memverge.com>
>
> Using a shared-memory bytemap, validates that DC adds, releases,
> and reclamations happen on extents belonging to the appropriate
> host.
>
> The MHSLD device inherits from the CXL_TYPE3 class and adds the following
> configuration options:
As in the previous version, the -- syntax is somewhat confusing as that's
not how the parameters are applied.
> --mhd-head=<u32>
> --mhd-state_file=<str>
> --mhd-init=<bool>
>
> --mhd-head specifies the head ID of the host on the given device.
>
> --mhd-state_file is the name of the shared-memory-backed file used
> to store the MHD state.
>
> --mhd-init indicates whether this QEMU instance should initialize
> the state_file; if so, the instance will create the file if it does
> not exist, ftruncate it to the appropriate size, and initialize its
> header. It is assumed that the --mhd-init instance is run and allowed
> to completely finish configuration before any other guests access the
> shared state.
>
> The shared state file only needs to be initialized once. Even if a guest
> dies without clearing the ownership bits associated with its head-ID,
> future guests with that ID will clear those bits in cxl_mhsld_realize(),
> regardless of whether mhd_init is true or false.
That sounds like a race condition if not all hosts are brought
up before the first add.
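For reference, the create / ftruncate / initialize-header sequence described
above might look roughly like the sketch below. This is not the code from the
patch: SketchState, sketch_state_map() and the init flag are hypothetical
names, the layout is copied from MHSLDSharedState further down, and a plain
open() on a path stands in for whatever the device actually uses to reach the
shared-memory-backed file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define SKETCH_HEADS 8 /* mirrors MHSLD_HEADS */

/* Hypothetical copy of the shared-state layout, for illustration only. */
typedef struct SketchState {
    uint8_t nr_heads;
    uint8_t nr_lds;
    uint8_t ldmap[SKETCH_HEADS];
    uint64_t nr_blocks;
    uint8_t blocks[];
} SketchState;

/*
 * Map the state file. With init != 0 the file is created if missing,
 * sized with ftruncate() and its header is written. With init == 0 the
 * file must already exist and carry a matching block count.
 * Returns NULL on any failure.
 */
static SketchState *sketch_state_map(const char *path, uint64_t nr_blocks,
                                     int init)
{
    size_t size = sizeof(SketchState) + (size_t)nr_blocks;
    int fd = open(path, init ? (O_RDWR | O_CREAT) : O_RDWR, 0600);
    SketchState *s;

    assert(nr_blocks > 0);
    if (fd < 0) {
        return NULL;
    }
    if (init && ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return NULL;
    }
    s = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the fd is closed */
    if (s == MAP_FAILED) {
        return NULL;
    }
    if (init) {
        memset(s, 0, size);
        s->nr_heads = SKETCH_HEADS;
        s->nr_blocks = nr_blocks;
    } else if (s->nr_blocks != nr_blocks) {
        munmap(s, size); /* header not initialized, or wrong geometry */
        return NULL;
    }
    return s;
}
```

Nothing in this sketch stops a second instance from mapping the file while
the first is still between ftruncate() and the header write, which is why the
init instance has to finish before any other guest starts.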
>
> The following command line options create an MHSLD with 4GB of
> backing memory, whose state is tracked in /dev/shm/mhd_metadata.
> --mhd-init=true tells this instance to initialize the state as
> described above.
>
> ./qemu-system-x86_64 \
> [... other options ...] \
> -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 \
> -device cxl-rp,id=rp0,bus=cxl.0,chassis=0,port=0,slot=0 \
> -object memory-backend-ram,id=mem0,size=4G \
> -device cxl-mhsld,bus=rp0,num-dc-regions=1,volatile-dc-memdev=mem0,id=cxl-mem0,sn=66667,mhd-head=0,mhd-state_file=mhd_metadata,mhd-init=true \
> -M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=4G \
> -qmp unix:/tmp/qmp-sock-1,server,nowait
>
> Once this guest completes setup, other guests looking to access the
> device can be booted with the same configuration options, but with
> --mhd-head != 0,
> --mhd-init=false,
> and a different QMP socket.
>
> Signed-off-by: Gregory Price <gourry@gourry.net>
> Signed-off-by: Svetly Todorov <svetly.todorov@memverge.com>
A few trivial things inline.
In general the scheme looks workable, but I'm not sure the constraints at
setup time etc. are suitable for an upstream solution. It's certainly a useful
tool to have for kernel development though, so I'll try and find time in the
next few days to apply this on my gitlab tree.

Longer term I think we need a more complex external program or a main / proxy
type arrangement so that ordering requirements can be enforced and we can have
richer info. Having to chat to each QMP interface independently works fine,
but is also a bit more complex than I think we would eventually want.

Having a solution in place though will make it much easier to move towards
an eventual upstreamable solution. This is a great place to start from!
Jonathan
> +static bool cxl_mhsld_release_extent(PCIDevice *d, uint64_t dpa, uint64_t len)
> +{
> + cxl_mhsld_state_clear(CXL_MHSLD(d), dpa / MHSLD_BLOCK_SZ,
> + len / MHSLD_BLOCK_SZ);
Trivial but align after (
> + return true;
> +}
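To make the ownership check concrete, here is a minimal sketch of how a
bytemap with one bit per head could be claimed and released. It assumes each
block byte holds a mask of owning heads and that a claim must find the byte
empty; the function names are hypothetical and the real
cxl_mhsld_state_set() / cxl_mhsld_state_clear() may well differ.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_HEADS 8 /* mirrors MHSLD_HEADS: one bit per head in a byte */

/* Drop 'head' ownership from blocks [start, start + n). */
static void sketch_release(_Atomic uint8_t *blocks, uint64_t start,
                           uint64_t n, uint8_t head)
{
    uint8_t bit = (uint8_t)(1u << head);

    assert(head < SKETCH_HEADS);
    for (uint64_t i = start; i < start + n; i++) {
        atomic_fetch_and(&blocks[i], (uint8_t)~bit);
    }
}

/*
 * Claim blocks [start, start + n) for 'head'. Each byte is swapped from
 * 0 to the head's bit with a compare-and-exchange, so two hosts racing
 * for the same block cannot both succeed. On failure the blocks claimed
 * so far are rolled back and false is returned.
 */
static bool sketch_claim(_Atomic uint8_t *blocks, uint64_t start,
                         uint64_t n, uint8_t head)
{
    uint8_t bit = (uint8_t)(1u << head);

    assert(head < SKETCH_HEADS);
    for (uint64_t i = start; i < start + n; i++) {
        uint8_t expected = 0;

        if (!atomic_compare_exchange_strong(&blocks[i], &expected, bit)) {
            sketch_release(blocks, start, i - start, head);
            return false;
        }
    }
    return true;
}
```

With this layout a release for dpa / len would be
sketch_release(blocks, dpa / MHSLD_BLOCK_SZ, len / MHSLD_BLOCK_SZ, head),
and clearing stale bits for a restarted head is the same call over the whole
map.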
> diff --git a/hw/cxl/mhsld/mhsld.h b/hw/cxl/mhsld/mhsld.h
> new file mode 100644
> index 0000000000..e7ead1f0d2
> --- /dev/null
> +++ b/hw/cxl/mhsld/mhsld.h
> @@ -0,0 +1,75 @@
> +/*
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + *
> + * Copyright (c) 2024 MemVerge Inc.
> + *
> + */
> +
> +#ifndef CXL_MHSLD_H
> +#define CXL_MHSLD_H
> +#include <stdint.h>
> +#include "hw/cxl/cxl.h"
> +#include "hw/cxl/cxl_mailbox.h"
> +#include "hw/cxl/cxl_device.h"
> +#include "qemu/units.h"
> +
> +#define MHSLD_BLOCK_SZ (2 * MiB)
> +
> +/*
> + * We limit the number of heads to prevent the shared state
> + * region from becoming a major memory hog. We need 512MB of
> + * memory space to track 8-host ownership of 4GB of memory in
> + * blocks of 2MB. This can change if the block size is increased.
I'm lost: what makes up that size?
> + */
> +#define MHSLD_HEADS (8)
> +
> +/*
> + * The shared state cannot have 2 variable sized regions
> + * so we have to max out the ldmap.
> + */
> +typedef struct MHSLDSharedState {
> + uint8_t nr_heads;
> + uint8_t nr_lds;
> + uint8_t ldmap[MHSLD_HEADS];
> + uint64_t nr_blocks;
> + uint8_t blocks[];
> +} MHSLDSharedState;
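A rough calculation of the state size, assuming one uint8_t per 2MB block as
the blocks[] member above suggests. The Sketch* names are made up for
illustration and the struct is just a copy of the layout in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MIB      (1024ULL * 1024ULL)
#define SKETCH_BLOCK_SZ (2 * SKETCH_MIB) /* mirrors MHSLD_BLOCK_SZ */
#define SKETCH_HEADS    8                /* mirrors MHSLD_HEADS */

/* Copy of the MHSLDSharedState layout, for sizing only. */
typedef struct SketchSharedState {
    uint8_t nr_heads;
    uint8_t nr_lds;
    uint8_t ldmap[SKETCH_HEADS];
    uint64_t nr_blocks;
    uint8_t blocks[];
} SketchSharedState;

/* Header plus one ownership byte per block of device capacity. */
static size_t sketch_state_size(uint64_t capacity_bytes)
{
    uint64_t nr_blocks = capacity_bytes / SKETCH_BLOCK_SZ;

    assert(capacity_bytes % SKETCH_BLOCK_SZ == 0);
    return sizeof(SketchSharedState) + (size_t)nr_blocks;
}
```

For 4GB this gives 4GB / 2MB = 2048 blocks, so sketch_state_size(4096 *
SKETCH_MIB) is a little over 2KiB. Under this assumption the 512MB in the
comment does not fall out of the layout, so it presumably comes from a
different tracking scheme.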
> +
> +struct CXLMHSLDState {
> + CXLType3Dev ct3d;
> + bool mhd_init;
> + char *mhd_state_file;
> + int mhd_state_fd;
> + size_t mhd_state_size;
> + uint32_t mhd_head;
> + MHSLDSharedState *mhd_state;
> +};
> +
> +struct CXLMHSLDClass {
> + CXLType3Class parent_class;
> +};
> +
> +enum {
> + MHSLD_MHD = 0x55,
> + #define GET_MHD_INFO 0x0
> +};
> +
> +/*
> + * MHD Get Info Command
> + * Returns information the LD's associated with this head
> + */
> +typedef struct MHDGetInfoInput {
> + uint8_t start_ld;
> + uint8_t ldmap_len;
> +} QEMU_PACKED MHDGetInfoInput;
> +
> +typedef struct MHDGetInfoOutput {
> + uint8_t nr_lds;
> + uint8_t nr_heads;
> + uint16_t resv1;
> + uint8_t start_ld;
> + uint8_t ldmap_len;
> + uint16_t resv2;
> + uint8_t ldmap[];
> +} QEMU_PACKED MHDGetInfoOutput;
> +#endif