Re: [PATCH RFC v3 3/3] mhsld: implement MHSLD device
From: Gregory Price
Subject: Re: [PATCH RFC v3 3/3] mhsld: implement MHSLD device
Date: Thu, 12 Dec 2024 14:52:02 -0500
On Thu, Dec 12, 2024 at 05:40:16PM +0000, Jonathan Cameron wrote:
> On Fri, 18 Oct 2024 12:12:52 -0400
> Gregory Price <gourry@gourry.net> wrote:
>
> > From: Svetly Todorov <svetly.todorov@memverge.com>
> >
> > The shared state file only needs to be initialized once. Even if a guest
> > dies without clearing the ownership bits associated with its head-ID,
> > future guests with that ID will clear those bits in cxl_mhsld_realize(),
> > regardless of whether mhd_init is true or false.
>
> That sounds like a race condition if not all hosts are brought
> up before the first add.
>
We weighed this against requiring an external setup step like
# SHMID=$(./create_sharedmem.sh)
# ./launch_qemu --shmid=$SHMID
which is what the original, non-generalized prototype did.
So yes, there's a race condition AND a footgun (setting mhd-init AFTER
QEMU instances are already using the memory will blow the state away).
This was intentional.
As you allude to in the next chunk - the only real way to get around
this is to have an entirely external process serialize access.
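To make the behaviour concrete, here's a rough sketch of the realize-time
logic described above (illustrative only - the struct layout and names are
invented, not the actual cxl_mhsld_realize() code): mhd-init=true wipes the
whole shared state file, which is exactly the footgun if other heads are
already up, while every head unconditionally drops any bits left behind by
a previous guest that ran with the same head-ID.

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of the shared state file: one 8-bit ownership
 * bitmask (one bit per head) for every 2MB block of DC memory. */
struct mhd_state {
    uint8_t nr_heads;
    uint8_t blocks[];
};

void mhd_realize_state(struct mhd_state *st, uint64_t nr_blocks,
                       unsigned int head, bool mhd_init)
{
    if (mhd_init) {
        /* Re-initializing while other heads are live wipes their claims. */
        memset(st->blocks, 0, nr_blocks);
    }

    /* Regardless of mhd-init, drop any stale claims left by a previous
     * guest that used this head-ID. */
    for (uint64_t i = 0; i < nr_blocks; i++) {
        __atomic_fetch_and(&st->blocks[i], (uint8_t)~(1u << head),
                           __ATOMIC_ACQ_REL);
    }
}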
> >
> > The following command line options create an MHSLD with 4GB of
> > backing memory, whose state is tracked in /dev/shm/mhd_metadata.
> > --mhd-init=true tells this instance to initialize the state as
> > described above.
> >
> > ./qemu-system-x86_64 \
> > [... other options ...] \
> > -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 \
> > -device cxl-rp,id=rp0,bus=cxl.0,chassis=0,port=0,slot=0 \
> > -object memory-backend-ram,id=mem0,size=4G \
> > -device cxl-mhsld,bus=rp0,num-dc-regions=1,volatile-dc-memdev=mem0,id=cxl-mem0,sn=66667,mhd-head=0,mhd-state_file=mhd_metadata,mhd-init=true \
> > -M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=4G \
> > -qmp unix:/tmp/qmp-sock-1,server,nowait
> >
> > Once this guest completes setup, other guests looking to access the
> > device can be booted with the same configuration options, but with
> > --mhd-head != 0,
> > --mhd-init=false,
> > and a different QMP socket.
> >
> > Signed-off-by: Gregory Price <gourry@gourry.net>
> > Signed-off-by: Svetly Todorov <svetly.todorov@memverge.com>
>
> A few trivial things inline.
>
> In general the scheme looks workable but I'm not sure the constraints at
> setup time etc. are suitable for an upstream solution. Certainly a useful
> tool to have for kernel development though, so I'll try and find time in
> the next few days to apply this on my gitlab tree.
>
> Longer term I think we need a more complex external program or a main / proxy
> type arrangement so that ordering requirements can be enforced
I marginally disagree. We have to check ownership during memory use.
We should try not to have an external process dependency for dereferencing a
pointer backed by an emulated DCD device. The current overhead is bad enough.
The shared memory use here mostly limits that overhead to cache invalidation,
and keeps the entire system fairly simple.
All this is to say - we erred on the side of keeping it simple, even if it
has a few stupid footguns. Obviously open to ideas, though.
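For the sake of argument, the per-access check is roughly the following (a
minimal sketch under assumed names and layout, not the actual mhsld code):
every head maps the same state file, and an ownership check is a single
acquire load plus a bit test, so the steady-state cost really is mostly the
cache-line invalidation traffic from other heads updating the masks.

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared-state layout: one ownership bitmask per DC block. */
struct mhd_state {
    uint8_t nr_heads;
    uint8_t blocks[];
};

bool mhd_head_owns(const struct mhd_state *st, unsigned int head,
                   uint64_t block_idx)
{
    /* A single acquire load of the block's mask; the overhead is mostly
     * the cache invalidation generated by other heads updating it. */
    return __atomic_load_n(&st->blocks[block_idx], __ATOMIC_ACQUIRE) &
           (1u << head);
}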
> and we can have
> richer info. Having to chat to each qmp interface independently works fine but
> is also a bit more complex than I think we would eventually want.
>
This is a small component in someone's fabric manager that translates their
requests into QMP commands. Whatever we ultimately decide on, the complexity
here is about the same.
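As a sketch of what that component boils down to (illustrative only - the
socket path matches the example above, and query-status is just a stand-in
for whatever DCD add/release commands the applied series provides): each
head exposes its own QMP socket, so per-head management is a unix socket
speaking QMP JSON.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

int main(void)
{
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    char buf[4096];
    ssize_t n;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);

    if (fd < 0) {
        perror("socket");
        return 1;
    }
    strncpy(addr.sun_path, "/tmp/qmp-sock-1", sizeof(addr.sun_path) - 1);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        return 1;
    }

    /* QMP greets first and requires capability negotiation before commands. */
    n = read(fd, buf, sizeof(buf));
    dprintf(fd, "{\"execute\": \"qmp_capabilities\"}\r\n");
    n = read(fd, buf, sizeof(buf));

    /* Stand-in for the real per-head DCD management commands. */
    dprintf(fd, "{\"execute\": \"query-status\"}\r\n");
    n = read(fd, buf, sizeof(buf) - 1);
    if (n > 0) {
        buf[n] = '\0';
        printf("%s\n", buf);
    }
    close(fd);
    return 0;
}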
> Having a solution in place though will make it much easier to move towards
> an eventual upstreamable solution. This is a great place to start from!
>
> Jonathan
>
> > +/*
> > + * We limit the number of heads to prevent the shared state
> > + * region from becoming a major memory hog. We need 512MB of
> > + * memory space to track 8-host ownership of 4GB of memory in
> > + * blocks of 2MB. This can change if the block size is increased.
>
> I'm lost - what makes up that size?
>
I think the math is wrong here; we may have calculated it based on
a larger capacity. I need to go back and look at how we arrived at 512MB.
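For what it's worth, a quick back-of-the-envelope check (assuming one byte
of ownership bitmask per 2MB block, which may not be the actual layout)
says 4GB at 2MB blocks is 2048 blocks, i.e. on the order of kilobytes of
state rather than 512MB:

#include <stdio.h>

int main(void)
{
    unsigned long long capacity = 4ULL << 30;      /* 4GB of DC memory     */
    unsigned long long block    = 2ULL << 20;      /* 2MB block size       */
    unsigned long long blocks   = capacity / block;         /* 2048        */
    unsigned long long state    = blocks * 1;      /* 1 byte = 8-head mask */

    /* Prints: 2048 blocks -> 2048 bytes of state */
    printf("%llu blocks -> %llu bytes of state\n", blocks, state);
    return 0;
}

Going the other way, 512MB of state at one byte per 2MB block would cover
roughly a pebibyte of capacity, so the figure in the comment was presumably
derived from a much larger configuration.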
~Gregory