[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [RFC PATCH v2 0/9] migration/multifd: Remove multifd_send_state->pag
From: |
Fabiano Rosas |
Subject: |
Re: [RFC PATCH v2 0/9] migration/multifd: Remove multifd_send_state->pages |
Date: |
Tue, 23 Jul 2024 17:50:24 -0300 |
Peter Xu <peterx@redhat.com> writes:
> On Tue, Jul 23, 2024 at 02:48:48PM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>>
>> > On Mon, Jul 22, 2024 at 06:20:28PM -0300, Fabiano Rosas wrote:
>> >> Peter Xu <peterx@redhat.com> writes:
>> >>
>> >> > On Mon, Jul 22, 2024 at 05:21:48PM -0300, Fabiano Rosas wrote:
>> >> >> Peter Xu <peterx@redhat.com> writes:
>> >> >>
>> >> >> > On Mon, Jul 22, 2024 at 02:59:05PM -0300, Fabiano Rosas wrote:
>> >> >> >> Hi,
>> >> >> >>
>> >> >> >> In this v2 I took Peter's suggestion of keeping the channels'
>> >> >> >> pointers
>> >> >> >> and moving only the extra slot. The major changes are in patches 5
>> >> >> >> and
>> >> >> >> 9. Patch 3 introduces the structure:
>> >> >> >>
>> >> >> >> typedef enum {
>> >> >> >> MULTIFD_PAYLOAD_NONE,
>> >> >> >> MULTIFD_PAYLOAD_RAM,
>> >> >> >> } MultiFDPayloadType;
>> >> >> >>
>> >> >> >> struct MultiFDSendData {
>> >> >> >> MultiFDPayloadType type;
>> >> >> >> union {
>> >> >> >> MultiFDPages_t ram;
>> >> >> >> } u;
>> >> >> >> };
>> >> >> >>
>> >> >> >> I added a NONE type so we can use it to tell when the channel has
>> >> >> >> finished sending a packet, since we'll need to switch types between
>> >> >> >> clients anyway. This avoids having to introduce a 'size', or 'free'
>> >> >> >> variable.
>> >> >> >
>> >> >> > This at least looks better to me, thanks.
>> >> >> >
>> >> >> >>
>> >> >> >> WHAT'S MISSING:
>> >> >> >>
>> >> >> >> - The support for calling multifd_send() concurrently. Maciej has
>> >> >> >> this
>> >> >> >> in his series so I didn't touch it.
>> >> >> >>
>> >> >> >> - A way of adding methods for the new payload type. Currently, the
>> >> >> >> compression methods are somewhat coupled with ram migration, so
>> >> >> >> I'm
>> >> >> >> not sure how to proceed.
>> >> >> >
>> >> >> > What is this one? Why compression methods need new payload? Aren't
>> >> >> > they
>> >> >> > ram-typed?
>> >> >>
>> >> >> The data we transport is MultiFDPages_t, yes, but the MultiFDMethods
>> >> >> are
>> >> >> either nocomp, or the compression-specific methods
>> >> >> (e.g. zlib_send_prepare).
>> >> >>
>> >> >> How do we add methods for the upcoming new payload types? I don't
>> >> >> expect
>> >> >> us to continue using nocomp and then do "if (ram)... else if
>> >> >> (device_state) ..." inside of them. I would expect us to rename
>> >> >> s/nocomp/ram/ and add a new set of MultiFDMethods for the new data type
>> >> >> (e.g. vfio_send_prepare, vmstate_send_prepare, etc).
>> >> >>
>> >> >> multifd_nocomp_ops -> multifd_ram_ops // rename
>> >> >> multifd_zlib_ops // existing
>> >> >> multifd_device_ops // new
>> >> >>
>> >> >> The challenge here is that the current framework is nocomp
>> >> >> vs. compression. It needs to become ram + compression vs. other types.
>> >> >
>> >> > IMHO we can keep multifd_ops[] only for RAM. There's only
>> >> > send_prepare()
>> >> > that device state will need, and so far it's only (referring Maciej's
>> >> > code):
>> >> >
>> >> > static int nocomp_send_prepare_device_state(MultiFDSendParams *p,
>> >> > Error **errp)
>> >> > {
>> >> > multifd_send_prepare_header_device_state(p);
>> >> >
>> >> > assert(!(p->flags & MULTIFD_FLAG_SYNC));
>> >> >
>> >> > p->next_packet_size = p->device_state->buf_len;
>> >> > if (p->next_packet_size > 0) {
>> >> > p->iov[p->iovs_num].iov_base = p->device_state->buf;
>> >> > p->iov[p->iovs_num].iov_len = p->next_packet_size;
>> >> > p->iovs_num++;
>> >> > }
>> >> >
>> >> > p->flags |= MULTIFD_FLAG_NOCOMP | MULTIFD_FLAG_DEVICE_STATE;
>> >> >
>> >> > multifd_send_fill_packet_device_state(p);
>> >> >
>> >> > return 0;
>> >> > }
>> >> >
>> >> > None of other multifd_ops are used.
>> >>
>> >> There's also a conditional around device_state when calling
>> >> ->recv(). That could seems like it could go to a hook.
>> >>
>> >> https://lore.kernel.org/r/41dedaf2c9abebb5e45f88c052daa26320715a92.1718717584.git.maciej.szmigiero@oracle.com
>> >
>> > Actually that's exactly what I think is right.. it looks to me now that we
>> > could bypass anything in MultifdOps (including recv()) but let device state
>> > be a parallel layer of MultifdOps itself, leaving MultifdOps only for
>> > compressors.
>> >
>> > And yeah, I still remember you just renamed it from recv_pages() to
>> > recv().. it's just that now when think it again it looks like cleaner to
>> > make it only about pages..
>> >
>> >>
>> >> >
>> >> > I think we can directly invoke this part of device state code in
>> >> > multifd_send_thread() for now. So far I think it should be ok.
>> >>
>> >> It's not just that. There's also a check for "if (ram)" at every call to
>> >> multifd_ops to avoid calling the ram code when doing the device
>> >> migration. It would be way easier to just set noop functions for those.
>> >>
>> >> static MultiFDMethods multifd_devstate_ops = {
>> >> .send_setup = noop_send_setup,
>> >> .send_cleanup = noop_send_cleanup,
>> >> .send_prepare = devstate_send_prepare,
>> >> .recv_setup = noop_recv_setup,
>> >> .recv_cleanup = noop_recv_cleanup,
>> >> .recv = devstate_recv
>> >> };
>> >>
>> >> I'm not saying this needs to be done in this series though. But I do
>> >> think that's the correct design choice for the long term.
>> >
>> > Yes it should be separate.
>> >
>> > And what I meant is we don't need all these noops, but recv() keeps being
>> > ignored just like above, then for sender side, right now it's:
>> >
>> > ret = multifd_send_state->ops->send_prepare(p, &local_err);
>> > if (migrate_mapped_ram()) {
>> > file_write_ramblock_iov();
>> > } else {
>> > ret = qio_channel_writev_full_all();
>> > }
>> >
>> > VFIO can process device state in parallel, so:
>> >
>> > if (ram) {
>> > ret = multifd_send_state->ops->send_prepare(p, &local_err);
>> > if (migrate_mapped_ram()) {
>> > file_write_ramblock_iov();
>> > } else {
>> > qio_channel_writev_full_all();
>> > }
>> > } else {
>> > // device state handling
>> > multifd_send_device_prepare(...);
>> > ...
>> > qio_channel_writev_full_all();
>> > }
>> >
>> > Then MultifdOps doesn't apply to device states.
>>
>> To avoid getting into bikeshed territory, I think we should postpone
>> this discussion until after Maciej's series is merged, so we can speak
>> more concretely about the implications. It's easy enough to go from your
>> suggestion to mine than the other way around, so let's leave at that.
>>
>> I had it already written, so more of my reasoning below, if you're
>> interested.
>
> I never thought this is bikeshedding.. What we're discussing now is exactly
> what should appear in Maciej's code, am I right? I thought we should
> figure it out before it's merged, if that's the case..
Yeah, but it should be functionally the same, so it shouldn't matter
whether we merge a solution now or leave it until after his series.
>
> And whose suggestion isn't that important, IMO. We simply try to discuss
> this technically and reach a consensus.. no matter who proposed what.
Right, I'm just using a shorthand to avoid having to describe the
proposals every time. What I mean is it's easier to switch if/else with
function pointers than the other way around because adding the hooks
will also require some changes to the MultiFDMethods structure.
>
>> ======
>>
>> We already have the send/recv threads structured in relation to what we
>> do inside the hooks. You're just defining a function that's not a hook,
>> but it has the same signature and responsibilities and needs to be
>> called at the same moment.
>>
>> I think the dissonance here is that you don't see the multifd thread
>> code and the payloads (ram, device) as separate layers. Payload-specific
>> code should not be at top level. Otherwise, it breaks any semblance of
>> proper layering:
>>
>> - payload code will have access to MultiFD*Params, which has a bunch of
>> control variables for the loop, the semaphores, etc. that should not
>> be touched;
>>
>> - payload code ends up influencing the flow of the thread
>> function. E.g. when zero_copy_send used to dictate whether we'd have
>> separate IO for the packet or not.
>>
>> - temporary variables needed by the payload code will have to be
>> declared inside the thread funcion, which makes tempting to use them
>> across payload types and also in the thread code itself;
>>
>> - it creates doubt as to whether new changes go inside the hooks, in the
>> if/else or outside of it;
>>
>> Think about how easy it has has been to review and merge the various
>> compression features we had. It doesn't matter how much they mess up
>> inside the hooks, it will never cause the dreaded "Memory content
>> inconsistency at ..." error from check_guest_ram(). At least not in a
>> way that affects other people. Now compare that with for instance the
>> zero-page work, or even mapped-ram, that required a bunch of changes to
>> the multifd control flow itself (e.g. all of the sync changes w/
>> mapped-ram).
>
> I think there's one issue where we only support one MultiFDMethods as of
> now to be active, while the "clients" of multifd can be >1 from payload
> POV. It means I'm not sure how VFIO can provide a MultiFDMethods if it
> will overwrite what should be there to define how to process RAM..
>
> Then, we should logically allow VFIO migration to happen with RAM being
> compressed with ZSTD/ZLIB/whatever, right? The question is which
> MultiFDMethods we should assign if they're the same layer in this case..
The natural thing would be to put the hooks inside the data
type. Something like this:
struct MultiFDRecvData {
MultiFDMethods *ops; <---
void *opaque;
size_t size;
/* for preadv */
off_t file_offset;
};
struct MultiFDSendData {
MultiFDPayloadType type;
MultiFDMethods *ops; <---
union {
MultiFDPages_t ram;
} u;
};
multifd_ram_save_setup()
{
multifd_ram_send = multifd_send_data_alloc();
multifd_register_ops(multifd_ram_send, &multifd_ram_ops);
}
void multifd_register_ops(MultiFDSendData *data, MultiFDMethods *ops)
{
if (data->type == RAM) {
method = migrate_multifd_compression();
assert(0 < method && method < MULTIFD_COMPRESSION__MAX);
ops = multifd_ops[method];
}
data->ops = ops;
}
I'm just not sure whether we have some compiler cleverness optimizing
these function pointer accesses due to multifd_send_state being static
and multifd_send_state->ops being unchanged throughout the
migration. But AFAICS, the proposal above is almost the same thing as we
already have.
- Re: [RFC PATCH v2 9/9] migration/multifd: Replace multifd_send_state->pages with client data, (continued)
Re: [RFC PATCH v2 0/9] migration/multifd: Remove multifd_send_state->pages, Peter Xu, 2024/07/22
- Re: [RFC PATCH v2 0/9] migration/multifd: Remove multifd_send_state->pages, Fabiano Rosas, 2024/07/22
- Re: [RFC PATCH v2 0/9] migration/multifd: Remove multifd_send_state->pages, Peter Xu, 2024/07/22
- Re: [RFC PATCH v2 0/9] migration/multifd: Remove multifd_send_state->pages, Fabiano Rosas, 2024/07/22
- Re: [RFC PATCH v2 0/9] migration/multifd: Remove multifd_send_state->pages, Peter Xu, 2024/07/22
- Re: [RFC PATCH v2 0/9] migration/multifd: Remove multifd_send_state->pages, Fabiano Rosas, 2024/07/23
- Re: [RFC PATCH v2 0/9] migration/multifd: Remove multifd_send_state->pages, Peter Xu, 2024/07/23
- Re: [RFC PATCH v2 0/9] migration/multifd: Remove multifd_send_state->pages,
Fabiano Rosas <=
- Re: [RFC PATCH v2 0/9] migration/multifd: Remove multifd_send_state->pages, Peter Xu, 2024/07/23