From: Fabiano Rosas
Subject: Re: [RFC PATCH v2 0/9] migration/multifd: Remove multifd_send_state->pages
Date: Tue, 23 Jul 2024 14:48:48 -0300

Peter Xu <peterx@redhat.com> writes:

> On Mon, Jul 22, 2024 at 06:20:28PM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > On Mon, Jul 22, 2024 at 05:21:48PM -0300, Fabiano Rosas wrote:
>> >> Peter Xu <peterx@redhat.com> writes:
>> >> 
>> >> > On Mon, Jul 22, 2024 at 02:59:05PM -0300, Fabiano Rosas wrote:
>> >> >> Hi,
>> >> >> 
>> >> >> In this v2 I took Peter's suggestion of keeping the channels' pointers
>> >> >> and moving only the extra slot. The major changes are in patches 5 and
>> >> >> 9. Patch 3 introduces the structure:
>> >> >> 
>> >> >> typedef enum {
>> >> >>     MULTIFD_PAYLOAD_NONE,
>> >> >>     MULTIFD_PAYLOAD_RAM,
>> >> >> } MultiFDPayloadType;
>> >> >> 
>> >> >> struct MultiFDSendData {
>> >> >>     MultiFDPayloadType type;
>> >> >>     union {
>> >> >>         MultiFDPages_t ram;
>> >> >>     } u;
>> >> >> };
>> >> >> 
>> >> >> I added a NONE type so we can use it to tell when the channel has
>> >> >> finished sending a packet, since we'll need to switch types between
>> >> >> clients anyway. This avoids having to introduce a 'size', or 'free'
>> >> >> variable.
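>> >> >> 
>> >> >> For illustration, the lifecycle could look like this (untested
>> >> >> sketch; the helper names are hypothetical):
>> >> >> 
>> >> >> /* Channel thread: packet is on the wire, release the slot. The
>> >> >>  * NONE type doubles as the "slot is free" marker, so no extra
>> >> >>  * 'free' flag is needed. */
>> >> >> static void multifd_send_data_clear(MultiFDSendData *d)
>> >> >> {
>> >> >>     d->type = MULTIFD_PAYLOAD_NONE;
>> >> >> }
>> >> >> 
>> >> >> /* Producer: only hand out a slot no channel is still using. */
>> >> >> static bool multifd_send_data_is_free(const MultiFDSendData *d)
>> >> >> {
>> >> >>     return d->type == MULTIFD_PAYLOAD_NONE;
>> >> >> }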
>> >> >
>> >> > This at least looks better to me, thanks.
>> >> >
>> >> >> 
>> >> >> WHAT'S MISSING:
>> >> >> 
>> >> >> - The support for calling multifd_send() concurrently. Maciej has this
>> >> >>   in his series so I didn't touch it.
>> >> >> 
>> >> >> - A way of adding methods for the new payload type. Currently, the
>> >> >>   compression methods are somewhat coupled with ram migration, so I'm
>> >> >>   not sure how to proceed.
>> >> >
>> >> > What is this one?  Why do compression methods need a new payload?
>> >> > Aren't they ram-typed?
>> >> 
>> >> The data we transport is MultiFDPages_t, yes, but the MultiFDMethods are
>> >> either nocomp, or the compression-specific methods
>> >> (e.g. zlib_send_prepare).
>> >> 
>> >> How do we add methods for the upcoming new payload types? I don't expect
>> >> us to continue using nocomp and then do "if (ram)... else if
>> >> (device_state) ..." inside of them. I would expect us to rename
>> >> s/nocomp/ram/ and add a new set of MultiFDMethods for the new data type
>> >> (e.g. vfio_send_prepare, vmstate_send_prepare, etc).
>> >> 
>> >> multifd_nocomp_ops -> multifd_ram_ops // rename
>> >> multifd_zlib_ops   // existing
>> >> multifd_device_ops // new
>> >> 
>> >> The challenge here is that the current framework is nocomp
>> >> vs. compression. It needs to become ram + compression vs. other types.
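>> >> 
>> >> To make it concrete, a rough sketch of what I mean (hypothetical
>> >> names; MULTIFD_PAYLOAD_DEVICE_STATE and multifd_device_ops don't
>> >> exist yet): selection keys off the payload type, with compression
>> >> staying a ram-only concern:
>> >> 
>> >> static MultiFDMethods *multifd_ops_for(MultiFDPayloadType type)
>> >> {
>> >>     switch (type) {
>> >>     case MULTIFD_PAYLOAD_RAM:
>> >>         /* ram keeps honouring the negotiated compression method */
>> >>         return multifd_ops[migrate_multifd_compression()];
>> >>     case MULTIFD_PAYLOAD_DEVICE_STATE: /* hypothetical new type */
>> >>         return &multifd_device_ops;
>> >>     default:
>> >>         g_assert_not_reached();
>> >>     }
>> >> }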
>> >
>> > IMHO we can keep multifd_ops[] only for RAM.  There's only send_prepare()
>> > that device state will need, and so far it's only (referring to Maciej's
>> > code):
>> >
>> > static int nocomp_send_prepare_device_state(MultiFDSendParams *p,
>> >                                             Error **errp)
>> > {
>> >     multifd_send_prepare_header_device_state(p);
>> >
>> >     assert(!(p->flags & MULTIFD_FLAG_SYNC));
>> >
>> >     p->next_packet_size = p->device_state->buf_len;
>> >     if (p->next_packet_size > 0) {
>> >         p->iov[p->iovs_num].iov_base = p->device_state->buf;
>> >         p->iov[p->iovs_num].iov_len = p->next_packet_size;
>> >         p->iovs_num++;
>> >     }
>> >
>> >     p->flags |= MULTIFD_FLAG_NOCOMP | MULTIFD_FLAG_DEVICE_STATE;
>> >
>> >     multifd_send_fill_packet_device_state(p);
>> >
>> >     return 0;
>> > }
>> >
>> > None of other multifd_ops are used.
>> 
>> There's also a conditional around device_state when calling
>> ->recv(). That seems like it could go into a hook.
>> 
>> https://lore.kernel.org/r/41dedaf2c9abebb5e45f88c052daa26320715a92.1718717584.git.maciej.szmigiero@oracle.com
>
> Actually that's exactly what I think is right.. it looks to me now that we
> could bypass anything in MultifdOps (including recv()) but let device state
> be a parallel layer of MultifdOps itself, leaving MultifdOps only for
> compressors.
>
> And yeah, I still remember you just renamed it from recv_pages() to
> recv()..  it's just that, thinking about it again, it now looks cleaner
> to make it only about pages..
>
>> 
>> >
>> > I think we can directly invoke this part of device state code in
>> > multifd_send_thread() for now.  So far I think it should be ok.
>> 
>> It's not just that. There's also a check for "if (ram)" at every call to
>> multifd_ops to avoid calling the ram code when doing the device
>> migration. It would be way easier to just set noop functions for those.
>> 
>> static MultiFDMethods multifd_devstate_ops = {
>>     .send_setup = noop_send_setup,
>>     .send_cleanup = noop_send_cleanup,
>>     .send_prepare = devstate_send_prepare,
>>     .recv_setup = noop_recv_setup,
>>     .recv_cleanup = noop_recv_cleanup,
>>     .recv = devstate_recv
>> };
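>> 
>> where the noops would be as trivial as (sketch, matching the current
>> hook signatures):
>> 
>> static int noop_send_setup(MultiFDSendParams *p, Error **errp)
>> {
>>     return 0;
>> }
>> 
>> static void noop_send_cleanup(MultiFDSendParams *p, Error **errp)
>> {
>>     /* nothing allocated at setup, nothing to free */
>> }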
>> 
>> I'm not saying this needs to be done in this series though. But I do
>> think that's the correct design choice for the long term.
>
> Yes it should be separate.
>
> And what I meant is we don't need all these noops, but recv() keeps being
> ignored just like above, then for sender side, right now it's:
>
>             ret = multifd_send_state->ops->send_prepare(p, &local_err);
>             if (migrate_mapped_ram()) {
>                 file_write_ramblock_iov();
>             } else {
>                 ret = qio_channel_writev_full_all();
>             }
>
> VFIO can process device state in parallel, so:
>
>     if (ram) {
>         ret = multifd_send_state->ops->send_prepare(p, &local_err);
>         if (migrate_mapped_ram()) {
>             file_write_ramblock_iov();
>         } else {
>             qio_channel_writev_full_all();
>         }
>     } else {
>         // device state handling
>         multifd_send_device_prepare(...);
>         ...
>         qio_channel_writev_full_all();
>     }
>
> Then MultifdOps doesn't apply to device states.

To avoid getting into bikeshed territory, I think we should postpone
this discussion until after Maciej's series is merged, so we can speak
more concretely about the implications. It's easier to go from your
suggestion to mine than the other way around, so let's leave it at that.

I had already written it, so here's more of my reasoning below, if
you're interested.
======

We already have the send/recv threads structured in relation to what we
do inside the hooks. You're just defining a function that's not a hook,
but it has the same signature and responsibilities and needs to be
called at the same moment.

I think the dissonance here is that you don't see the multifd thread
code and the payloads (ram, device) as separate layers. Payload-specific
code should not be at the top level; otherwise, it breaks any semblance
of proper layering:

- payload code will have access to MultiFD*Params, which has a bunch of
  control variables for the loop, the semaphores, etc. that should not
  be touched;

- payload code ends up influencing the flow of the thread function,
  e.g. when zero_copy_send used to dictate whether we'd have a
  separate IO for the packet or not;

- temporary variables needed by the payload code will have to be
  declared inside the thread function, which makes it tempting to use
  them across payload types and also in the thread code itself;

- it creates doubt as to whether new changes should go inside the
  hooks, in the if/else, or outside of it (see the sketch below).
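To make the contrast concrete, here's a rough sketch in the same spirit
as your snippet above (names hypothetical, error handling elided): the
thread function only ever talks to hooks, so payload code never touches
the loop's control variables:

MultiFDMethods *ops = multifd_ops_for(p->data->type);

ret = ops->send_prepare(p, &local_err);
...
/* no if (ram): each payload's quirks stay behind its hooks */
ret = qio_channel_writev_full_all(...);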

Think about how easy it has been to review and merge the various
compression features we've had. It doesn't matter how much they mess up
inside the hooks; it will never cause the dreaded "Memory content
inconsistency at ..." error from check_guest_ram(). At least not in a
way that affects other people. Now compare that with, for instance, the
zero-page work, or even mapped-ram, which required a bunch of changes to
the multifd control flow itself (e.g. all of the sync changes w/
mapped-ram).


