[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v2 7/7] migration/multifd: Document the reason to sync for sa
From: |
Fabiano Rosas |
Subject: |
Re: [PATCH v2 7/7] migration/multifd: Document the reason to sync for save_setup() |
Date: |
Fri, 06 Dec 2024 11:40:27 -0300 |
Peter Xu <peterx@redhat.com> writes:
> It's not straightforward to see why src QEMU needs to sync multifd during
> setup() phase. After all, there's no page queued at that point.
>
> For old QEMUs, there's a solid reason: EOS requires it to work. While it's
> clueless on the new QEMUs which do not take EOS message as sync requests.
>
> One will figure that out only when this is conditionally removed. In fact,
> the author did try it out. Logically we could still avoid doing this on
> new machine types, however that needs a separate compat field and that can
> be an overkill in some trivial overhead in setup() phase.
>
> Let's instead document it completely, to avoid someone else tries this
> again and do the debug one more time, or anyone confused on why this ever
> existed.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> migration/ram.c | 27 +++++++++++++++++++++++++++
> 1 file changed, 27 insertions(+)
>
> diff --git a/migration/ram.c b/migration/ram.c
> index 5d4bdefe69..ddee703585 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -3036,6 +3036,33 @@ static int ram_save_setup(QEMUFile *f, void *opaque,
> Error **errp)
> migration_ops->ram_save_target_page = ram_save_target_page_legacy;
> }
>
> + /*
> + * This operation is unfortunate..
> + *
> + * For legacy QEMUs using per-section sync
> + * =======================================
> + *
> + * This must exist because the EOS below requires the SYNC messages
> + * per-channel to work.
> + *
> + * For modern QEMUs using per-round sync
> + * =====================================
> + *
> + * Logically this sync is not needed (because we know there's nothing
> + * in the multifd queue yet!).
This is a bit misleading because even today we could split the
multifd_ram_flush_and_sync() into _flush and _sync (haven't I seen a
patch doing this somewhere? Maybe from Maciej...) and call just the
_sync here, which is unrelated to any multifd queue.
I think we shouldn't tie "sync" with "wait for multifd threads to finish
sending their data (a kind of flush)" as this implies. The sync is as
much making sure the threads are ready to receive as it is making sure
the data is received in order with relation to ram scanning rounds.
IOW, the local sync is what ensures multifd send threads are flushed
while this code deals with the sync of src&dst threads, which is "just"
a synchronization point between the two QEMUs.
> However as a side effect, this makes
> + * sure the dest side won't receive any data before it properly reaches
> + * ram_load_precopy().
I'm not sure it's a side-effect. It seems deliberate to me, seeing that
multifd usually does its own synchronization. For instance, on the send
side we also need some sync to make sure ram.c doesn't send data to
multifd send threads that are not ready yet (i.e. the wait on
channels_ready at the start of multifd_send()).
> + *
> + * Without this sync, src QEMU can send data too soon so that dest may
> + * not have been ready to receive it (e.g., rb->receivedmap may be
> + * uninitialized, for example).
> + *
> + * Logically "wait for recv setup ready" shouldn't need to involve src
> + * QEMU at all, however to be compatible with old QEMUs, let's stick
I don't understand this statement, you're saying that QEMU src could
just start dumping data on the channel without a remote end? Certainly
for file migrations, but socket as well?
> + * with this. Fortunately the overhead is low to sync during setup
> + * because the VM is running, so at least it's not accounted as part of
> + * downtime.
> + */
> bql_unlock();
> ret = multifd_ram_flush_and_sync(f);
> bql_lock();
- Re: [PATCH v2 2/7] migration/multifd: Allow to sync with sender threads only, (continued)
[PATCH v2 4/7] migration/multifd: Unify RAM_SAVE_FLAG_MULTIFD_FLUSH messages, Peter Xu, 2024/12/05
[PATCH v2 5/7] migration/multifd: Remove sync processing on postcopy, Peter Xu, 2024/12/05
[PATCH v2 6/7] migration/multifd: Cleanup src flushes on condition check, Peter Xu, 2024/12/05
[PATCH v2 7/7] migration/multifd: Document the reason to sync for save_setup(), Peter Xu, 2024/12/05
- Re: [PATCH v2 7/7] migration/multifd: Document the reason to sync for save_setup(),
Fabiano Rosas <=