From: Fabiano Rosas
Subject: Re: [PATCH v2 7/7] migration/multifd: Document the reason to sync for save_setup()
Date: Fri, 06 Dec 2024 14:01:01 -0300
Peter Xu <peterx@redhat.com> writes:
> On Fri, Dec 06, 2024 at 11:40:27AM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>>
>> > It's not straightforward to see why src QEMU needs to sync multifd during
>> > setup() phase. After all, there's no page queued at that point.
>> >
>> > For old QEMUs, there's a solid reason: EOS requires it to work. Meanwhile
>> > it's unclear why it is needed on new QEMUs, which do not treat the EOS
>> > message as a sync request.
>> >
>> > One will figure that out only when the sync is conditionally removed; in
>> > fact, the author did try that out. Logically we could still avoid doing
>> > this on new machine types, however that would need a separate compat
>> > field, which can be overkill for such trivial overhead in the setup()
>> > phase.
>> >
>> > Let's instead document it completely, to avoid someone else trying this
>> > again and doing the debugging one more time, and to help anyone confused
>> > about why this ever existed.
>> >
>> > Signed-off-by: Peter Xu <peterx@redhat.com>
>> > ---
>> > migration/ram.c | 27 +++++++++++++++++++++++++++
>> > 1 file changed, 27 insertions(+)
>> >
>> > diff --git a/migration/ram.c b/migration/ram.c
>> > index 5d4bdefe69..ddee703585 100644
>> > --- a/migration/ram.c
>> > +++ b/migration/ram.c
>> > @@ -3036,6 +3036,33 @@ static int ram_save_setup(QEMUFile *f, void *opaque, Error **errp)
>> > migration_ops->ram_save_target_page = ram_save_target_page_legacy;
>> > }
>> >
>> > + /*
>> > + * This operation is unfortunate..
>> > + *
>> > + * For legacy QEMUs using per-section sync
>> > + * =======================================
>> > + *
>> > + * This must exist because the EOS below requires the SYNC messages
>> > + * per-channel to work.
>> > + *
>> > + * For modern QEMUs using per-round sync
>> > + * =====================================
>> > + *
>> > + * Logically this sync is not needed (because we know there's nothing
>> > + * in the multifd queue yet!).
>>
>> This is a bit misleading because even today we could split the
>> multifd_ram_flush_and_sync() into _flush and _sync (haven't I seen a
>> patch doing this somewhere? Maybe from Maciej...) and call just the
>> _sync here, which is unrelated to any multifd queue.
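As a rough standalone illustration of that split (toy names, invented here;
not QEMU code), the two halves could look like this:

    #include <pthread.h>
    #include <stdio.h>

    /*
     * Toy model of the flush/sync split: "flush" drains locally queued
     * pages to the send threads, "sync" only emits the sync marker.
     * Names are invented for illustration; this is not QEMU code.
     */
    typedef struct {
        pthread_mutex_t lock;
        int queued_pages;   /* pages not yet handed to send threads */
    } ToyMultifd;

    /* "flush" half: push whatever is queued out to the send threads. */
    static void toy_multifd_flush(ToyMultifd *m)
    {
        pthread_mutex_lock(&m->lock);
        if (m->queued_pages) {
            printf("flush: handing %d pages to send threads\n",
                   m->queued_pages);
            m->queued_pages = 0;
        }
        pthread_mutex_unlock(&m->lock);
    }

    /* "sync" half: emit the per-channel SYNC marker; the page queue is
     * never touched, so it is callable even when nothing is queued. */
    static void toy_multifd_sync(void)
    {
        printf("sync: SYNC marker on every channel\n");
    }

    int main(void)
    {
        ToyMultifd m = { PTHREAD_MUTEX_INITIALIZER, 0 };

        /* During setup nothing is queued, so only the sync half would
         * actually be needed, which is the point being made above. */
        toy_multifd_flush(&m);  /* no-op here */
        toy_multifd_sync();
        return 0;
    }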
>
> Yeah, you have a point. Maybe at least I shouldn't mention the queues;
> they can be irrelevant.
>
>>
>> I think we shouldn't tie "sync" with "wait for multifd threads to finish
>> sending their data (a kind of flush)" as this implies. The sync is as
>> much making sure the threads are ready to receive as it is making sure
>> the data is received in order with relation to ram scanning rounds.
>>
>> IOW, the local sync is what ensures multifd send threads are flushed
>> while this code deals with the sync of src&dst threads, which is "just"
>> a synchronization point between the two QEMUs.
>
> This is a remote sync, not a local sync. But yes, it can be the
> synchronization point.
>
> As I mentioned below, such a sync point has nothing to do with src, so it
> could be implemented on dest alone, without such a sync message. It works,
> though.
>
>>
>> > + * However as a side effect, this makes
>> > + * sure the dest side won't receive any data before it properly
>> > + * reaches ram_load_precopy().
>>
>> I'm not sure it's a side-effect. It seems deliberate to me, seeing that
>> multifd usually does its own synchronization. For instance, on the send
>> side we also need some sync to make sure ram.c doesn't send data to
>> multifd send threads that are not ready yet (i.e. the wait on
>> channels_ready at the start of multifd_send()).
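The channels_ready gating that Fabiano refers to boils down to an ordinary
counting semaphore. The following is a simplified standalone sketch of the
pattern, not the actual multifd_send() code:

    #include <pthread.h>
    #include <semaphore.h>
    #include <stdio.h>

    #define NCHANNELS 2

    static sem_t channels_ready;

    /* Each channel thread signals readiness once its setup is done. */
    static void *send_thread(void *arg)
    {
        (void)arg;
        /* ... connect socket, handshake, etc. ... */
        sem_post(&channels_ready);
        return NULL;
    }

    /* The producer blocks until some channel has signalled readiness,
     * so pages are never handed to a thread that is not set up yet. */
    static void enqueue_pages(void)
    {
        sem_wait(&channels_ready);
        printf("handing pages to a ready channel\n");
    }

    int main(void)
    {
        pthread_t th[NCHANNELS];
        int i;

        sem_init(&channels_ready, 0, 0);
        for (i = 0; i < NCHANNELS; i++) {
            pthread_create(&th[i], NULL, send_thread, NULL);
        }
        enqueue_pages();
        for (i = 0; i < NCHANNELS; i++) {
            pthread_join(th[i], NULL);
        }
        return 0;
    }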
>
> Yes, and that's exactly what I wanted to express. If dest had that
> "channels_ready" thing, I'm pretty sure we wouldn't need this remote sync.
> We're using a heavier sync to serve the purpose of a "local sync for
> dest". It's OK, but it's very unclear what it really does.
>
>>
>> > + *
>> > + * Without this sync, src QEMU can send data too soon so that dest may
>> > + * not be ready to receive it (e.g., rb->receivedmap may still be
>> > + * uninitialized).
>> > + *
>> > + * Logically "wait for recv setup ready" shouldn't need to involve src
>> > + * QEMU at all, however to be compatible with old QEMUs, let's stick
>>
>> I don't understand this statement. You're saying that src QEMU could
>> just start dumping data on the channel without a remote end? Certainly
>> for file migrations, but for sockets as well?
>
> Yes.
>
> When reaching here on the sender side, all multifd channels are already
> there: multifd_send_setup() guaranteed it. We can start dumping things,
> AFAICT, regardless of what dest is doing.
>
> Maybe the recv threads are not even created, but that isn't relevant, IMO,
> as long as they'll be there at some point and start collecting socket
> buffers.
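That property is easy to demonstrate in isolation: bytes written before
anyone reads just sit in the kernel socket buffer. A trivial sketch (a
socketpair is used here for brevity; TCP behaves the same up to buffer
limits):

    #include <stdio.h>
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main(void)
    {
        int sv[2];
        char buf[32];

        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
            return 1;
        }

        /* "src" writes before any reader exists on the other side. */
        write(sv[0], "page data", 9);

        /* Much later, the "recv thread" shows up and still gets it. */
        ssize_t n = read(sv[1], buf, sizeof(buf));
        printf("recv got %zd bytes: %.*s\n", n, (int)n, buf);
        return 0;
    }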
>
>>
>> > + * with this. Fortunately the overhead is low to sync during setup
>> > + * because the VM is running, so at least it's not accounted as part
>> > + * of downtime.
>> > + */
>> > bql_unlock();
>> > ret = multifd_ram_flush_and_sync(f);
>> > bql_lock();
>>
>
> I removed the ambiguous wording about "queue is empty" and simplified it a
> bit. How does this one look?
>
> /*
> * This operation is unfortunate..
> *
> * For legacy QEMUs using per-section sync
> * =======================================
> *
> * This must exist because the EOS below requires the SYNC messages
> * per-channel to work.
> *
> * For modern QEMUs using per-round sync
> * =====================================
> *
> * Logically such a sync is not needed, and recv threads should not run
> * until setup is ready (using things like channels_ready on src). Then
> * we should be all fine.
> *
> * However even if we add channels_ready to the recv side in new QEMUs,
> * old QEMUs won't have it, so this sync will still be needed to make
> * sure multifd recv threads won't start processing guest pages before
> * ram_load_setup() is properly done.
> *
> * Let's stick with this. Fortunately the overhead is low to sync
> * during setup because the VM is running, so at least it's not
> * accounted as part of downtime.
> */
Looks ok, thanks!