From: Peter Xu
Subject: Re: [PATCH v2 7/7] migration/multifd: Document the reason to sync for save_setup()
Date: Fri, 6 Dec 2024 10:36:18 -0500

On Fri, Dec 06, 2024 at 11:40:27AM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > It's not straightforward to see why src QEMU needs to sync multifd during
> > setup() phase.  After all, there's no page queued at that point.
> >
> > For old QEMUs, there's a solid reason: EOS requires it to work.  It's
> > less obvious for new QEMUs, which do not take EOS messages as sync
> > requests.
> >
> > One will figure that out only when this is conditionally removed; in
> > fact, the author did try it out.  Logically we could still avoid doing
> > this on new machine types, however that needs a separate compat field,
> > which can be overkill given the trivial overhead in the setup() phase.
> >
> > Let's instead document it completely, to avoid someone else trying this
> > again and doing the debugging one more time, or anyone being confused
> > about why this ever existed.
> >
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  migration/ram.c | 27 +++++++++++++++++++++++++++
> >  1 file changed, 27 insertions(+)
> >
> > diff --git a/migration/ram.c b/migration/ram.c
> > index 5d4bdefe69..ddee703585 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -3036,6 +3036,33 @@ static int ram_save_setup(QEMUFile *f, void *opaque, Error **errp)
> >          migration_ops->ram_save_target_page = ram_save_target_page_legacy;
> >      }
> >  
> > +    /*
> > +     * This operation is unfortunate..
> > +     *
> > +     * For legacy QEMUs using per-section sync
> > +     * =======================================
> > +     *
> > +     * This must exist because the EOS below requires the SYNC messages
> > +     * per-channel to work.
> > +     *
> > +     * For modern QEMUs using per-round sync
> > +     * =====================================
> > +     *
> > +     * Logically this sync is not needed (because we know there's nothing
> > +     * in the multifd queue yet!).
> 
> This is a bit misleading because even today we could split the
> multifd_ram_flush_and_sync() into _flush and _sync (haven't I seen a
> patch doing this somewhere? Maybe from Maciej...) and call just the
> _sync here, which is unrelated to any multifd queue.

Yeah, you have a point; maybe at least I shouldn't mention the queues, as
they can be irrelevant.
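
Something like your split could look like this, I suppose (rough
standalone sketch only; hypothetical names, QEMUFile stubbed out as an
opaque type, not the actual patch):

    #include <stdio.h>

    typedef struct QEMUFile QEMUFile;   /* stand-in for the real type */

    /* Flush: drain pages queued locally for the multifd send threads. */
    static int multifd_ram_flush(QEMUFile *f)
    {
        (void)f;
        printf("flush: local queue drained\n");
        return 0;
    }

    /* Sync: exchange SYNC messages with the destination channels. */
    static int multifd_ram_sync(QEMUFile *f)
    {
        (void)f;
        printf("sync: remote handshake done\n");
        return 0;
    }

    static int multifd_ram_flush_and_sync(QEMUFile *f)
    {
        int ret = multifd_ram_flush(f);

        return ret ? ret : multifd_ram_sync(f);
    }

    int main(void)
    {
        return multifd_ram_flush_and_sync(NULL);
    }

Then a call site like save_setup() could invoke only the _sync half,
since there's nothing to flush there anyway.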

> 
> I think we shouldn't tie "sync" to "wait for multifd threads to finish
> sending their data (a kind of flush)" as this implies. The sync is as
> much about making sure the threads are ready to receive as it is about
> making sure the data is received in order in relation to ram scanning
> rounds.
> 
> IOW, the local sync is what ensures multifd send threads are flushed
> while this code deals with the sync of src&dst threads, which is "just"
> a synchronization point between the two QEMUs.

This is a remote sync, not a local sync.  But yes, it can be the
synchronization point.

As I mentioned below, such a sync point has nothing to do with src, so it
could be implemented on dest alone, without such a sync message.  The
current approach works, though.

> 
> > However as a side effect, this makes
> > +     * sure the dest side won't receive any data before it properly reaches
> > +     * ram_load_precopy().
> 
> I'm not sure it's a side-effect. It seems deliberate to me, seeing that
> multifd usually does its own synchronization. For instance, on the send
> side we also need some sync to make sure ram.c doesn't send data to
> multifd send threads that are not ready yet (i.e. the wait on
> channels_ready at the start of multifd_send()).

Yes, and that's exactly what I wanted to express.  If dest had that
"channels_ready" thing, I'm pretty sure we wouldn't need this remote sync.
We're using a heavier sync to serve the purpose of a "local sync for
dest".  It's OK, but it's very unclear what it really does.

> 
> > +     *
> > +     * Without this sync, src QEMU can send data too soon, so that dest
> > +     * may not be ready to receive it yet (e.g., rb->receivedmap may
> > +     * still be uninitialized).
> > +     *
> > +     * Logically "wait for recv setup ready" shouldn't need to involve src
> > +     * QEMU at all; however, to be compatible with old QEMUs, let's stick
> 
> I don't understand this statement: you're saying that QEMU src could
> just start dumping data on the channel without a remote end? Certainly
> for file migrations, but for sockets as well?

Yes.

When reaching here on the sender side, all multifd channels are already
there: multifd_send_setup() guaranteed it.  We can start dumping things,
AFAICT, regardless of what dest is doing.

Maybe the recv threads are not even created, but that isn't relevant, IMO,
as long as they'll be there at some point and start collecting socket
buffers.
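
To be concrete about why that's fine (tiny standalone demo, not QEMU
code): once the socket is connected, the kernel buffers whatever the
sender writes until the peer gets around to reading it.

    #include <stdio.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        int sv[2];
        char buf[16];
        ssize_t n;

        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
            return 1;
        }
        /* Write before anyone reads: succeeds, data sits in the buffer. */
        if (write(sv[0], "page data", 9) != 9) {
            return 1;
        }
        /* Much later, the "recv thread" side picks it up. */
        n = read(sv[1], buf, sizeof(buf));
        printf("received %zd bytes: %.*s\n", n, (int)n, buf);
        close(sv[0]);
        close(sv[1]);
        return 0;
    }

(Up to the socket buffer size, of course; past that the writer blocks,
but it never needs the reader thread to exist first.)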

> 
> > +     * with this.  Fortunately the overhead is low to sync during setup
> > +     * because the VM is running, so at least it's not accounted as part of
> > +     * downtime.
> > +     */
> >      bql_unlock();
> >      ret = multifd_ram_flush_and_sync(f);
> >      bql_lock();
> 

I removed the ambiguous wording about "queue is empty" and simplified it a
bit.  How does this one look?

    /*
     * This operation is unfortunate..
     *
     * For legacy QEMUs using per-section sync
     * =======================================
     *
     * This must exist because the EOS below requires the SYNC messages
     * per-channel to work.
     *
     * For modern QEMUs using per-round sync
     * =====================================
     *
     * Logically such a sync is not needed, and recv threads should not
     * run until setup is ready (using something like channels_ready on
     * the src side).  Then we should be all fine.
     *
     * However, even if we add channels_ready to the recv side in new
     * QEMUs, old QEMUs won't have it, so this sync will still be needed
     * to make sure multifd recv threads won't start processing guest
     * pages before ram_load_setup() is properly done.
     *
     * Let's stick with this.  Fortunately, the overhead of syncing
     * during setup is low because the VM is still running, so at least
     * it's not accounted as part of downtime.
     */

-- 
Peter Xu



