[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v2 1/7] migration/multifd: Further remove the SYNC on complet
From: |
Peter Xu |
Subject: |
Re: [PATCH v2 1/7] migration/multifd: Further remove the SYNC on complete |
Date: |
Fri, 6 Dec 2024 09:40:56 -0500 |
On Fri, Dec 06, 2024 at 10:17:22AM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
>
> > Commit 637280aeb2 ("migration/multifd: Avoid the final FLUSH in
> > complete()") removed the FLUSH operation on complete() which should avoid
> > one global sync on destination side, because it's not needed.
> >
> > However that commit overlooked multifd_ram_flush_and_sync() part of things,
> > as that's always used together with the FLUSH message on the main channel.
> >
> > For very old binaries (multifd_flush_after_each_section==true), that's
> > still needed because each EOS received on destination will enforce
> > all-channel sync once.
> >
> > For new binaries (multifd_flush_after_each_section==false), the flush and
> > sync shouldn't be needed just like the FLUSH, with the same reasoning.
> >
> > Currently, on new binaries we'll still send SYNC messages on multifd
> > channels, even if FLUSH is omitted at the end. It'll make all recv threads
> > hang at SYNC message.
> >
> > Multifd is still all working fine because luckily recv side cleanup
> > code (mostly multifd_recv_sync_main()) is smart enough to make sure even if
> > recv threads are stuck at SYNC it'll get kicked out. And since this is the
> > completion phase of migration, nothing else will be sent after the SYNCs.
> >
> > This may be needed for VFIO in the future because VFIO can have data to
> > push after ram_save_complete(), hence we don't want the recv thread to be
> > stuck in SYNC message. Remove this limitation will make src even slightly
> > faster too to avoid some more code.
> >
> > Stable branches do not need this patch, as no real bug I can think of that
> > will go wrong there.. so not attaching Fixes to be clear on the backport
> > not needed.
>
> You forgot my comments on the commit message, so here's a brand-new one
Yes I did.. :(
> with some things spelt out more explicitly:
>
> --
> Commit 637280aeb2 ("migration/multifd: Avoid the final FLUSH in
> complete()") stopped sending the RAM_SAVE_FLAG_MULTIFD_FLUSH flag at
> ram_save_complete(), because the sync on the destination side is not
> needed due to the last iteration of find_dirty_block() having already
> done it.
>
> However, that commit overlooked that multifd_ram_flush_and_sync() on the
> source side is also not needed at ram_save_complete(), for the same
> reason.
>
> Moreover, removing the RAM_SAVE_FLAG_MULTIFD_FLUSH but keeping the
> multifd_ram_flush_and_sync() means that currently the recv threads will
> hang when receiving the MULTIFD_FLAG_SYNC message, waiting for the
> destination sync which only happens when RAM_SAVE_FLAG_MULTIFD_FLUSH is
> received.
>
> Luckily, multifd is still all working fine because recv side cleanup
> code (mostly multifd_recv_sync_main()) is smart enough to make sure even
> if recv threads are stuck at SYNC it'll get kicked out. And since this
> is the completion phase of migration, nothing else will be sent after
> the SYNCs.
>
> This needs to be fixed because in the future VFIO will have data to push
> after ram_save_complete() and we don't want the recv thread to be stuck
> in the MULTIFD_FLAG_SYNC message.
>
> Remove the unnecessary (and buggy) invocation of
> multifd_ram_flush_and_sync().
>
> For very old binaries (multifd_flush_after_each_section==true), the
> flush_and_sync is still needed because each EOS received on destination
> will enforce all-channel sync once.
>
> Stable branches do not need this patch, as no real bug I can think of
> that will go wrong there.. so not attaching Fixes to be clear on the
> backport not needed.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> Reviewed-by: Fabiano Rosas <farosas@suse.de>
Sure, I used it, thanks.
>
> > ---
> > migration/ram.c | 13 ++++++++++---
> > 1 file changed, 10 insertions(+), 3 deletions(-)
> >
> > diff --git a/migration/ram.c b/migration/ram.c
> > index 05ff9eb328..7284c34bd8 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -3283,9 +3283,16 @@ static int ram_save_complete(QEMUFile *f, void
> > *opaque)
> > }
> > }
> >
> > - ret = multifd_ram_flush_and_sync();
> > - if (ret < 0) {
> > - return ret;
> > + if (migrate_multifd() &&
> > + migrate_multifd_flush_after_each_section()) {
> > + /*
> > + * Only the old dest QEMU will need this sync, because each EOS
> > + * will require one SYNC message on each channel.
> > + */
> > + ret = multifd_ram_flush_and_sync();
> > + if (ret < 0) {
> > + return ret;
> > + }
> > }
> >
> > if (migrate_mapped_ram()) {
>
--
Peter Xu
[PATCH v2 2/7] migration/multifd: Allow to sync with sender threads only, Peter Xu, 2024/12/05