qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 1/7] migration/multifd: Further remove the SYNC on complet


From: Peter Xu
Subject: Re: [PATCH v2 1/7] migration/multifd: Further remove the SYNC on complete
Date: Fri, 6 Dec 2024 09:40:56 -0500

On Fri, Dec 06, 2024 at 10:17:22AM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > Commit 637280aeb2 ("migration/multifd: Avoid the final FLUSH in
> > complete()") removed the FLUSH operation on complete() which should avoid
> > one global sync on destination side, because it's not needed.
> >
> > However that commit overlooked multifd_ram_flush_and_sync() part of things,
> > as that's always used together with the FLUSH message on the main channel.
> >
> > For very old binaries (multifd_flush_after_each_section==true), that's
> > still needed because each EOS received on destination will enforce
> > all-channel sync once.
> >
> > For new binaries (multifd_flush_after_each_section==false), the flush and
> > sync shouldn't be needed just like the FLUSH, with the same reasoning.
> >
> > Currently, on new binaries we'll still send SYNC messages on multifd
> > channels, even if FLUSH is omitted at the end.  It'll make all recv threads
> > hang at SYNC message.
> >
> > Multifd is still all working fine because luckily recv side cleanup
> > code (mostly multifd_recv_sync_main()) is smart enough to make sure even if
> > recv threads are stuck at SYNC it'll get kicked out.  And since this is the
> > completion phase of migration, nothing else will be sent after the SYNCs.
> >
> > This may be needed for VFIO in the future because VFIO can have data to
> > push after ram_save_complete(), hence we don't want the recv thread to be
> > stuck in SYNC message. Remove this limitation will make src even slightly
> > faster too to avoid some more code.
> >
> > Stable branches do not need this patch, as no real bug I can think of that
> > will go wrong there.. so not attaching Fixes to be clear on the backport
> > not needed.
> 
> You forgot my comments on the commit message, so here's a brand-new one

Yes I did.. :(

> with some things spelt out more explicitly:
> 
> --
> Commit 637280aeb2 ("migration/multifd: Avoid the final FLUSH in
> complete()") stopped sending the RAM_SAVE_FLAG_MULTIFD_FLUSH flag at
> ram_save_complete(), because the sync on the destination side is not
> needed due to the last iteration of find_dirty_block() having already
> done it.
> 
> However, that commit overlooked that multifd_ram_flush_and_sync() on the
> source side is also not needed at ram_save_complete(), for the same
> reason.
> 
> Moreover, removing the RAM_SAVE_FLAG_MULTIFD_FLUSH but keeping the
> multifd_ram_flush_and_sync() means that currently the recv threads will
> hang when receiving the MULTIFD_FLAG_SYNC message, waiting for the
> destination sync which only happens when RAM_SAVE_FLAG_MULTIFD_FLUSH is
> received.
> 
> Luckily, multifd is still all working fine because recv side cleanup
> code (mostly multifd_recv_sync_main()) is smart enough to make sure even
> if recv threads are stuck at SYNC it'll get kicked out. And since this
> is the completion phase of migration, nothing else will be sent after
> the SYNCs.
> 
> This needs to be fixed because in the future VFIO will have data to push
> after ram_save_complete() and we don't want the recv thread to be stuck
> in the MULTIFD_FLAG_SYNC message.
> 
> Remove the unnecessary (and buggy) invocation of
> multifd_ram_flush_and_sync().
> 
> For very old binaries (multifd_flush_after_each_section==true), the
> flush_and_sync is still needed because each EOS received on destination
> will enforce all-channel sync once.
> 
> Stable branches do not need this patch, as no real bug I can think of
> that will go wrong there.. so not attaching Fixes to be clear on the
> backport not needed.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> Reviewed-by: Fabiano Rosas <farosas@suse.de>

Sure, I used it, thanks.

> 
> > ---
> >  migration/ram.c | 13 ++++++++++---
> >  1 file changed, 10 insertions(+), 3 deletions(-)
> >
> > diff --git a/migration/ram.c b/migration/ram.c
> > index 05ff9eb328..7284c34bd8 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -3283,9 +3283,16 @@ static int ram_save_complete(QEMUFile *f, void 
> > *opaque)
> >          }
> >      }
> >  
> > -    ret = multifd_ram_flush_and_sync();
> > -    if (ret < 0) {
> > -        return ret;
> > +    if (migrate_multifd() &&
> > +        migrate_multifd_flush_after_each_section()) {
> > +        /*
> > +         * Only the old dest QEMU will need this sync, because each EOS
> > +         * will require one SYNC message on each channel.
> > +         */
> > +        ret = multifd_ram_flush_and_sync();
> > +        if (ret < 0) {
> > +            return ret;
> > +        }
> >      }
> >  
> >      if (migrate_mapped_ram()) {
> 

-- 
Peter Xu




reply via email to

[Prev in Thread] Current Thread [Next in Thread]