
Re: [PATCH 1/2] migration/multifd: Further remove the SYNC on complete


From: Fabiano Rosas
Subject: Re: [PATCH 1/2] migration/multifd: Further remove the SYNC on complete
Date: Thu, 05 Dec 2024 18:16:05 -0300

Peter Xu <peterx@redhat.com> writes:

> On Thu, Dec 05, 2024 at 04:55:08PM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > Commit 637280aeb2 ("migration/multifd: Avoid the final FLUSH in
>> > complete()") removed the FLUSH operation on complete(), which avoids
>> > one global sync on the destination side, because it's not needed.
>> >
>> > However, that commit overlooked the multifd_ram_flush_and_sync() part of
>> > things, as that's always used together with the FLUSH message on the main
>> > channel.
>> 
>> Let's please write the full name of the flags, functions, etc. As we've
>> seen in the discussion with Prasad, we're stumbling over ambiguous
>> terminology. This is RAM_SAVE_FLAG_MULTIFD_FLUSH.
>
> Sure.
>
>> 
>> >
>> > For very old binaries (multifd_flush_after_each_section==true), that's
>> > still needed, because each EOS received on the destination will enforce
>> > one all-channel sync.
>> 
>> Ok, but why does this patch not reinstate the flag?
>
> RAM_SAVE_FLAG_MULTIFD_FLUSH?  Because it's not needed?
>

Ah, you're saying we need the source side to send the MULTIFD_FLAG_SYNC
packet so that the old QEMU on the recv side gets out of waiting. I
see.

>> 
>> >
>> > For new binaries (multifd_flush_after_each_section==false), the flush and
>> > sync shouldn't be needed just like the FLUSH, with the same reasoning.
>> >
>> > Currently, on new binaries we'll still send SYNC messages on the multifd
>> > channels, even if the FLUSH is omitted at the end.  That makes all recv
>> > threads hang at the SYNC message.
>> 
>> I don't get this part, is this a bug you're describing? There's no SYNC
>> message on the recv side; I think you mean the MULTIFD_FLAG_SYNC and
>> this code?
>> 
>>     if (flags & MULTIFD_FLAG_SYNC) {
>>         qemu_sem_post(&multifd_recv_state->sem_sync);
>>         qemu_sem_wait(&p->sem_sync);
>>     }
>
> Yes.
>
>> 
>> That's not a hang, that's the sync.
>
> When the recv side never collects that SYNC (by invoking
> multifd_recv_sync_main()), then it is a hang.
>

Right, I forgot the sync on the recv side works the other way around. It's
the multifd_recv_state that does the sync between the threads. The
per-channel sem_sync is there so that the channels don't exit.
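To spell that out for myself, here's a minimal sketch of the handshake using
plain POSIX threads and semaphores (my own toy model, not QEMU code; the
names just mirror multifd_recv_state->sem_sync and p->sem_sync):

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

#define NUM_CHANNELS 4

static sem_t state_sem_sync;                 /* ~ multifd_recv_state->sem_sync */
static sem_t channel_sem_sync[NUM_CHANNELS]; /* ~ p->sem_sync, one per channel */

/* Recv thread: on MULTIFD_FLAG_SYNC, signal the main thread and then block
 * until the main thread collects the sync and releases this channel. */
static void *recv_thread(void *arg)
{
    int id = *(int *)arg;
    sem_post(&state_sem_sync);        /* "I reached the sync point" */
    sem_wait(&channel_sem_sync[id]);  /* hangs here if nobody ever collects */
    return NULL;
}

/* Main thread: rough analogue of multifd_recv_sync_main() -- collect one
 * post per channel, then kick every channel loose. */
static void recv_sync_main(void)
{
    for (int i = 0; i < NUM_CHANNELS; i++) {
        sem_wait(&state_sem_sync);
    }
    for (int i = 0; i < NUM_CHANNELS; i++) {
        sem_post(&channel_sem_sync[i]);
    }
}

int run_demo(void)
{
    pthread_t th[NUM_CHANNELS];
    int ids[NUM_CHANNELS];

    sem_init(&state_sem_sync, 0, 0);
    for (int i = 0; i < NUM_CHANNELS; i++) {
        sem_init(&channel_sem_sync[i], 0, 0);
        ids[i] = i;
        pthread_create(&th[i], NULL, recv_thread, &ids[i]);
    }
    recv_sync_main();   /* skip this call and every recv thread stays stuck */
    for (int i = 0; i < NUM_CHANNELS; i++) {
        pthread_join(th[i], NULL);
    }
    return 0;
}
```

If recv_sync_main() is never called, every channel blocks on its own
sem_wait() forever, which is exactly the "hang until cleanup" being discussed.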

>> 
>> >
>> > Multifd still works fine, because luckily the recv side cleanup
>> > code (mostly multifd_recv_sync_main()) is smart enough to kick the recv
>> > threads out even if they are stuck at SYNC.  And since this is the
>> > completion phase of migration, nothing else will be sent after the SYNCs.
>> 
>> Hmm, a last sync on the recv side might indeed not be needed.
>> 
>> >
>> > This may be needed for VFIO in the future, because VFIO can have data to
>> > push after ram_save_complete(), hence we don't want the recv threads to be
>> > stuck in the SYNC message.  Removing this limitation will also make the
>> > src slightly faster, by avoiding some more code.
>> >
>> > Stable branches do not need this patch, as I can't think of any real bug
>> > that would go wrong there, so no Fixes tag is attached, to make it clear
>> > that a backport is not needed.
>> >
>> > Signed-off-by: Peter Xu <peterx@redhat.com>
>> > ---
>> >  migration/ram.c | 13 ++++++++++---
>> >  1 file changed, 10 insertions(+), 3 deletions(-)
>> >
>> > diff --git a/migration/ram.c b/migration/ram.c
>> > index 05ff9eb328..7284c34bd8 100644
>> > --- a/migration/ram.c
>> > +++ b/migration/ram.c
>> > @@ -3283,9 +3283,16 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
>> >          }
>> >      }
>> >  
>> > -    ret = multifd_ram_flush_and_sync();
>> > -    if (ret < 0) {
>> > -        return ret;
>> > +    if (migrate_multifd() &&
>> > +        migrate_multifd_flush_after_each_section()) {
>> > +        /*
>> > +         * Only the old dest QEMU will need this sync, because each EOS
>> > +         * will require one SYNC message on each channel.
>> > +         */
>> > +        ret = multifd_ram_flush_and_sync();
>> > +        if (ret < 0) {
>> > +            return ret;
>> > +        }
>> 
>> I don't think this works. We need one last flush to not lose the last
>> few pages of RAM. And we need the src-side sync of multifd threads to
>> make sure this function doesn't return before all IO has been put on the
>> wire.
>
> This should be a question for commit 637280aeb2; I thought it got
> answered there.  It's almost what the commit message of 637280aeb2
> wanted to justify too.

Yeah, it missed the mark.

>
> We don't need to flush the last pages here, because we flushed them already
> in the last find_dirty_block() call, where the src QEMU finished scanning the
> last round.  Then we set complete_round=true, scan one more round, and the
> iteration won't complete until the new round sees zero dirty pages.

Ok, let's put this in the commit message. As it stands it looks like
this patch is fixing a bug in 637280aeb2 when instead it's doing a
further optimization; it just so happens that we still require the
backward-compatibility part.
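Just to check I follow the invariant: a toy model of why the send buffer is
already empty by complete() time (my own sketch, nothing here is QEMU code;
flush_and_sync(), buffered_pages, etc. are hypothetical stand-ins):

```c
#include <stdbool.h>

static int buffered_pages;      /* stand-in for the multifd_ram_send payload */
static int dirty_pages = 10;    /* pages dirty in the current round */
static int redirty_rounds = 2;  /* rounds in which the guest dirties more pages */

static void flush_and_sync(void)
{
    buffered_pages = 0;         /* everything queued so far goes on the wire */
}

/* One scan round: queue every dirty page, then flush at the wrap-around,
 * the way the end of a find_dirty_block() round does. */
static bool scan_one_round(void)
{
    while (dirty_pages > 0) {
        buffered_pages++;
        dirty_pages--;
    }
    flush_and_sync();           /* end-of-round flush */
    if (redirty_rounds-- > 0) { /* guest keeps running, dirties a few more */
        dirty_pages = 3;
        return false;
    }
    return true;                /* a clean round: migration can complete */
}

int simulate_migration(void)
{
    while (!scan_one_round()) {
        /* keep scanning until a round finds zero dirty pages */
    }
    /* ram_save_complete(): the buffer must already be empty here */
    return buffered_pages;
}
```

Since completion is only reached right after a round's flush, with no pages
queued in between, the buffer is necessarily empty when complete() runs, which
is what the assert in the hack below demonstrates on real QEMU.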

>
> So when reaching this line, the multifd send buffer must be empty.  We need
> to flush for each round of the RAM scan, and we can't avoid the flush there,
> but we can save this one in complete().
>
> To explain that with code, I hacked QEMU to assert exactly that.  It all ran
> fine here (this is definitely not for merge, but it shows what I meant):
>
> ===8<===
> diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c
> index f64c4c9abd..0bd42c7627 100644
> --- a/migration/multifd-nocomp.c
> +++ b/migration/multifd-nocomp.c
> @@ -21,7 +21,7 @@
>  #include "qemu/error-report.h"
>  #include "trace.h"
>
> -static MultiFDSendData *multifd_ram_send;
> +MultiFDSendData *multifd_ram_send;
>
>  size_t multifd_ram_payload_size(void)
>  {
> diff --git a/migration/ram.c b/migration/ram.c
> index 7284c34bd8..edeb9e28ff 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -3228,6 +3228,8 @@ out:
>      return done;
>  }
>
> +extern MultiFDSendData *multifd_ram_send;
> +
>  /**
>   * ram_save_complete: function called to send the remaining amount of ram
>   *
> @@ -3283,6 +3285,10 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
>          }
>      }
>
> +    if (migrate_multifd()) {
> +        assert(multifd_payload_empty(multifd_ram_send));
> +    }
> +
>      if (migrate_multifd() &&
>          migrate_multifd_flush_after_each_section()) {
>          /*
> ===8<===
>
>> 
>> This also doesn't address what the commit message says about the recv
>> side never getting the RAM_SAVE_FLAG_MULTIFD_FLUSH message.
>
> New binaries now never send RAM_SAVE_FLAG_MULTIFD_FLUSH in complete(),
> however they still always send the multifd SYNC messages on the wire.
> It means those SYNC messages will always arrive on the dest multifd channels,
> and then all these channels will wait for the main thread to collect them.
> However, since RAM_SAVE_FLAG_MULTIFD_FLUSH is not there, they'll hang until
> the multifd recv cleanup.


