Re: [PATCH 1/2] migration/multifd: Further remove the SYNC on complete
From: Fabiano Rosas
Subject: Re: [PATCH 1/2] migration/multifd: Further remove the SYNC on complete
Date: Thu, 05 Dec 2024 18:16:05 -0300
Peter Xu <peterx@redhat.com> writes:
> On Thu, Dec 05, 2024 at 04:55:08PM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>>
>> > Commit 637280aeb2 ("migration/multifd: Avoid the final FLUSH in
>> > complete()") removed the FLUSH operation on complete(), which should avoid
>> > one global sync on the destination side, because it's not needed.
>> >
>> > However, that commit overlooked the multifd_ram_flush_and_sync() part of
>> > things, as that's always used together with the FLUSH message on the main
>> > channel.
>>
>> Let's please write the full name of the flags, functions, etc. As we've
>> seen in the discussion with Prasad, we're stumbling over ambiguous
>> terminology. This is RAM_SAVE_FLAG_MULTIFD_FLUSH.
>
> Sure.
>
>>
>> >
>> > For very old binaries (multifd_flush_after_each_section==true), that's
>> > still needed, because each EOS received on the destination will enforce an
>> > all-channel sync once.
>>
>> Ok, but why does this patch not reinstate the flag?
>
> RAM_SAVE_FLAG_MULTIFD_FLUSH? Because it's not needed?
>
Ah, you're saying we need the source side to send the MULTIFD_FLAG_SYNC
packet so that the old QEMU on the recv side gets out of waiting. I
see.
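
For reference, the source-side helper is roughly this (a simplified
sketch from memory; the real code lives in migration/multifd-nocomp.c
and differs in details):

    /*
     * Simplified sketch of multifd_ram_flush_and_sync(); not the
     * exact upstream code.
     */
    int multifd_ram_flush_and_sync(void)
    {
        if (!migrate_multifd()) {
            return 0;
        }

        /* Push out any pages still queued in the send buffer... */
        if (!multifd_payload_empty(multifd_ram_send)) {
            if (!multifd_send(&multifd_ram_send)) {
                return -1;
            }
        }

        /*
         * ...then emit one MULTIFD_FLAG_SYNC packet per channel and
         * wait until the sender threads have flushed them out.
         */
        return multifd_send_sync_main();
    }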
>>
>> >
>> > For new binaries (multifd_flush_after_each_section==false), the flush and
>> > sync shouldn't be needed, just like the FLUSH, by the same reasoning.
>> >
>> > Currently, on new binaries we'll still send SYNC messages on the multifd
>> > channels, even if the FLUSH is omitted at the end. It'll make all recv
>> > threads hang at the SYNC message.
>>
>> I don't get this part; is this a bug you're describing? There's no SYNC
>> message on the recv side. I think you mean MULTIFD_FLAG_SYNC and this
>> code?
>>
>> if (flags & MULTIFD_FLAG_SYNC) {
>>     /* tell the main thread this channel reached the sync point... */
>>     qemu_sem_post(&multifd_recv_state->sem_sync);
>>     /* ...and wait for the main thread to release us */
>>     qemu_sem_wait(&p->sem_sync);
>> }
>
> Yes.
>
>>
>> That's not a hang, that's the sync.
>
> When the recv side never collects that SYNC (by invoking
> multifd_recv_sync_main()), then it is a hang.
>
Right, I forgot the sync on the recv side is the other way around. It's
the multifd_recv_state that does the sync between the threads. The
sem_sync is there so that the channels don't exit.
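
The main-thread side of that handshake is roughly the following
(paraphrasing multifd_recv_sync_main() from memory; the real code has
more checks and tracing):

    /* Rough sketch, not the exact upstream code. */
    void multifd_recv_sync_main(void)
    {
        int i;

        if (!migrate_multifd()) {
            return;
        }

        /* Wait until every recv thread has seen MULTIFD_FLAG_SYNC... */
        for (i = 0; i < migrate_multifd_channels(); i++) {
            qemu_sem_wait(&multifd_recv_state->sem_sync);
        }

        /* ...then release each channel so it can keep receiving. */
        for (i = 0; i < migrate_multifd_channels(); i++) {
            MultiFDRecvParams *p = &multifd_recv_state->params[i];

            qemu_sem_post(&p->sem_sync);
        }
    }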
>>
>> >
>> > Multifd still works fine, because luckily the recv side cleanup code
>> > (mostly multifd_recv_sync_main()) is smart enough to make sure that even
>> > if recv threads are stuck at SYNC they'll get kicked out. And since this
>> > is the completion phase of migration, nothing else will be sent after the
>> > SYNCs.
>>
>> Hmm, a last sync on the recv side might indeed not be needed.
>>
>> >
>> > This may be needed for VFIO in the future, because VFIO can have data to
>> > push after ram_save_complete(), hence we don't want the recv threads to be
>> > stuck in a SYNC message. Removing this limitation will also make the src
>> > slightly faster, by avoiding some more code.
>> >
>> > Stable branches do not need this patch, as there is no real bug I can
>> > think of that would go wrong there, so I'm not attaching a Fixes tag, to
>> > make clear that a backport is not needed.
>> >
>> > Signed-off-by: Peter Xu <peterx@redhat.com>
>> > ---
>> > migration/ram.c | 13 ++++++++++---
>> > 1 file changed, 10 insertions(+), 3 deletions(-)
>> >
>> > diff --git a/migration/ram.c b/migration/ram.c
>> > index 05ff9eb328..7284c34bd8 100644
>> > --- a/migration/ram.c
>> > +++ b/migration/ram.c
>> > @@ -3283,9 +3283,16 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
>> >          }
>> >      }
>> >
>> > -    ret = multifd_ram_flush_and_sync();
>> > -    if (ret < 0) {
>> > -        return ret;
>> > +    if (migrate_multifd() &&
>> > +        migrate_multifd_flush_after_each_section()) {
>> > +        /*
>> > +         * Only the old dest QEMU will need this sync, because each EOS
>> > +         * will require one SYNC message on each channel.
>> > +         */
>> > +        ret = multifd_ram_flush_and_sync();
>> > +        if (ret < 0) {
>> > +            return ret;
>> > +        }
>>
>> I don't think this works. We need one last flush so we don't lose the
>> last few pages of RAM. And we need the src-side sync of the multifd
>> threads to make sure this function doesn't return before all IO has been
>> put on the wire.
>
> This should be a question for commit 637280aeb2; I thought it got
> answered there. It's almost what the commit message in 637280aeb2 wanted
> to justify, too.
Yeah, it missed the mark.
>
> We don't need to flush the last pages here, because we already flushed
> them in the last find_dirty_block() call, where src QEMU finished
> scanning the last round. Then we set complete_round=true, scan one more
> round, and the iteration won't complete until the new round sees zero
> dirty pages.
Ok, let's put this in the commit message. As it stands, it looks like
this patch is fixing a bug in 637280aeb2 when instead it's doing a
further optimization; it just so happens that we still require the
backward compatibility part.
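
For the record, the per-round flush you're referring to sits in the
wrap-around path of find_dirty_block() in migration/ram.c; roughly like
this (paraphrased from memory, the exact code differs):

    /* Sketch of the end-of-round handling in find_dirty_block(). */
    if (!pss->block) {
        /* Hit the end of the RAMBlock list: one scan round is done. */
        pss->block = QLIST_FIRST_RCU(&ram_list.blocks);
        pss->complete_round = true;

        if (migrate_multifd() &&
            !migrate_multifd_flush_after_each_section()) {
            QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel;
            /* Flush pending multifd pages and sync sender threads. */
            int ret = multifd_ram_flush_and_sync();

            if (ret < 0) {
                return ret;
            }
            /* Tell the destination on the main channel to sync too. */
            qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH);
            qemu_fflush(f);
        }
    }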
>
> So when reaching this line, the multifd send buffer must be empty. We
> need to flush for each round of the RAM scan, and we can't avoid the
> flush there, but we can save this one in complete().
>
> To explain that with code, I hacked QEMU to assert exactly that. It ran
> all fine here (this is definitely not for merge, but it shows what I
> meant):
>
> ===8<===
> diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c
> index f64c4c9abd..0bd42c7627 100644
> --- a/migration/multifd-nocomp.c
> +++ b/migration/multifd-nocomp.c
> @@ -21,7 +21,7 @@
>  #include "qemu/error-report.h"
>  #include "trace.h"
>
> -static MultiFDSendData *multifd_ram_send;
> +MultiFDSendData *multifd_ram_send;
>
>  size_t multifd_ram_payload_size(void)
>  {
> diff --git a/migration/ram.c b/migration/ram.c
> index 7284c34bd8..edeb9e28ff 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -3228,6 +3228,8 @@ out:
>      return done;
>  }
>
> +extern MultiFDSendData *multifd_ram_send;
> +
>  /**
>   * ram_save_complete: function called to send the remaining amount of ram
>   *
> @@ -3283,6 +3285,10 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
>          }
>      }
>
> +    if (migrate_multifd()) {
> +        assert(multifd_payload_empty(multifd_ram_send));
> +    }
> +
> +    if (migrate_multifd() &&
> +        migrate_multifd_flush_after_each_section()) {
> +        /*
> ===8<===
>
>>
>> This also doesn't address what the commit message says about the recv
>> side never getting the RAM_SAVE_FLAG_MULTIFD_FLUSH message.
>
> New binaries now never send RAM_SAVE_FLAG_MULTIFD_FLUSH at complete();
> however, they still send the multifd SYNC messages on the wire. It means
> those SYNC messages will always arrive on the dest multifd channels, and
> then all these channels will wait for the main thread to collect them.
> However, since RAM_SAVE_FLAG_MULTIFD_FLUSH is not there, they'll hang
> until the multifd recv cleanup.
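
Right, and the reason that hang resolves itself today is that the recv
cleanup path kicks those semaphores on teardown, something along these
lines (paraphrasing from memory; the real cleanup code does more than
this):

    /* Rough sketch of the recv-side teardown kick. */
    static void multifd_recv_terminate_threads(Error *err)
    {
        int i;

        for (i = 0; i < migrate_multifd_channels(); i++) {
            MultiFDRecvParams *p = &multifd_recv_state->params[i];

            /* Wake up any channel stuck waiting at MULTIFD_FLAG_SYNC. */
            qemu_sem_post(&p->sem_sync);

            /* Shut down the iochannel so blocking reads return. */
            if (p->c) {
                qio_channel_shutdown(p->c, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
            }
        }
    }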