Re: [PATCH v2 1/2] migration: Fix rdma migration failed
From: Juan Quintela
Subject: Re: [PATCH v2 1/2] migration: Fix rdma migration failed
Date: Tue, 03 Oct 2023 21:00:20 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.3 (gnu/linux)

Peter Xu <peterx@redhat.com> wrote:
> On Tue, Sep 26, 2023 at 06:01:02PM +0800, Li Zhijian wrote:
>> Migration over RDMA failed since
>> commit: 294e5a4034 ("multifd: Only flush once each full round of memory")
>> with errors:
>> qemu-system-x86_64: rdma: Too many requests in this message
>> (3638950032).Bailing.
>>
>> Migration with RDMA is different from TCP. RDMA has its own control
>> messages, and all traffic between RDMA_CONTROL_REGISTER_REQUEST and
>> RDMA_CONTROL_REGISTER_FINISHED should not be disturbed.
>>
>> find_dirty_block() is called between RDMA_CONTROL_REGISTER_REQUEST
>> and RDMA_CONTROL_REGISTER_FINISHED; it sends extra traffic
>> (RAM_SAVE_FLAG_MULTIFD_FLUSH) to the destination and causes migration
>> to fail even though multifd is disabled.
>>
>> This change makes migrate_multifd_flush_after_each_section() return true
>> when multifd is disabled, which also means RAM_SAVE_FLAG_MULTIFD_FLUSH
>> is no longer sent to the destination when multifd is disabled.
>>
>> Fixes: 294e5a4034 ("multifd: Only flush once each full round of memory")
>> CC: Fabiano Rosas <farosas@suse.de>
>> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
>> ---
>>
>> V2: put that check at the entry of
>> migrate_multifd_flush_after_each_section() # Peter
>
> Seeing this, I notice my suggestion wasn't ideal either, as we rely on
> both multifd_send_sync_main() and multifd_recv_sync_main() being no-ops
> when !multifd.
>
> For the long term, we should not call multifd functions at all if multifd
> is not enabled.
Agreed.
Send a different patch that makes this clear.
> Reviewed-by: Peter Xu <peterx@redhat.com>
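
For readers following the thread, the behaviour described in the quoted
commit message can be sketched roughly as below. This is a minimal
standalone model, not the QEMU code: struct mig_config and both function
names are hypothetical stand-ins, and only the idea (report "flush after
each section" as true whenever multifd is disabled, so that no
RAM_SAVE_FLAG_MULTIFD_FLUSH marker is emitted at the end of a round) is
taken from the discussion above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the migration settings; not a QEMU type. */
struct mig_config {
    bool multifd_enabled;            /* is the multifd capability on? */
    bool flush_after_each_section;   /* older per-section flush behaviour */
};

/*
 * Models the check described in the commit message: when multifd is
 * disabled, always answer true so the per-round flush path is skipped.
 */
bool flush_after_each_section(const struct mig_config *c)
{
    return !c->multifd_enabled || c->flush_after_each_section;
}

/*
 * Models the decision taken at the end of a full pass over memory:
 * would a flush marker be written to the migration stream?
 */
bool would_send_flush_marker(const struct mig_config *c)
{
    return !flush_after_each_section(c);
}
```

Under these assumptions, a configuration with multifd_enabled == false
never produces the marker, regardless of the other field, which is the
property the RDMA path needs. It does not address the longer-term point
raised in the review, namely that multifd functions should not be called
at all when multifd is off.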