qemu-devel

Re: [PATCH 2/6] migration: Kick postcopy threads on cancel


From: Fabiano Rosas
Subject: Re: [PATCH 2/6] migration: Kick postcopy threads on cancel
Date: Wed, 04 Dec 2024 16:02:36 -0300

Peter Xu <peterx@redhat.com> writes:

> On Mon, Dec 02, 2024 at 07:01:33PM -0300, Fabiano Rosas wrote:
>> Make sure postcopy threads are released when migrate_cancel is
>> issued. Kick the postcopy_pause semaphore and have the fault thread
>> read 'fault_thread_quit' when joining.
>> 
>> While here fix the comment mentioning userfault_event_fd.
>> 
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>
> I remember when working on postcopy, I thought about failing migrate-cancel
> for postcopy in general, rejecting such request.  And when working on the
> recover feature, there's no concern on having it being cancelled, because
> the user really shouldn't do that..
>
> The problem is migrate-cancel means crashing the VM on both sides when QEMU
> already goes into postcopy stage.

Well, that's the silliness of having a cancel command: you can never
know what "cancel" means. The "documentation" says: "Cancel the current
executing migration process", and it doesn't mention anything about the
consequences of such an action.

>
> If the user wants to crash the VM anyway, an easier way to do is killing on
> both sides.

I don't think this is fair. We expose a "cancel" command, so we'd better
either do some cancelling or reject the command appropriately, not
expect the user to "know better".

>
> If the user wished to cancel, we should tell them "postcopy cannot be
> cancelled, until complete".  That's probably the major reason why people
> think postcopy is dangerous to use..

We could certainly add that restriction; I don't see a problem with
it. That said, what is the actual use case for migrate_cancel? And how
does that compare with yank? I feel like we've been kind of relying on
nobody really using those commands.
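
To make the "reject the command" option concrete, here is a minimal,
self-contained sketch of what such a restriction could look like. It is
not QEMU code: the MigState enum, mig_in_postcopy() and mig_cancel() are
hypothetical names made up for illustration, and the error string only
borrows the wording suggested above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified migration states; NOT QEMU's actual enum. */
typedef enum {
    MIG_STATE_NONE,
    MIG_STATE_ACTIVE,            /* precopy is running */
    MIG_STATE_POSTCOPY_ACTIVE,   /* guest already runs on destination */
    MIG_STATE_POSTCOPY_PAUSED,   /* postcopy interrupted, recoverable */
    MIG_STATE_CANCELLING,
    MIG_STATE_COMPLETED,
} MigState;

/* True once the migration has switched over to postcopy. */
bool mig_in_postcopy(MigState s)
{
    return s == MIG_STATE_POSTCOPY_ACTIVE ||
           s == MIG_STATE_POSTCOPY_PAUSED;
}

/*
 * Hypothetical cancel handler.
 * Returns 0 on success (or when there is nothing to cancel) and -1 with
 * *errp set when the request is rejected. The state is left untouched
 * on rejection, so the postcopy migration keeps going.
 */
int mig_cancel(MigState *state, const char **errp)
{
    if (mig_in_postcopy(*state)) {
        *errp = "postcopy cannot be cancelled, until complete";
        return -1;
    }
    if (*state == MIG_STATE_ACTIVE) {
        *state = MIG_STATE_CANCELLING;
    }
    return 0;
}
```

The only point of the sketch is that the rejection happens up front and
leaves the state alone, instead of letting the cancel go through and
taking the VM down on both sides.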

One thing we do know (because it's tested) is that multifd migration
allows migrate_cancel on the source and then another migration to
start from it.

(btw, that reminds me that multifd+postcopy will probably break that
test).

>
> Or do we have any use case this could be a valid scenario?

Not that I know of. But you're the postcopy expert =)


