Re: [PATCH v3 08/24] migration: Add thread pool of optional load threads

qemu-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3 08/24] migration: Add thread pool of optional load threads

From:	Maciej S. Szmigiero
Subject:	Re: [PATCH v3 08/24] migration: Add thread pool of optional load threads
Date:	Tue, 17 Dec 2024 00:15:36 +0100
User-agent:	Mozilla Thunderbird

On 16.12.2024 17:29, Peter Xu wrote:

On Thu, Dec 12, 2024 at 11:53:24PM +0100, Maciej S. Szmigiero wrote:

On 12.12.2024 17:38, Peter Xu wrote:

On Wed, Dec 11, 2024 at 12:05:23AM +0100, Maciej S. Szmigiero wrote:

Maybe move it over to migration_object_init()?  Then we keep
qemu_loadvm_state_setup() only invoke the load_setup()s.


AFAIK migration_object_init() is called unconditionally
at QEMU startup even if there won't me any migration done?

Creating a load thread pool there seems wasteful if no
incoming migration will ever take place (or will but only
much later).


I was expecting an empty pool to not be a major resource, but if that's a
concern, yes we can do that until later.

[...]

@@ -3007,6 +3071,19 @@ int qemu_loadvm_state(QEMUFile *f)
             return ret;
         }
+    if (ret == 0) {
+        bql_unlock(); /* Let load threads do work requiring BQL */
+        thread_pool_wait(load_threads);
+        bql_lock();
+
+        ret = load_threads_ret;
+    }
+    /*
+     * Set this flag unconditionally so we'll catch further attempts to
+     * start additional threads via an appropriate assert()
+     */
+    qatomic_set(&load_threads_abort, true);


I assume this is only for debugging purpose and not required.

Setting "abort all threads" to make sure "nobody will add more thread
tasks" is pretty awkward, IMHO.  If we really want to protect against it
and fail hard, it might be easier after the thread_pool_wait() we free the
pool directly (destroy() will see NULL so it'll skip; still need to free
there in case migration failed before this).  Then any enqueue will access
null pointer on the pool.


We don't want to destroy the thread pool in the path where the downtime
is still counting.


Yeah this makes sense.


That's why we only do cleanup after the migration is complete.

The above setting of load_threads_abort flag also makes sure that we abort
load threads if the migration is going to fail for other reasons (non-load
threads related) - in other words, when the above block with thread_pool_wait()
isn't even entered due to ret already containing an earlier error.


In that case IIUC we should cleanup the load threads in destroy(), not
here?  Especially with the comment that's even more confusing.


This flag only asks the threads in pool which are still running to exit ASAP
(without waiting for them in the "fail for other reasons"
qemu_loadvm_state() code flow).


I thought we could switch to an Error** model as we talked elsewhere, then
the thread who hits the error should set the quit flag, IIUC.

Even without it..


Setting this flag does *not* do the cleanup of the whole thread pool - this
only happens in qemu_loadvm_state_cleanup().


... we have two cases here:

Either no error at all, then thread_pool_wait() will wait for all threads
until finished.  When reaching here setting this flag shouldn't matter for
the threads because they're all finished.

Or there's error in some thread, then QEMU should be stuck at
thread_pool_wait() anyway, until all threads quit.  Again, I thought it
could be the qemu_loadvm_load_thread() that sets the quit flag (rather than
here) so the failed thread will notify all threads to quit.

I just still don't see what's the help of setting it after
thread_pool_wait(), which already marked all threads finished at its
return.  That goes back to my question on whether it was only for debugging
(so no new threads to be created after this), rather than the flag to tell
all threads to quit.


There's also a possibility of earlier error in qemu_loadvm_state()
(not in the load threads themselves), for example if qemu_loadvm_state_main()
returns an error.

In this case thread_pool_wait() *won't* be called but the load threads
would still be running needlessly - setting load_threads_abort flag makes
them stop.

The debugging benefit of assert()ing when someone tries to create
a load thread after that point comes essentially for free then.

Thanks,


Thanks,
Maciej

[Prev in Thread]

Current Thread

[Next in Thread]

Re: [PATCH v3 08/24] migration: Add thread pool of optional load threads, (continued)
- Re: [PATCH v3 08/24] migration: Add thread pool of optional load threads, Peter Xu, 2024/12/04
  - Re: [PATCH v3 08/24] migration: Add thread pool of optional load threads, Peter Xu, 2024/12/05
    - Re: [PATCH v3 08/24] migration: Add thread pool of optional load threads, Maciej S. Szmigiero, 2024/12/10
  - Re: [PATCH v3 08/24] migration: Add thread pool of optional load threads, Maciej S. Szmigiero, 2024/12/10
    - Re: [PATCH v3 08/24] migration: Add thread pool of optional load threads, Peter Xu, 2024/12/12
    - Re: [PATCH v3 08/24] migration: Add thread pool of optional load threads, Maciej S. Szmigiero, 2024/12/12
    - Re: [PATCH v3 08/24] migration: Add thread pool of optional load threads, Peter Xu, 2024/12/16
    - Re: [PATCH v3 08/24] migration: Add thread pool of optional load threads, Maciej S. Szmigiero <=
    - Re: [PATCH v3 08/24] migration: Add thread pool of optional load threads, Peter Xu, 2024/12/17

Prev by Date: Re: [PATCH v3 08/24] migration: Add thread pool of optional load threads
Next by Date: Re: [PULL 0/7] Firmware 20241216 patches
Previous by thread: Re: [PATCH v3 08/24] migration: Add thread pool of optional load threads
Next by thread: Re: [PATCH v3 08/24] migration: Add thread pool of optional load threads
Index(es):
- Date
- Thread