qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3 08/24] migration: Add thread pool of optional load threads


From: Peter Xu
Subject: Re: [PATCH v3 08/24] migration: Add thread pool of optional load threads
Date: Wed, 4 Dec 2024 17:43:43 -0500

On Thu, Nov 28, 2024 at 01:11:53PM +0100, Maciej S. Szmigiero wrote:
> > > +static int qemu_loadvm_load_thread(void *thread_opaque)
> > > +{
> > > +    struct LoadThreadData *data = thread_opaque;
> > > +    int ret;
> > > +
> > > +    ret = data->function(&load_threads_abort, data->opaque);
> > > +    if (ret && !qatomic_read(&load_threads_ret)) {
> > > +        /*
> > > +         * Racy with the above read but that's okay - which thread error
> > > +         * return we report is purely arbitrary anyway.
> > > +         */
> > > +        qatomic_set(&load_threads_ret, ret);
> > > +    }
> > 
> > Can we use cmpxchg instead? E.g.:
> > 
> > if (ret) {
> >      qatomic_cmpxchg(&load_threads_ret, 0, ret);
> > }
> 
> cmpxchg always forces sequentially consistent ordering
> while qatomic_read() and qatomic_set() have relaxed ordering.
> 
> As the comment above describes, there's no need for sequential
> consistency since which thread error is returned is arbitrary
> anyway.

IMHO this is not a hot path, so mem ordering isn't an issue.  If we could
avoid any data race we still should try to.

I do feel uneasy on the current design where everybody shares the "whether
to quit" via one bool, and any thread can set it... meanwhile we can't
stablize the first error to report later.

E.g., ideally we want to capture the first error no matter where it came
from, then keep it with migrate_set_error() so that "query-migrate" on dest
later can tell us what was wrong.  I think libvirt generally uses that.

So as to support a string error, at least we'll need to allow Error** in
the thread fn:

typedef bool (*MigrationLoadThread)(void *opaque, bool *should_quit,
                                    Error **errp);

I also changed retval to bool, as I mentioned elsewhere QEMU tries to stick
with "bool SOME_FUNCTION(..., Error **errp)" kind of error reporting.

Then any thread should only report error to qemu_loadvm_load_thread(), and
the report should always be a local Error**, then it further reports to the
global error.  Something like:

static int qemu_loadvm_load_thread(void *thread_opaque)
{
    MigrationIncomingState *mis = migration_incoming_get_current();
    struct LoadThreadData *data = thread_opaque;
    Error *error = NULL;

    if (!data->function(data->opaque, &mis->should_quit, &error)) {
       migrate_set_error(migrate_get_current(), error);
    }

    return 0;
}

migrate_set_error() is thread-safe, and it'll only record the 1st error.
Then the thread should only read &should_quit, and only set &error.  If we
want, migrate_set_error() can set &should_quit.

PS: I wished we have an unified place to tell whether we should quit
incoming migration - we already have multifd_recv_state->exiting, we could
have had a global flag like that then we can already use.  But I know I'm
asking too much.. However would you think it make sense to still have at
least Error** report the error and record it?

-- 
Peter Xu




reply via email to

[Prev in Thread] Current Thread [Next in Thread]