Re: [Qemu-devel] [PATCH v4 00/32] Migration: postcopy failure recovery
From: Peter Xu
Subject: Re: [Qemu-devel] [PATCH v4 00/32] Migration: postcopy failure recovery
Date: Fri, 1 Dec 2017 18:23:51 +0800
User-agent: Mutt/1.9.1 (2017-09-22)
On Thu, Nov 30, 2017 at 08:00:54PM +0000, Dr. David Alan Gilbert wrote:
> * Peter Xu (address@hidden) wrote:
> > Tree is pushed here for better reference and testing:
> > github.com/xzpeter postcopy-recovery-support
>
> Hi Peter,
> Do you have a git with this code + your OOB world in?
> I'd like to play with doing recovery and see what happens;
> I still worry a bit about whether the (potentially hung) main loop
> is needed for the new incoming connection to be accepted by the
> destination.
Good question...
I had thought it was okay. The reason is that as long as we run the
migrate-incoming command with run-oob=true, it runs in the iothread,
and our iothread implementation has this in iothread_run():

    g_main_context_push_thread_default(iothread->worker_context);

This _should_ mean that from then on a NULL context resolves to
iothread->worker_context (the monitor context, rather than the main
thread's) in most cases. I say "most" because there are corner cases
where glib does not use this thread-local variable and still uses the
global one, though I did not expect that to be our case.
I tried to confirm this by setting a breakpoint at the entry of
socket_accept_incoming_migration() on the destination side. Sadly, I
was wrong: it is still running in main().
The problem is that the g_source_attach() implementation still uses
g_main_context_default() rather than
g_main_context_get_thread_default() when context=NULL is passed in.
I don't know whether this is a glib bug:
    g_source_attach (GSource      *source,
                     GMainContext *context)
    {
      guint result = 0;
      ...
      if (!context)
        context = g_main_context_default ();
      ...
    }
I'm CCing some more people who may know glib better than I do.
For now, I think a simple solution is to call
g_main_context_get_thread_default() explicitly in the QIO code. But
I'd also like to hear what other people think.
I'll prepare a branch soon, including the two series (postcopy
recovery + oob), once the solution is settled. Thanks,
--
Peter Xu