qemu-devel

Re: [Qemu-devel] [PATCH v4 00/32] Migration: postcopy failure recovery


From: Peter Xu
Subject: Re: [Qemu-devel] [PATCH v4 00/32] Migration: postcopy failure recovery
Date: Fri, 1 Dec 2017 18:23:51 +0800
User-agent: Mutt/1.9.1 (2017-09-22)

On Thu, Nov 30, 2017 at 08:00:54PM +0000, Dr. David Alan Gilbert wrote:
> * Peter Xu (address@hidden) wrote:
> > Tree is pushed here for better reference and testing:
> >   github.com/xzpeter postcopy-recovery-support
> 
> Hi Peter,
>   Do you have a git with this code + your OOB world in?
> I'd like to play with doing recovery and see what happens;
> I still worry a bit about whether the (potentially hung) main loop
> is needed for the new incoming connection to be accepted by the
> destination.

Good question...

I thought it was okay.  The reason is that as long as we run the
migrate-incoming command with run-oob=true, it runs in the iothread,
and our iothread implementation has this in iothread_run():

    g_main_context_push_thread_default(iothread->worker_context);

This _should_ mean that from then on a NULL context resolves to
iothread->worker_context (the monitor context, no longer the main
thread's) in most cases.  I say "most" because there are corner cases
where glib ignores this thread-local variable and still uses the
global one, though I guessed that would not apply to us.

I tried to confirm this by setting a breakpoint at the entry of
socket_accept_incoming_migration() on the destination side.  Sadly, I
was wrong.  It's still running in main().

The problem is that g_source_attach() still uses
g_main_context_default() rather than
g_main_context_get_thread_default() when context=NULL is passed in.
I don't know whether this is a glib bug:

guint
g_source_attach (GSource      *source,
                 GMainContext *context)
{
  guint result = 0;
  ...
  if (!context)
    context = g_main_context_default ();
  ...
}

I'm CCing some more people who may know glib better than I do.

For now, I think a simple solution is to call
g_main_context_get_thread_default() explicitly in the QIO code.  But
I'd also like to hear what other people think.
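
To make the difference concrete, here is a toy model in plain C.  It
does not use glib at all: every toy_* name is made up for this mail,
and the thread-local pointer only stands in for what
g_main_context_push_thread_default() installs.  It is a sketch of the
lookup order as I understand it from the excerpt above, not the real
implementation:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for GMainContext and GSource; NOT the glib types. */
typedef struct { const char *name; } toy_context;
typedef struct { toy_context *attached_to; } toy_source;

/* Process-wide default, playing the role of g_main_context_default(). */
static toy_context toy_global_default = { "main loop" };

/* Per-thread override, playing the role of the value installed by
 * g_main_context_push_thread_default().  NULL means "no override". */
static _Thread_local toy_context *toy_thread_default = NULL;

/* Mirrors the excerpt: a NULL context falls back to the global
 * default and never looks at the per-thread override. */
static void toy_attach(toy_source *src, toy_context *ctx)
{
    if (!ctx) {
        ctx = &toy_global_default;
    }
    src->attached_to = ctx;
}

/* The proposed workaround: the caller resolves the thread default
 * itself and passes it in.  If no override is set this passes NULL,
 * so the source still ends up on the global default. */
static void toy_attach_thread_default(toy_source *src)
{
    toy_attach(src, toy_thread_default);
}
```

With toy_thread_default pointing at a "monitor" context,
toy_attach(&src, NULL) still lands on toy_global_default, while
toy_attach_thread_default(&src) lands on the monitor context.  That is
the behaviour I'd expect from the QIO code after the change.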

I'll prepare a branch soon that includes both series (postcopy
recovery + oob), once the solution is settled.  Thanks,

-- 
Peter Xu


