qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC v3 01/27] char-io: fix possible race on IOWatchPol


From: Peter Xu
Subject: Re: [Qemu-devel] [RFC v3 01/27] char-io: fix possible race on IOWatchPoll
Date: Wed, 15 Nov 2017 17:48:10 +0800
User-agent: Mutt/1.9.1 (2017-09-22)

On Wed, Nov 15, 2017 at 09:37:40AM +0000, Stefan Hajnoczi wrote:
> On Tue, Nov 14, 2017 at 07:31:10PM +0800, Peter Xu wrote:
> > On Tue, Nov 14, 2017 at 10:32:19AM +0000, Stefan Hajnoczi wrote:
> > > On Tue, Nov 14, 2017 at 02:09:39PM +0800, Peter Xu wrote:
> > > > On Mon, Nov 13, 2017 at 04:52:11PM +0000, Stefan Hajnoczi wrote:
> > > > > On Mon, Nov 06, 2017 at 05:46:17PM +0800, Peter Xu wrote:
> > > > > > This is not a problem if we are only having one single loop thread 
> > > > > > like
> > > > > > before.  However, after per-monitor thread is introduced, this is 
> > > > > > not
> > > > > > true any more, and the race can happen.
> > > > > > 
> > > > > > The race can be triggered with "make check -j8" sometimes:
> > > > > 
> > > > > Please mention a specific test case that fails.
> > > > 
> > > > It was any of the check-qtest-$(TARGET)s that failed.  I'll mention
> > > > that in next post.
> > > > 
> > > > > 
> > > > > > 
> > > > > >   qemu-system-x86_64: /root/git/qemu/chardev/char-io.c:91:
> > > > > >   io_watch_poll_finalize: Assertion `iwp->src == NULL' failed.
> > > > > > 
> > > > > > This patch keeps the reference for the watch object when creating in
> > > > > > io_add_watch_poll(), so that the object will never be released in 
> > > > > > the
> > > > > > context main loop, especially when the context loop is running in
> > > > > > another standalone thread.  Meanwhile, when we want to remove the 
> > > > > > watch
> > > > > > object, we always first detach the watch object from its owner 
> > > > > > context,
> > > > > > then we continue with the cleanup.
> > > > > > 
> > > > > > Without this patch, calling io_remove_watch_poll() in main loop 
> > > > > > thread
> > > > > > is not thread-safe, since the other per-monitor thread may be 
> > > > > > modifying
> > > > > > the watch object at the same time.
> > > > > > 
> > > > > > Reviewed-by: Marc-André Lureau <address@hidden>
> > > > > > Signed-off-by: Peter Xu <address@hidden>
> > > > > > ---
> > > > > >  chardev/char-io.c | 16 ++++++++++++++--
> > > > > >  1 file changed, 14 insertions(+), 2 deletions(-)
> > > > > > 
> > > > > > diff --git a/chardev/char-io.c b/chardev/char-io.c
> > > > > > index f81052481a..50b5bac704 100644
> > > > > > --- a/chardev/char-io.c
> > > > > > +++ b/chardev/char-io.c
> > > > > > @@ -122,7 +122,6 @@ GSource *io_add_watch_poll(Chardev *chr,
> > > > > >      g_free(name);
> > > > > >  
> > > > > >      g_source_attach(&iwp->parent, context);
> > > > > > -    g_source_unref(&iwp->parent);
> > > > > >      return (GSource *)iwp;
> > > > > >  }
> > > > > >  
> > > > > > @@ -131,12 +130,25 @@ static void io_remove_watch_poll(GSource 
> > > > > > *source)
> > > > > >      IOWatchPoll *iwp;
> > > > > >  
> > > > > >      iwp = io_watch_poll_from_source(source);
> > > > > > +
> > > > > > +    /*
> > > > > > +     * Here the order of destruction really matters.  We need to 
> > > > > > first
> > > > > > +     * detach the IOWatchPoll object from the context (which may 
> > > > > > still
> > > > > > +     * be running in another loop thread), only after that could we
> > > > > > +     * continue to operate on iwp->src, or there may be race 
> > > > > > condition
> > > > > > +     * between current thread and the context loop thread.
> > > > > > +     *
> > > > > > +     * Let's blame the glib bug mentioned in commit 2b316774f6
> > > > > > +     * ("qemu-char: do not operate on sources from finalize
> > > > > > +     * callbacks") for this extra complexity.
> > > > > 
> > > > > I don't understand how this bug is to blame.  Isn't the problem here a
> > > > > race condition between two QEMU threads?
> > > > 
> > > > Yes, it is.
> > > > 
> > > > The problem is, we won't have the race condition if glib does not have
> > > > that bug mentioned.  Then the thread running GMainContext will have
> > > > full control of iwp->src destruction, and destruction of it would be
> > > > fairly straightforward (unref iwp->src in IOWatchPoll destructor).
> > > > Now IIUC we are doing this in a hacky way, say, we destroy iwp->src
> > > > explicitly from main thread before quitting (see [1] below, the whole
> > > > if clause).
> > > > 
> > > > > 
> > > > > Why are two threads accessing the watch at the same time?
> > > > 
> > > > Here is how I understand:
> > > > 
> > > > Firstly we need to tackle with that bug, by an explicit destruction of
> > > > iwp->src below; meanwhile when we are destroying it, the GMainContext
> > > > can still be running somewhere (it's not happening in current series
> > > > since I stopped iothread earlier than this point, however it can still
> > > > happen if in the future we don't do that), then we possibly want this
> > > > patch.
> > > > 
> > > > Again, without this patch, current series should work; however I do
> > > > hope this patch can be in, in case someday we want to provide complete
> > > > thread safety for Chardevs (now it is not really thread-safe).
> > > 
> > > You said qtests fail with "Assertion `iwp->src == NULL' failed" but then
> > > you said "without this patch, current series should work".  How do you
> > > reproduce the failure if it doesn't occur?
> > 
> > Actually it occurs in some old versions, but not in current version.
> > Current version destroys the iothread earlier (as Dan suggested), so
> > it can avoid the issue.  Sorry for not being clear.
> > 
> > > 
> > > It looks like remove_fd_in_watch() -> io_remove_watch_poll() callers
> > > fall into two categories: called from within the event loop and called
> > > when a chardev is destroyed.  Do the thread-safety issues occur when the
> > > chardev is destroyed by the QEMU main loop thread?  Or did I miss cases
> > > where remove_fd_in_watch() is called from other threads?
> > 
> > I think this can also be called in monitor iothread?
> 
> When I say "event loop", I mean any thread that is running an event loop
> including IOThreads and the main loop thread.
> 
> What do you mean by "monitor iothread"?

Ah, I see.  Yes, then I think it's true - the failure only happens
when remove_fd_in_watch() is called during destruction in main loop
thread.

> 
> > Even if so, it's
> > pretty safe since if the monitor iothread is calling
> > remove_fd_in_watch() then it must not be using it after all.  The race
> > can happen when we are destroying the IOWatchPoll while the other
> > event loop thread (which may not be the main thread) is still running,
> > just like what I did in my old series.
> 
> The scenario this patch is trying to address doesn't make a lot of sense
> since there will be further thread-safety problems if two threads are
> modifying a Chardev at the same time.  A lock will probably be required
> to protect the state and this patch might not be necessary then.
> 
> This patch seems very speculative and it's unclear what concrete
> scenario it addresses.  I suggest dropping the patch from this series so
> it is not a distraction from what you're actually trying to achieve.

Ok, then let me drop it.  Thanks,

-- 
Peter Xu



reply via email to

[Prev in Thread] Current Thread [Next in Thread]