qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC 01/15] char-io: fix possible race on IOWatchPoll


From: Peter Xu
Subject: Re: [Qemu-devel] [RFC 01/15] char-io: fix possible race on IOWatchPoll
Date: Wed, 20 Sep 2017 19:18:49 +0800
User-agent: Mutt/1.5.24 (2015-08-30)

On Wed, Sep 20, 2017 at 12:03:09PM +0100, Daniel P. Berrange wrote:
> On Wed, Sep 20, 2017 at 06:49:58PM +0800, Peter Xu wrote:
> > On Wed, Sep 20, 2017 at 10:14:38AM +0100, Daniel P. Berrange wrote:
> > > On Wed, Sep 20, 2017 at 05:09:26PM +0800, Peter Xu wrote:
> > > > On Wed, Sep 20, 2017 at 08:57:03AM +0100, Daniel P. Berrange wrote:
> > > > > On Thu, Sep 14, 2017 at 03:50:22PM +0800, Peter Xu wrote:
> > > > > > This is not a problem if we are only having one single loop thread 
> > > > > > like
> > > > > > before.  However, after per-monitor thread is introduced, this is 
> > > > > > not
> > > > > > true any more, and the race can happen.
> > > > > > 
> > > > > > The race can be triggered with "make check -j8" sometimes:
> > > > > > 
> > > > > >   qemu-system-x86_64: /root/git/qemu/chardev/char-io.c:91:
> > > > > >   io_watch_poll_finalize: Assertion `iwp->src == NULL' failed.
> > > > > > 
> > > > > > This patch keeps the reference for the watch object when creating in
> > > > > > io_add_watch_poll(), so that the object will never be released in 
> > > > > > the
> > > > > > context main loop, especially when the context loop is running in
> > > > > > another standalone thread.  Meanwhile, when we want to remove the 
> > > > > > watch
> > > > > > object, we always first detach the watch object from its owner 
> > > > > > context,
> > > > > > then we continue with the cleanup.
> > > > > > 
> > > > > > Without this patch, calling io_remove_watch_poll() in main loop 
> > > > > > thread
> > > > > > is not thread-safe, since the other per-monitor thread may be 
> > > > > > modifying
> > > > > > the watch object at the same time.
> > > > > 
> > > > > This doesn't feel right to me. Why is the main loop thread doing 
> > > > > anything
> > > > > at all with the Chardev, if there is a per-monitor thread ? The 
> > > > > Chardev
> > > > > code isn't thread safe so it isn't safe to have two separate threads
> > > > > accessing the same Chardev. IOW, if we want a per-monitor thread, then
> > > > > we must make sure the main thread never touches that monitor's chardev
> > > > > at all.  While your patch here might have avoided the assertion you
> > > > > mention above, I fear this is just papering over a fundamental problem
> > > > > that still exists, that can only be solved by not letting the mainloop
> > > > > touch the chardev at all.
> > > > 
> > > > The stack I encountered:
> > > > 
> > > > #0  0x00007f658234c765 in __GI_raise (address@hidden) at 
> > > > ../sysdeps/unix/sysv/linux/raise.c:54
> > > > #1  0x00007f658234e36a in __GI_abort () at abort.c:89
> > > > #2  0x00007f6582344f97 in __assert_fail_base (fmt=<optimized out>, 
> > > > address@hidden "iwp->src == NULL", address@hidden 
> > > > "/root/git/qemu/chardev/char-io.c", address@hidden, address@hidden 
> > > > <__PRETTY_FUNCTION__.21863> "io_watch_poll_finalize") at assert.c:92
> > > > #3  0x00007f6582345042 in __GI___assert_fail (assertion=0x55c76345fce1 
> > > > "iwp->src == NULL", file=0x55c76345fcc0 
> > > > "/root/git/qemu/chardev/char-io.c", line=91, function=0x55c76345fd10 
> > > > <__PRETTY_FUNCTION__.21863> "io_watch_poll_finalize") at assert.c:101
> > > > #4  0x000055c7632c2be5 in io_watch_poll_finalize 
> > > > (source=0x55c7651cd450) at /root/git/qemu/chardev/char-io.c:91
> > > > #5  0x00007f65847bb859 in g_source_unref_internal () at 
> > > > /lib64/libglib-2.0.so.0
> > > > #6  0x00007f65847bca29 in g_source_destroy_internal () at 
> > > > /lib64/libglib-2.0.so.0
> > > > #7  0x000055c7632c2d30 in io_remove_watch_poll (source=0x55c7651cd450) 
> > > > at /root/git/qemu/chardev/char-io.c:139
> > > > #8  0x000055c7632c2d5c in remove_fd_in_watch (chr=0x55c7651ccdf0) at 
> > > > /root/git/qemu/chardev/char-io.c:145
> > > > #9  0x000055c7632c2368 in qemu_chr_fe_set_handlers (b=0x55c7651f6410, 
> > > > fd_can_read=0x0, fd_read=0x0, fd_event=0x0, be_change=0x0, opaque=0x0, 
> > > > context=0x0, set_open=true)
> > > >     at /root/git/qemu/chardev/char-fe.c:267
> > > > #10 0x000055c7632c2221 in qemu_chr_fe_deinit (b=0x55c7651f6410, 
> > > > del=false) at /root/git/qemu/chardev/char-fe.c:231
> > > > #11 0x000055c762e2b15c in monitor_data_destroy (mon=0x55c7651f6410) at 
> > > > /root/git/qemu/monitor.c:600
> > > > #12 0x000055c762e340ec in monitor_cleanup () at 
> > > > /root/git/qemu/monitor.c:4346
> > > > #13 0x000055c762f9445d in main (argc=19, argv=0x7ffc6846d0e8, 
> > > > envp=0x7ffc6846d188) at /root/git/qemu/vl.c:4889
> > > > 
> > > > So it's destroying the CharBackend, but it'll then call
> > > > qemu_chr_fe_set_handlers() which finally tries to remove the watch poll.
> > > 
> > > Ok that code is broken - it must not call monitor_cleanup from the main
> > > thread - it needs to be called from the monitor thread, unless it can
> > > guarantee that the monitor thread has already exited, which seems unlikely
> > 
> > The problem is that not all monitors are parsed in the IO thread, but
> > only those with use_io_thr=true set.
> > 
> > How about I move the calls of monitor_data_destroy() into that monitor
> > IO thread when use_io_thr=true?  And for the rest, I think they still
> > need to be destroyed in the main thread.
> 
> I think having the monitor sometimes run in the main thread and sometimes
> run in a background thread is a recipe for ongoing trouble, of which this
> problem is just the first example that will hurt us. People will test
> behaviour of a feature with one setup and then users will later run it in
> a different setup and potentially experiance obscure bugs as a result.
> IOW, use_io_thr flag should not exist, and every monitor should be run
> unconditionally in the background thread from the point at which your
> patch series merges.

I agree with you that this may bring trouble in some aspect.  I just
don't know whether it'll bring more trouble if we move all the
monitor-related chardev IO into monitor thread.

The key is the muxed typed chardev.

If we don't have muxed typed chardev, I'll surely consider to use IO
thread for all the monitors.

However, the muxed chardevs can support e.g. one monitor plus a serial
port. Can we just run the IO stuff in monitor thread even part of its
frontend is a serial port?  And also I'm not sure what would happen if
it's a monitor plus something else I even don't aware of.

Any nicer thoughts?  Thanks,

-- 
Peter Xu



reply via email to

[Prev in Thread] Current Thread [Next in Thread]