Re: [Qemu-devel] [RFC PATCH resend 00/11] Make CoMutex/CoQueue/CoRwlock thread-safe
From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [RFC PATCH resend 00/11] Make CoMutex/CoQueue/CoRwlock thread-safe
Date: Tue, 26 Apr 2016 11:54:19 +0100
User-agent: Mutt/1.5.24 (2015-08-30)
On Fri, Apr 15, 2016 at 01:31:55PM +0200, Paolo Bonzini wrote:
> [this time including the mailing list]
>
> This is yet another tiny bit of the multiqueue work, this time affecting
> the synchronization infrastructure for coroutines. Currently, coroutines
> synchronize between the main I/O thread and the dataplane iothread through
> the AioContext lock. However, for multiqueue a single BDS will be used
> by multiple iothreads and hence multiple AioContexts. This calls for
> a different approach to coroutine synchronization. This series is my
> first attempt at it. Besides multiqueue, I think throttling groups
> (which can already span multiple AioContexts) could also take advantage
> of the new CoMutexes.
>
> The series has two main parts:
>
> - it makes CoMutex bind coroutines to an AioContext. It is of course
> still possible to move coroutines around with explicit yield and
> qemu_coroutine_enter, but a CoMutex will now reenter the coroutine
> where it was running before, rather than in the AioContext of
> the previous owner. To do this, a new function aio_co_schedule is
> introduced to run a coroutine on a given iothread. I think this could
> be used in other places too; for now it lets a CoMutex protect stuff
> across multiple AioContexts without them moving around(*). Of course
> currently a CoMutex is generally not used across multiple iothreads,
> because you have to acquire/release the AioContext around CoMutex
> critical sections. However...
>
> - ... the second change is exactly to make CoMutex thread-safe and remove
> the need for a "thread-based" mutex around it. The new CoMutex is
> exactly the same as a mutex implementation that you'd find in an
> operating system. iothreads are the moral equivalent of CPUs in
> a kernel, while coroutines resemble kernel threads running without
> preemption on a CPU. Even if you have to take concurrency into account,
> the lack of preemption while running coroutines or bottom halves
> keeps the complexity at bay. For example, it is easy to synchronize
> between qemu_co_mutex_lock's yield and the qemu_coroutine_enter in
> aio_co_schedule's bottom half.
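The first bullet's aio_co_schedule boils down to "queue the coroutine on the target context and let that context's loop run it". A deliberately simplified, single-threaded analogue of that idea (illustrative names only; the real code uses atomics, a bottom half, and an event notifier, none of which is shown here):

```c
#include <stddef.h>

/* A schedulable work item, standing in for a coroutine re-entry. */
typedef struct Task {
    struct Task *next;
    void (*fn)(void *opaque);
    void *opaque;
} Task;

/* A "context": owns a FIFO of runnable items, like an AioContext's
 * list of scheduled coroutines. */
typedef struct {
    Task *head, *tail;
} Ctx;

/* Analogue of aio_co_schedule(): push onto the target context's FIFO.
 * The real version pushes with atomics and kicks the context's loop. */
static void ctx_schedule(Ctx *c, Task *t)
{
    t->next = NULL;
    if (c->tail) {
        c->tail->next = t;
    } else {
        c->head = t;
    }
    c->tail = t;
}

/* Analogue of the bottom half: run everything queued, in FIFO order. */
static void ctx_run_pending(Ctx *c)
{
    while (c->head) {
        Task *t = c->head;
        c->head = t->next;
        if (!c->head) {
            c->tail = NULL;
        }
        t->fn(t->opaque);
    }
}

/* demo callback: records the order in which tasks ran */
static int demo_log[4], demo_n;
static void demo_fn(void *opaque)
{
    demo_log[demo_n++] = *(int *)opaque;
}
```

The point of the pattern is that only the owning context ever pops and runs items, so a coroutine always resumes in the context it was scheduled on.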
>
> Same as before, CoMutex puts coroutines to sleep with
> qemu_coroutine_yield and wake them up with the new aio_co_schedule.
> I could have wrapped CoMutex's CoQueue with a "regular" thread mutex or
> spinlock. The resulting code would have looked a lot like RFifoLock
> (with CoQueue replacing RFifoLock's condition variable). Instead,
> inspired by the parallel between coroutines and non-preemptive kernel
> threads, I chose to adopt the same lock-free mutex algorithm as OSv.
> The algorithm only needs two to four atomic ops for a lock-unlock pair
> (two when uncontended). To cover CoQueue, each CoQueue is made to
> depend on a CoMutex, similar to condition variables. Most CoQueues
> already have a corresponding CoMutex so this is not a big deal;
> converting the others is left for a future series. I did this because
> CoQueue looks a lot like OSv's waitqueues; so if necessary, we could
> even take OSv's support for wait morphing (which avoids the thundering
> herd problem) and add it to CoMutex and CoQueue. This may be useful
> when making tracked_requests thread-safe.
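The "two atomic ops when uncontended" property comes from keeping a counter of lockers, so both fast paths are a single atomic read-modify-write. A minimal sketch of just those fast paths in portable C11 (names and structure are illustrative, not OSv's or this series' actual code; the contended slow paths, which park the coroutine on a lock-free waiter queue, are elided):

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    /* number of lockers, 0 = free; a real implementation also keeps
     * a lock-free queue of waiting coroutines */
    atomic_int count;
} LfMutex;

/* Lock fast path: one atomic increment.  Returns true if the caller
 * now owns the mutex; false means it must queue itself and yield. */
static bool lf_mutex_lock_fast(LfMutex *m)
{
    return atomic_fetch_add(&m->count, 1) == 0;
}

/* Unlock fast path: one atomic decrement.  Returns true if the mutex
 * is now free; false means a waiter exists, and ownership must be
 * handed directly to it (which is where wait morphing would hook in). */
static bool lf_mutex_unlock_fast(LfMutex *m)
{
    return atomic_fetch_sub(&m->count, 1) == 1;
}
```

Because a contended unlock hands the mutex straight to one queued waiter instead of waking everyone, the scheme naturally avoids a thundering herd.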
>
> Kevin: this has nothing to do with my old plan from Brno, and it's
> possibly a lot closer to what you wanted. Your idea was to automatically
> release the "thread mutex" when a coroutine yields, I think you should
> be fine with not having a thread mutex at all!
>
> This will need some changes in the formats because, for multiqueue,
> CoMutexes would need to be used like "real" thread mutexes. Code like
> this:
>
> ...
> qemu_co_mutex_unlock()
> ... /* still access shared data, but don't yield */
> qemu_coroutine_yield()
>
> might be required to use this other pattern:
>
> ... /* access shared data, but don't yield */
> qemu_co_mutex_unlock()
> qemu_coroutine_yield()
>
> because "adding a second CPU" is already introducing concurrency that
> wasn't there before. The "non-preemptive multitasking" reference only
> applies to things that access AioContext-local data. This includes the
> synchronization primitives implemented in this series, the thread pool,
> the Linux AIO support, but not much else. It still simplifies _those_
> though. :)
>
> Anyhow, we'll always have some BlockDrivers that do not support multiqueue,
> such as the network protocols. Thus it would be possible to handle the
> formats one at a time. raw-posix, raw and qcow2 would already form a
> pretty good set, and the first two do not use CoMutex at all.
>
> The patch has quite a lot of new code, but about half of it is testcases.
> The new test covers correctness and (when run with --verbose) also takes a
> stab at measuring performance; the result is that the performance of
> CoMutex is comparable to pthread mutexes. The only snag is that you cannot
> make a direct comparison between CoMutex (fair) and pthread_mutex_t
> (unfair). For this reason the testcase also measures performance of a
> quick-and-dirty implementation of a fair mutex, based on MCS locks +
> futexes.
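An MCS lock gives fairness by queueing contenders in FIFO order, each spinning (or, with futexes, sleeping) on its own node. A textbook sketch of the MCS part alone, with plain spinning where the testcase's version would sleep on a futex (this is the classic algorithm, not the testcase's actual code):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct MCSNode {
    _Atomic(struct MCSNode *) next;
    atomic_bool locked;
} MCSNode;

typedef struct {
    _Atomic(MCSNode *) tail;   /* last node in the queue, NULL = free */
} MCSLock;

static void mcs_lock(MCSLock *l, MCSNode *me)
{
    atomic_store(&me->next, NULL);
    atomic_store(&me->locked, true);
    /* enqueue ourselves; whoever was tail is our predecessor */
    MCSNode *prev = atomic_exchange(&l->tail, me);
    if (prev) {
        atomic_store(&prev->next, me);
        while (atomic_load(&me->locked)) {
            /* the futex variant would sleep here instead of spinning */
        }
    }
}

static void mcs_unlock(MCSLock *l, MCSNode *me)
{
    MCSNode *next = atomic_load(&me->next);
    if (!next) {
        MCSNode *expected = me;
        /* no successor visible: try to mark the lock free */
        if (atomic_compare_exchange_strong(&l->tail, &expected, NULL)) {
            return;
        }
        /* a successor is mid-enqueue; wait for it to link itself in */
        while (!(next = atomic_load(&me->next))) {
        }
    }
    /* direct FIFO hand-off to the next waiter: this is the fairness */
    atomic_store(&next->locked, false);
}
```

The hand-off in mcs_unlock is what makes the lock fair, and is why it is a more honest baseline for the (also fair) CoMutex than pthread_mutex_t.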
>
> There's a lot of meat in the above text, and I hope it will make the code
> clearer and compensate for the terse commit messages. :) I'll probably
> write a single-threaded testcase too, just to provide some more unit
> test comparison of "before" and "after".
>
> I haven't even started a guest with these patches, let alone run
> qemu-iotests... generally the changes are well confined to unit tested
> code, patch 2 for example is completely untested. There are a couple
> other places that at least need more comments, but I wanted to throw
> the patch out for an early review of the general approach.
>
> Paolo Bonzini (11):
> coroutine: use QSIMPLEQ instead of QTAILQ
> throttle-groups: restart throttled requests from coroutine context
> coroutine: delete qemu_co_enter_next
> aio: introduce aio_co_schedule
> coroutine-lock: reschedule coroutine on the AioContext it was running on
> coroutine-lock: make CoMutex thread-safe
> coroutine-lock: add limited spinning to CoMutex
> test-aio-multithread: add performance comparison with thread-based mutexes
> coroutine-lock: place CoMutex before CoQueue in header
> coroutine-lock: add mutex argument to CoQueue APIs
> coroutine-lock: make CoRwlock thread-safe and fair
>
> async.c | 38 ++++
> block/backup.c | 2 +-
> block/io.c | 2 +-
> block/qcow2-cluster.c | 4 +-
> block/sheepdog.c | 2 +-
> block/throttle-groups.c | 71 +++++--
> hw/9pfs/9p.c | 2 +-
> include/block/aio.h | 14 ++
> include/qemu/atomic.h | 3 +
> include/qemu/coroutine.h | 106 ++++++----
> include/qemu/coroutine_int.h | 14 +-
> iothread.c | 16 ++
> stubs/iothread.c | 11 ++
> tests/Makefile | 11 +-
> tests/iothread.c | 107 ++++++++++
> tests/iothread.h | 25 +++
> tests/test-aio-multithread.c | 452 +++++++++++++++++++++++++++++++++++++++++++
> trace-events | 8 +-
> util/qemu-coroutine-lock.c | 257 ++++++++++++++++++++----
> util/qemu-coroutine.c | 2 +-
> 20 files changed, 1037 insertions(+), 110 deletions(-)
> create mode 100644 tests/iothread.c
> create mode 100644 tests/iothread.h
> create mode 100644 tests/test-aio-multithread.c
Looks promising, modulo questions and comments I posted on patches.
I wonder whether it would be cleaner to introduce new primitives instead
of modifying CoMutex/CoQueue. That way it would be clear whether code
is written to be thread-safe or not.