[PATCH experiment 00/16] C++20 coroutine backend
From: Paolo Bonzini
Subject: [PATCH experiment 00/16] C++20 coroutine backend
Date: Mon, 14 Mar 2022 10:31:47 +0100
It turns out that going from a prototype C++ implementation of the QEMU
API to something that could build tests/unit/test-coroutine was just a
few hours' work; and once it compiled, only one line had to be changed
for every test to pass.
Most of the differences between C and C++ already show up here:
- keywords such as "new" (or "class", which I haven't encountered yet)
- _Generic must be replaced by templates and/or overloading (QemuCoLockable
is implemented completely differently from QemuLockable; in fact I spent
most of the time on that)
- PRI* macros must be separated by a space from the string constants that
precede them
- void* casts must be explicit (g_new takes care of that most of the time,
but not for opaque pointers passed to coroutines).
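To make the list above concrete, here is a small sketch of three of those
differences. It is not taken from the series: DemoMutex, DemoCoMutex,
demo_lock, read_opaque and format_u64 are hypothetical names used only for
illustration.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical lock types, standing in for two unrelated lock kinds.
struct DemoMutex { bool locked = false; };
struct DemoCoMutex { bool locked = false; };

// C would dispatch on the argument type with _Generic;
// C++ expresses the same thing with overloads (or templates).
inline void demo_lock(DemoMutex *m) { m->locked = true; }
inline void demo_lock(DemoCoMutex *m) { m->locked = true; }

// An opaque-pointer callback, in the style of a coroutine entry point.
inline int read_opaque(void *opaque)
{
    // C accepts "int *p = opaque;", C++ needs the explicit cast.
    int *p = static_cast<int *>(opaque);
    return *p;
}

inline std::string format_u64(uint64_t value)
{
    char buf[32];
    // Without the space, "%"PRIu64 is parsed as a string literal with a
    // user-defined suffix in C++11 and later.
    std::snprintf(buf, sizeof(buf), "%" PRIu64, value);
    return buf;
}
```

For example, demo_lock(&m) picks the overload from the static type of m,
and format_u64(42) yields "42".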
There are 300 lines of hard-core C++ in the backend and in
coroutine.h. I tried to comment it as much as possible (this
time I didn't include a big commit message on stackless coroutines
in general) but it still requires some knowledge of the basic C++
coroutine concepts of resumable types, promise types and awaiter types.
https://www.youtube.com/watch?v=ZTqHjjm86Bw is an excellent introduction
and it's where I learnt most of what was needed.
However, there are no ramifications to actual coroutine code, except
for the template syntax "CoroutineFn<return_type>" for the function and
the mandatory co_await/co_return keywords... both of which are an
improvement, really: the fact that a single function cannot run both
inside and outside coroutines is checked by the compiler now, because
qemu_coroutine_create accepts a function that returns CoroutineFn<void>.
Therefore I had to disable some more code in util/ and qapi/ that used
qemu_in_coroutine() or coroutine_fn.
Here is the performance comparison of the three backends:
                   ucontext            stackless C   stackless C++
/perf/lifecycle    0.068 s             0.025 s       0.065 s
/perf/nesting      55 s                4.7 s         1.7 s
/perf/yield        6.0 s               1.3 s         1.3 s
/perf/cost         8 Mops/s (125 ns)   35 ns         10 Mops/s (99 ns)
One important difference is that C++ coroutines allocate frames on the
heap, and that explains why performance is better in /perf/nesting,
which has to do many large memory allocations for the stack in the other
two backends (and also a makecontext/swapcontext in the ucontext case).
C++ coroutines hardly benefit from the coroutine pool; OTOH that also
means the coroutine pool could be removed if we went this way.
I haven't checked why /perf/lifecycle (and therefore /perf/cost; they
are roughly the same test) is so much slower than the handwritten C code.
It's still comparable with the ucontext backend though.
Overall this was ~twice the amount of work of the C experiment, but
that's because the two are very different ways to achieve the same goal:
- the design work was substantially smaller in the C experiment, where
all the backend does is allocate stack frames and do a loop that invokes
a function pointer. Here the backend has to map between the C++ concepts
and the QEMU API. In the C case, most of the work was really in the
manual conversion which I had to do one function at a time.
- the remaining work is also completely different: a source-to-source
translator (and only build system work in QEMU) for the C experiment;
making ~100 files compile in C++ for this one (and relatively little
work as far as coroutines are concerned).
This was compiled with GCC 11 only. Coroutine support was added in
GCC 10, released in 2020, which IIRC is much newer than the oldest
compiler release we support.
Paolo
Paolo Bonzini (17):
coroutine: add missing coroutine_fn annotations for CoRwlock functions
coroutine: qemu_coroutine_get_aio_context is not a coroutine_fn
coroutine: small code cleanup in qemu_co_rwlock_wrlock
coroutine: introduce QemuCoLockable
port atomic.h to C++
use g_new0 instead of g_malloc0
start porting compiler.h to C++
tracetool: add extern "C" around generated headers
start adding extern "C" markers
add space between liter and string macro
bump to C++20
remove "new" keyword from trace-events
disable some code
util: introduce C++ stackless coroutine backend
port QemuCoLockable to C++ coroutines
port test-coroutine to C++ coroutines
configure | 48 +-
include/block/aio.h | 5 +
include/fpu/softfloat-types.h | 4 +
include/qemu/atomic.h | 5 +
include/qemu/bitops.h | 3 +
include/qemu/bswap.h | 10 +-
include/qemu/co-lockable.h | 93 ++++
include/qemu/compiler.h | 4 +
include/qemu/coroutine.h | 466 +++++++++++++-----
include/qemu/coroutine_int.h | 8 +
include/qemu/host-utils.h | 4 +
include/qemu/lockable.h | 13 +-
include/qemu/notify.h | 4 +
include/qemu/osdep.h | 1 +
include/qemu/qsp.h | 4 +
include/qemu/thread.h | 4 +
include/qemu/timer.h | 6 +-
include/qemu/typedefs.h | 1 +
meson.build | 2 +-
qapi/qmp-dispatch.c | 2 +
scripts/tracetool/format/h.py | 8 +-
tests/unit/meson.build | 8 +-
.../{test-coroutine.c => test-coroutine.cc} | 138 +++---
util/async.c | 2 +
util/coroutine-stackless.cc | 145 ++++++
util/meson.build | 14 +-
...oroutine-lock.c => qemu-coroutine-lock.cc} | 78 +--
...outine-sleep.c => qemu-coroutine-sleep.cc} | 10 +-
util/{qemu-coroutine.c => qemu-coroutine.cc} | 18 +-
util/thread-pool.c | 2 +
util/trace-events | 40 +-
31 files changed, 805 insertions(+), 345 deletions(-)
create mode 100644 include/qemu/co-lockable.h
rename tests/unit/{test-coroutine.c => test-coroutine.cc} (81%)
create mode 100644 util/coroutine-stackless.cc
rename util/{qemu-coroutine-lock.c => qemu-coroutine-lock.cc} (86%)
rename util/{qemu-coroutine-sleep.c => qemu-coroutine-sleep.cc} (89%)
rename util/{qemu-coroutine.c => qemu-coroutine.cc} (93%)
--
2.35.1