Re: [Qemu-devel] [RFC PATCH V7 00/19] Multithread TCG.
From: Alex Bennée
Subject: Re: [Qemu-devel] [RFC PATCH V7 00/19] Multithread TCG.
Date: Mon, 10 Aug 2015 19:34:22 +0100
address@hidden writes:
> From: KONRAD Frederic <address@hidden>
>
> This is the 7th round of the MTTCG patch series.
>
>
> It can be cloned from:
> address@hidden:fkonrad/mttcg.git branch multi_tcg_v7.
I'm not seeing this yet. Did you remember to push?
>
> This patch set tries to address the different issues in the global picture
> of MTTCG presented on the wiki.
>
> == Needed patch for our work ==
>
> Some preliminaries are needed for our work:
> * current_cpu doesn't make sense in MTTCG, so a tcg_executing flag is added
> to the CPUState.
> * We need to run some work safely when all VCPUs are outside their execution
> loop. This is done with the async_run_safe_work_on_cpu function introduced
> in this series.
> * A QemuSpin lock is introduced (on POSIX only for now) to allow faster
> handling of atomic instructions.
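For readers unfamiliar with the idea behind the spin lock item above, here is a minimal sketch of a test-and-set spin lock using C11 atomics. It is not the QemuSpin code from the series; the `DemoSpin` type and the `demo_spin_*` names are made up for illustration.

```c
#include <stdatomic.h>

/* Hypothetical type and names; NOT the QemuSpin API from the series. */
typedef struct {
    atomic_flag locked;
} DemoSpin;

void demo_spin_init(DemoSpin *s)
{
    atomic_flag_clear(&s->locked);
}

/* Returns 1 if the lock was taken, 0 if somebody else already holds it. */
int demo_spin_trylock(DemoSpin *s)
{
    return !atomic_flag_test_and_set_explicit(&s->locked,
                                              memory_order_acquire);
}

void demo_spin_lock(DemoSpin *s)
{
    /* Busy-wait instead of sleeping: cheap when the critical section
     * is only a handful of instructions. */
    while (atomic_flag_test_and_set_explicit(&s->locked,
                                             memory_order_acquire)) {
        /* spin */
    }
}

void demo_spin_unlock(DemoSpin *s)
{
    atomic_flag_clear_explicit(&s->locked, memory_order_release);
}
```

The point of spinning rather than blocking is that the waiter never goes through the scheduler, which tends to pay off only when the lock is held very briefly.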
>
> == Code generation and cache ==
>
> As QEMU stands, there is no protection at all against two threads
> attempting to generate code at the same time or modifying a
> TranslationBlock.
> The "protect TBContext with tb_lock" patch addresses the issue of code
> generation and makes all the tb_* functions thread safe (except tb_flush).
> This raised the question of one or multiple caches. We chose to use one
> unified cache because it's easier as a first step and, since the structure
> of QEMU effectively has a ‘local’ cache per CPU in the form of the jump
> cache, we don't see the benefit of having two pools of TBs.
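As a rough illustration of what "one unified cache protected by a single lock" means, the sketch below guards a shared lookup-or-generate step with one pthread mutex. Everything here (`DemoTB`, `demo_tb_lock`, `demo_tb_find_or_gen`) is a hypothetical stand-in; the real TBContext and tb_* functions are considerably more involved.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHE_SIZE 64

/* Hypothetical stand-in for a translated block; not QEMU's struct. */
typedef struct {
    uint32_t pc;   /* guest address the block was generated for */
} DemoTB;

static DemoTB demo_cache[DEMO_CACHE_SIZE];
static size_t demo_tb_count;
static pthread_mutex_t demo_tb_lock = PTHREAD_MUTEX_INITIALIZER;

/* Look up the block for pc, generating it if needed. The whole
 * lookup-or-generate step runs under one lock, so two threads can never
 * generate the same block twice or corrupt the shared pool.
 * Returns NULL when the pool is full (a real cache would flush instead). */
DemoTB *demo_tb_find_or_gen(uint32_t pc)
{
    DemoTB *tb = NULL;

    pthread_mutex_lock(&demo_tb_lock);
    for (size_t i = 0; i < demo_tb_count; i++) {
        if (demo_cache[i].pc == pc) {
            tb = &demo_cache[i];
            break;
        }
    }
    if (tb == NULL && demo_tb_count < DEMO_CACHE_SIZE) {
        tb = &demo_cache[demo_tb_count++];
        tb->pc = pc;   /* "code generation" would happen here */
    }
    pthread_mutex_unlock(&demo_tb_lock);
    return tb;
}

/* Number of blocks generated so far. */
size_t demo_tb_generated(void)
{
    size_t n;

    pthread_mutex_lock(&demo_tb_lock);
    n = demo_tb_count;
    pthread_mutex_unlock(&demo_tb_lock);
    return n;
}
```

Blocks in this sketch are never modified or evicted once created, which is why returning a pointer after dropping the lock is safe here. Invalidation and flushing break that assumption, which is the problem the dirty-tracking section of the cover letter deals with.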
>
> == Dirty tracking ==
>
> Protecting the IOs:
> To allow all VCPU threads to run at the same time, we need to drop the
> global_mutex as soon as possible. The I/O accesses need to take the mutex.
> This is likely to change when
> http://thread.gmane.org/gmane.comp.emulators.qemu/345258
> is upstreamed.
>
> Invalidation of TranslationBlocks:
> We can have all VCPUs running during an invalidation. Each VCPU is able to
> clean its own jump cache, as it lives in CPUState, so that can be handled
> by a simple call to async_run_on_cpu. However, tb_invalidate also writes to
> the TranslationBlock, which is shared as we have only one pool.
> Hence this part of the invalidation requires all VCPUs to exit before it
> can be done, and async_run_safe_work_on_cpu is introduced to handle this
> case.
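The "run only when every VCPU has left its execution loop" rule can be sketched with a counter and a single pending work slot. This is a simplified, hypothetical model (all `demo_*` names are invented), not the async_run_safe_work_on_cpu implementation; in particular it does not kick running VCPUs out of their loop, it only waits for them to leave.

```c
#include <pthread.h>
#include <stddef.h>

typedef void (*demo_work_fn)(void *opaque);

static pthread_mutex_t demo_work_mutex = PTHREAD_MUTEX_INITIALIZER;
static int demo_vcpus_executing;       /* VCPUs inside their exec loop */
static demo_work_fn demo_pending_fn;   /* single pending safe-work slot */
static void *demo_pending_arg;

/* Caller must hold demo_work_mutex. Runs the pending work only when no
 * VCPU is executing; holding the mutex keeps VCPUs out meanwhile. */
static void demo_run_pending_locked(void)
{
    if (demo_pending_fn != NULL && demo_vcpus_executing == 0) {
        demo_work_fn fn = demo_pending_fn;
        demo_pending_fn = NULL;
        fn(demo_pending_arg);
    }
}

/* Returns 1 if the VCPU may enter its exec loop, 0 if safe work is
 * pending and the VCPU has to stay out for now. */
int demo_vcpu_try_enter_exec(void)
{
    int ok;

    pthread_mutex_lock(&demo_work_mutex);
    ok = (demo_pending_fn == NULL);
    if (ok) {
        demo_vcpus_executing++;
    }
    pthread_mutex_unlock(&demo_work_mutex);
    return ok;
}

/* Called when a VCPU leaves its exec loop; the last one out runs the
 * pending safe work, if any. */
void demo_vcpu_exit_exec(void)
{
    pthread_mutex_lock(&demo_work_mutex);
    demo_vcpus_executing--;
    demo_run_pending_locked();
    pthread_mutex_unlock(&demo_work_mutex);
}

/* Returns 1 if the work was accepted (it may already have run),
 * 0 if the slot was busy. */
int demo_queue_safe_work(demo_work_fn fn, void *arg)
{
    int queued = 0;

    pthread_mutex_lock(&demo_work_mutex);
    if (demo_pending_fn == NULL) {
        demo_pending_fn = fn;
        demo_pending_arg = arg;
        queued = 1;
        demo_run_pending_locked();
    }
    pthread_mutex_unlock(&demo_work_mutex);
    return queued;
}

/* Example work item: bumps the int counter passed as opaque. */
void demo_count_work(void *opaque)
{
    (*(int *)opaque)++;
}
```

With two VCPUs inside the loop, a queued item stays pending until the second one calls `demo_vcpu_exit_exec()`, so the work never overlaps with code that might be reading the shared TranslationBlock data.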
>
> == Atomic instruction ==
>
> For now only ARM on x64 is supported, by using a cmpxchg instruction.
> Specifically, the limitation of this approach is that it is harder to
> support 64-bit ARM on a host architecture that is multi-core but only
> supports 32-bit cmpxchg (we believe this could be the case for some PPC
> cores). For now this case is not correctly handled. The existing atomic
> patch will attempt to execute the 64-bit cmpxchg functionality in a
> non-thread-safe fashion. Our intention is to provide a new multi-thread ARM
> atomic patch for 64-bit ARM on effectively 32-bit hosts.
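To make the cmpxchg approach concrete: the load-exclusive remembers the address and the value it read, and the store-exclusive succeeds only if a compare-and-swap against that remembered value succeeds. The sketch below uses C11 atomics on 32-bit values; `DemoCPU`, `demo_ldrex` and `demo_strex` are hypothetical names, not the helpers from the patch.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-VCPU exclusive-monitor state; not QEMU's CPUARMState. */
typedef struct {
    _Atomic uint32_t *excl_addr;  /* address marked by the last LDREX */
    uint32_t excl_val;            /* value observed by that LDREX */
} DemoCPU;

/* Emulated LDREX: load the value and remember address and value. */
uint32_t demo_ldrex(DemoCPU *cpu, _Atomic uint32_t *addr)
{
    cpu->excl_addr = addr;
    cpu->excl_val = atomic_load(addr);
    return cpu->excl_val;
}

/* Emulated STREX: store newval only if memory still holds the value seen
 * by the matching LDREX. Returns 0 on success and 1 on failure, following
 * the STREX status convention. */
uint32_t demo_strex(DemoCPU *cpu, _Atomic uint32_t *addr, uint32_t newval)
{
    uint32_t expected = cpu->excl_val;
    int ok = (cpu->excl_addr == addr) &&
             atomic_compare_exchange_strong(addr, &expected, newval);

    cpu->excl_addr = NULL;  /* the monitor is cleared either way */
    return ok ? 0 : 1;
}
```

One known weakness of comparing values rather than tracking writes is that a location changed and then restored to its old value between the two instructions goes unnoticed. The sketch also shows why the host needs a compare-and-swap at least as wide as the guest access, which is the 64-bit-on-32-bit problem described above.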
> This atomic instruction part has been tested with Alexander's atomic stress
> repo, available here:
> https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg05585.html
>
> The execution is a little slower than upstream, probably because the
> different VCPUs fight for the mutex. Swapping arm_exclusive_lock from a
> mutex to a spin lock considerably reduces the difference.
>
> == Testing ==
>
> A simple double dhrystone test in SMP 2 with vexpress-a15 in a Linux guest
> shows a good performance improvement: it takes about 18s upstream to
> complete vs 10s with MTTCG.
>
> Testing image is available here:
> https://cloud.greensocs.com/index.php/s/CfHSLzDH5pmTkW3
>
> Then simply:
> ./configure --target-list=arm-softmmu
> make -j8
> ./arm-softmmu/qemu-system-arm -M vexpress-a15 -smp 2 -kernel zImage
> -initrd rootfs.ext2 -dtb vexpress-v2p-ca15-tc1.dtb --nographic
> --append "console=ttyAMA0"
>
> login: root
>
> The dhrystone command is the last one in the history.
> "dhrystone 10000000 & dhrystone 10000000"
>
> The atomic spinlock benchmark from Alexander shows that atomics basically
> work. Just follow the instructions here:
> https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg05585.html
>
> == Known issues ==
>
> * GDB stub:
> GDB stub is not tested right now; it will probably require some changes to
> work.
>
> * deadlock on exit:
> When exiting QEMU with Ctrl-C, some VCPU threads are not able to exit and
> continue execution.
> http://git.greensocs.com/fkonrad/mttcg/issues/1
>
> * memory_region_rom_device_set_romd from pflash01 just crashes the TCG code.
> Strangely, this happens only with "-smp 4" and 2 in the DTB.
> http://git.greensocs.com/fkonrad/mttcg/issues/2
>
> Changes V6 -> V7:
> * global_lock:
> * Don't protect softmmu read/write helper as it's now done in
> address_space_rw.
> * tcg_exec_flag:
> * Make the flag atomically test and set through an API.
> * introduce async_safe_work:
> * move qemu_cpu_kick_thread to avoid prototype declaration.
> * use the work_mutex.
> * async_work:
> * protect it with a mutex (work_mutex) against concurrent access.
> * tb_lock:
> * protect tcg_malloc_internal as well.
> * signal the VCPU even if current_cpu is NULL.
> * added PSCI patch.
> * rebased on v2.4.0-rc0 (6169b60285fe1ff730d840a49527e721bfb30899).
>
> Changes V5 -> V6:
> * Introduce async_safe_work to do the tb_flush and some parts of
> tb_invalidate.
> * Introduce QemuSpin from Guillaume, which allows faster atomic
> instructions (6s to pass Alexander's atomic test instead of 30s before).
> * Don't take tb_lock before tb_find_fast.
> * Handle tb_flush with async_safe_work.
> * Handle tb_invalidate with async_work and async_safe_work.
> * Drop the tlb_flush_request mechanism and use async_work as well.
> * Fix the wrong length in atomic patch.
> * Fix the wrong return address for exception in atomic patch.
>
> Alex Bennée (1):
> target-arm/psci.c: wake up sleeping CPUs (MTTCG)
>
> Guillaume Delbergue (1):
> add support for spin lock on POSIX systems exclusively
>
> KONRAD Frederic (17):
> cpus: protect queued_work_* with work_mutex.
> cpus: add tcg_exec_flag.
> cpus: introduce async_run_safe_work_on_cpu.
> replace spinlock by QemuMutex.
> remove unused spinlock.
> protect TBContext with tb_lock.
> tcg: remove tcg_halt_cond global variable.
> Drop global lock during TCG code execution
> cpu: remove exit_request global.
> tcg: switch on multithread.
> Use atomic cmpxchg to atomically check the exclusive value in a STREX
> add a callback when tb_invalidate is called.
> cpu: introduce tlb_flush*_all.
> arm: use tlb_flush*_all
> translate-all: introduces tb_flush_safe.
> translate-all: (wip) use tb_flush_safe when we can't alloc more tb.
> mttcg: signal the associated cpu anyway.
>
> cpu-exec.c | 98 +++++++++------
> cpus.c | 295 +++++++++++++++++++++++++-------------------
> cputlb.c | 81 ++++++++++++
> include/exec/exec-all.h | 8 +-
> include/exec/spinlock.h | 49 --------
> include/qemu/thread-posix.h | 4 +
> include/qemu/thread-win32.h | 4 +
> include/qemu/thread.h | 7 ++
> include/qom/cpu.h | 57 +++++++++
> linux-user/main.c | 6 +-
> qom/cpu.c | 20 +++
> target-arm/cpu.c | 21 ++++
> target-arm/cpu.h | 6 +
> target-arm/helper.c | 58 +++------
> target-arm/helper.h | 4 +
> target-arm/op_helper.c | 128 ++++++++++++++++++-
> target-arm/psci.c | 2 +
> target-arm/translate.c | 101 +++------------
> target-i386/mem_helper.c | 16 ++-
> target-i386/misc_helper.c | 27 +++-
> tcg/i386/tcg-target.c | 8 ++
> tcg/tcg.h | 14 ++-
> translate-all.c | 217 +++++++++++++++++++++++++++-----
> util/qemu-thread-posix.c | 45 +++++++
> util/qemu-thread-win32.c | 30 +++++
> vl.c | 6 +
> 26 files changed, 934 insertions(+), 378 deletions(-)
> delete mode 100644 include/exec/spinlock.h
--
Alex Bennée