Re: [Qemu-devel] [PULL 22/26] target/aarch64: optimize indirect branches
From: Alex Bennée
Subject: Re: [Qemu-devel] [PULL 22/26] target/aarch64: optimize indirect branches
Date: Wed, 07 Jun 2017 16:52:15 +0100
User-agent: mu4e 0.9.19; emacs 25.2.50.3
Alex Bennée <address@hidden> writes:
> Alex Bennée <address@hidden> writes:
>
>> Alex Bennée <address@hidden> writes:
>>
>>> Richard Henderson <address@hidden> writes:
>>>
>>>> From: "Emilio G. Cota" <address@hidden>
>>>>
>>>> Measurements:
>>>>
>>>> [Baseline performance is that before applying this and the previous
>>>> commit]
>>>
>>> Sadly this has regressed my qemu-system-aarch64 EL2 run. It was slightly
>>> masked by an unrelated assertion breakage, which I had to fix. However,
>>> with this patch my boot hangs with all 4 threads spinning. Once it is
>>> reverted, things work again.
>>>
>>> My command line:
>>>
>>> timeout -k 1s --foreground 120s ./aarch64-softmmu/qemu-system-aarch64 \
>>>   -machine type=virt -display none -m 4096 -cpu cortex-a57 \
>>>   -serial mon:stdio \
>>>   -netdev user,id=unet -device virtio-net-device,netdev=unet \
>>>   -drive file=/home/alex/lsrc/qemu/images/jessie-arm64.qcow2,id=myblock,index=0,if=none \
>>>   -device virtio-blk-device,drive=myblock \
>>>   -append "console=ttyAMA0 root=/dev/vda1 systemd.unit=benchmark.service" \
>>>   -kernel /home/alex/lsrc/qemu/images/aarch64-current-linux-kernel-only.img \
>>>   -smp 4 -machine gic-version=3 -machine virtualization=true \
>>>   -name debug-threads=on
>>>
>>> My tree with fix and revert:
>>>
>>> https://github.com/stsquad/qemu/tree/debug/aarch64-hang
>>>
>>> I'm investigating now.
>>
>> Well, this seems to be a case of hangs with -smp > 1 (which I guess was
>> obvious, seeing as the TCG threads seem to be spinning against each
>> other).
>
> So the minimum fix I've found to get things working again is:
>
> void *HELPER(lookup_tb_ptr)(CPUArchState *env, target_ulong addr)
> {
>     CPUState *cpu = ENV_GET_CPU(env);
>     TranslationBlock *tb;
>     target_ulong cs_base, pc;
>     uint32_t flags;
>
>     tb = atomic_rcu_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(addr)]);
>     if (likely(tb)) {
>         cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
>         if (likely(tb->pc == addr && tb->cs_base == cs_base &&
>                    tb->flags == flags)) {
>             goto found;
>         }
>         tb = tb_htable_lookup(cpu, addr, cs_base, flags);
>         if (likely(tb)) {
>             atomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(addr)], tb);
>             /* goto found; */
>         }
>     }
>     return tcg_ctx.code_gen_epilogue;
>  found:
>     qemu_log_mask_and_addr(CPU_LOG_EXEC, addr,
>                            "Chain %p [%d: " TARGET_FMT_lx "] %s\n",
>                            tb->tc_ptr, cpu->cpu_index, addr,
>                            lookup_symbol(addr));
>     return tb->tc_ptr;
> }
>
> Which I find very odd. It would imply that tb_htable_lookup is giving us
> a bad result, except I find it very unlikely that whatever funky value
> we stored in the jmp cache would never get hit again.
>
> /me is very confused.
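To make the effect of the commented-out `goto found` concrete, here is a minimal, single-threaded model of the two-level lookup in the quoted helper. TB, jmp_cache, htable, jmp_hash, htable_lookup and lookup_tb_ptr are hypothetical simplifications, not QEMU's real types or hash functions. Like the quoted helper, the model only consults the table when the jump-cache slot already holds some TB. With chaining on refill disabled, a jump-cache miss that hits in the table exits to the epilogue once, and the following lookup hits the refilled slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for QEMU's TranslationBlock. */
typedef struct {
    uint64_t pc;
    uint64_t cs_base;
    uint32_t flags;
    const char *tc_ptr;          /* stands in for the generated host code */
} TB;

#define JMP_CACHE_SIZE 16u       /* small, direct-mapped, per-CPU cache */
#define HTABLE_SIZE    64u       /* larger table with linear probing */

static TB *jmp_cache[JMP_CACHE_SIZE];
static TB *htable[HTABLE_SIZE];
static const char EPILOGUE[] = "epilogue";   /* "return to the main loop" */

static unsigned jmp_hash(uint64_t pc)
{
    return (unsigned)((pc >> 2) % JMP_CACHE_SIZE);
}

static int tb_matches(const TB *tb, uint64_t pc, uint64_t cs_base,
                      uint32_t flags)
{
    return tb->pc == pc && tb->cs_base == cs_base && tb->flags == flags;
}

static void htable_insert(TB *tb)
{
    unsigned i = (unsigned)(tb->pc % HTABLE_SIZE);
    for (unsigned n = 0; n < HTABLE_SIZE; n++, i = (i + 1) % HTABLE_SIZE) {
        if (htable[i] == NULL) {
            htable[i] = tb;
            return;
        }
    }
    assert(!"htable full");
}

static TB *htable_lookup(uint64_t pc, uint64_t cs_base, uint32_t flags)
{
    unsigned i = (unsigned)(pc % HTABLE_SIZE);
    for (unsigned n = 0; n < HTABLE_SIZE && htable[i] != NULL;
         n++, i = (i + 1) % HTABLE_SIZE) {
        if (tb_matches(htable[i], pc, cs_base, flags)) {
            return htable[i];
        }
    }
    return NULL;
}

/* Mirrors the quoted control flow. chain_on_refill == 0 models the
 * "minimum fix" where the second goto found is commented out. */
static const char *lookup_tb_ptr(uint64_t pc, uint64_t cs_base,
                                 uint32_t flags, int chain_on_refill)
{
    unsigned h = jmp_hash(pc);
    TB *tb = jmp_cache[h];

    if (tb != NULL) {
        if (tb_matches(tb, pc, cs_base, flags)) {
            return tb->tc_ptr;          /* fast path: jump-cache hit */
        }
        tb = htable_lookup(pc, cs_base, flags);
        if (tb != NULL) {
            jmp_cache[h] = tb;          /* refill the jump-cache slot */
            if (chain_on_refill) {
                return tb->tc_ptr;      /* as in the patch: chain at once */
            }
        }
    }
    return EPILOGUE;                    /* exit to the main loop */
}
```

In this model the only behavioural difference between the two variants is one extra trip through the epilogue per refill, which is why disabling the chain hides, rather than explains, whatever goes wrong on the first execution.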
OK, I think we need to think about when we can run code; cf. the comment in
tb_gen_code:
/* As long as consistency of the TB stuff is provided by tb_lock in user
* mode and is implicit in single-threaded softmmu emulation, no explicit
* memory barrier is required before tb_link_page() makes the TB visible
* through the physical hash table and physical page list.
*/
tb_link_page(tb, phys_pc, phys_page2);
As tb_link_page does the qht_insert, I think what is happening is that we
are attempting to run code that has just been generated but is not yet
completely visible to other cores. By commenting out the goto found, we
delay execution just enough that, by the next time around, the TB is
complete.
However, I'm still confused, because qht_insert and qht_lookup have
implied barriers in them, which should mean we never pull a partially
complete TB out of the hash.
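The guarantee those implied barriers are meant to give is ordinary publication ordering: every store that fills in the object must be visible before the pointer to it, and a reader must not read the object's fields before it has read the pointer. Below is a minimal sketch of that pattern using C11 release/acquire atomics and POSIX threads. Block, published, writer, reader and run_once are hypothetical names, and the one-slot `published` pointer stands in for the hash table; QEMU's qht uses its own primitives rather than this code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical "translated block" that must be complete before use. */
typedef struct {
    uint64_t pc;
    uint32_t insn_count;
    uint8_t code[64];
} Block;

static Block storage;
static _Atomic(Block *) published = NULL;   /* one-slot "hash table" */

/* Writer: fill in the block, then publish it with release semantics so
 * every earlier store is visible to a thread that acquires the pointer. */
static void *writer(void *arg)
{
    (void)arg;
    storage.pc = 0x400000;
    storage.insn_count = 16;
    for (size_t i = 0; i < sizeof storage.code; i++) {
        storage.code[i] = (uint8_t)i;
    }
    atomic_store_explicit(&published, &storage, memory_order_release);
    return NULL;
}

/* Reader: spin until the pointer appears. The acquire load pairs with
 * the writer's release store, so the fields read afterwards are complete. */
static void *reader(void *arg)
{
    int *ok = arg;
    Block *b;

    while ((b = atomic_load_explicit(&published,
                                     memory_order_acquire)) == NULL) {
        /* spin */
    }
    *ok = b->pc == 0x400000 && b->insn_count == 16 && b->code[63] == 63;
    return NULL;
}

/* Runs one writer/reader pair; returns 1 if the reader saw a complete block. */
static int run_once(void)
{
    pthread_t w, r;
    int ok = 0;
    int rc;

    memset(&storage, 0, sizeof storage);
    atomic_store(&published, NULL);
    rc = pthread_create(&r, NULL, reader, &ok);
    assert(rc == 0);
    rc = pthread_create(&w, NULL, writer, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return ok;
}
```

Weakening the store to memory_order_relaxed removes the guarantee: on a weakly ordered host such as AArch64, the reader could then observe the pointer before some of the fields. That is the kind of window the hypothesis above describes, although a short test run will not reliably expose it.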
I did add some debug code that saved the tb_htable_lookup results in a
CPU-indexed array, and they certainly get hits on the tb_jmp_cache lookup,
so I don't think commenting out the goto found means those htable-looked-up
TBs never get executed.
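The debug check described above was not posted, so the following is only a hypothetical sketch of how such per-CPU bookkeeping might look. The names MAX_CPUS, last_htable_tb, debug_note_refill and debug_note_cache_hit are invented for illustration. It assumes each slot is only touched by its own vCPU thread, as with MTTCG's one thread per vCPU, so no locking is used. The idea is to remember the last TB each CPU obtained from the hash table and count how often that same TB is later returned by the jump-cache fast path.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CPUS 4

/* Hypothetical per-CPU record of the last TB obtained from the hash table. */
static const void *last_htable_tb[MAX_CPUS];
static unsigned long htable_refills[MAX_CPUS];
static unsigned long later_cache_hits[MAX_CPUS];

/* Call where the helper refills tb_jmp_cache from tb_htable_lookup. */
static void debug_note_refill(int cpu, const void *tb)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    last_htable_tb[cpu] = tb;
    htable_refills[cpu]++;
}

/* Call on the tb_jmp_cache fast path: counts hits on a TB that this CPU
 * previously got from the hash table, at most once per refill. */
static void debug_note_cache_hit(int cpu, const void *tb)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    if (tb != NULL && tb == last_htable_tb[cpu]) {
        later_cache_hits[cpu]++;
        last_htable_tb[cpu] = NULL;
    }
}
```

If later_cache_hits stays close to htable_refills for each CPU, the TBs refilled from the hash table are being executed on a later pass, which is the observation reported above.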
--
Alex Bennée