From: Artyom Tarasenko
Subject: Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
Date: Wed, 19 Aug 2015 16:41:11 +0200

On Wed, Aug 19, 2015 at 1:00 PM, Aurelien Jarno <address@hidden> wrote:
> On 2015-08-19 12:41, Artyom Tarasenko wrote:
>> Hi Richard,
>>
>> On Tue, Aug 18, 2015 at 7:55 PM, Richard Henderson <address@hidden> wrote:
>> > On 08/18/2015 02:24 AM, Artyom Tarasenko wrote:
>> >> The unoptimized case is a sequence of multiple cmp and branch
>> >> operations (likely created by a "case" statement in the original
>> >> source code), especially where cmp is in a delay slot of a branch
>> >> instruction.
>> >
>> > Interesting.
>> >
>> >> I wonder whether we always have to finish a TB on a conditional jump.
>> >> Maybe it would make sense to translate further if a destination of a
>> >> jump is not too far from dc->pc? The definition of "not too far" is
>> >> indeed tricky.
>> >
>> > We can only handle two chained exits from a TB. If we continue past
>> > a conditional branch, we may well encounter a second conditional
>> > branch, which would leave us with three different exits from the TB.
>> >
>> > Something that may be interesting to play with, however, is to change
>> > the TB with which the insn in a delay slot is connected.
>> >
>> > For instance, we currently spend some amount of effort computing and
>> > saving the branch condition, so that we can then execute the delay
>> > slot, and afterwards use the saved branch condition to perform the
>> > branch.
>> >
>> > Another way of doing this is to immediately branch, exiting the TB.
>> > But we set up PC+NPC for the next TB such that the delay slot is the
>> > first insn that is executed within the next TB. In that way, the
>> > compare in the delay slot that you mention *is* in the same TB as the
>> > branch that uses it, allowing the case to be optimized.
>> >
>> > This could wind up creating more TBs than the current solution, so
>> > it's not clear that it would be a win. One can mitigate that somewhat
>> > by noticing the case where the delay slot is a nop. I do think it's
>> > worth an experiment.
>>
>> So it is possible to make a TB with non-sequential instructions?
>> The instruction in the delay slot would most likely be located
>> somewhere other than the instructions that follow it.
>>
>> But I think I've been chasing a red herring. I see those helpers in
>> perf top when running sysbench, but not when running g++ (and in the
>> end g++ is a much more relevant benchmark for me):
>>
>>
>> Samples: 83K of event 'cpu-clock', Event count (approx.): 15333243164,
>> Thread: qemu-system-spa(2743)
>> 27.10% [kernel] [k] retint_signal
>> 12.66% qemu-system-sparc64 [.] tcg_optimize
>> 9.18% [vdso] [.] 0x0000000000000998
>> 8.39% [kernel] [k] _raw_spin_unlock_irqrestore
>> 4.76% qemu-system-sparc64 [.] tcg_liveness_analysis
>> 3.89% qemu-system-sparc64 [.] tcg_reg_alloc_op
>> 2.80% qemu-system-sparc64 [.] tcg_out_opc
>> 2.45% qemu-system-sparc64 [.] get_physical_address_data
>> 1.86% [kernel] [k] native_read_tsc
>> 1.62% qemu-system-sparc64 [.] tlb_flush_page
>> 1.55% qemu-system-sparc64 [.] tcg_out_modrm_sib_offset.constprop.42
>> 1.45% [unknown] [.] 0x00000000451c5cae
>> 1.43% qemu-system-sparc64 [.] gen_intermediate_code_pc
>> 1.39% qemu-system-sparc64 [.] tcg_temp_new_internal_i64
>> 1.24% qemu-system-sparc64 [.] tb_flush_jmp_cache
>> 1.11% qemu-system-sparc64 [.] disas_sparc_insn
>> 1.08% qemu-system-sparc64 [.] tcg_out_modrm
>> 0.97% qemu-system-sparc64 [.] tcg_reg_alloc_start
>> 0.77% qemu-system-sparc64 [.] cpu_sparc_exec
>> 0.73% qemu-system-sparc64 [.] replace_tlb_1bit_lru.isra.3
>> 0.72% qemu-system-sparc64 [.] tcg_gen_code_search_pc
>> 0.72% qemu-system-sparc64 [.] tcg_opt_gen_mov
>> 0.70% qemu-system-sparc64 [.] reset_temp
>>
>> I'm not sure why I still see kernel functions when I zoom into the
>> qemu thread. Is this qemu signal handling?
>> It would also be interesting to know where the generated code shows
>> up in this listing. Is it [vdso], [unknown], or is it hidden behind
>> retint_signal?
>>
>> Ironically, a good optimization target seems to be the tcg_optimize
>> function itself. If I zoom in, I see it spends most of its time in
>> reset_all_temps.
>>
>> Any suggestions on how to improve it?
>>
>
> Try this patch:
> http://lists.nongnu.org/archive/html/qemu-devel/2015-08/msg02042.html

Note: I use a different benchmark than Dennis: I compile the binutils
gold linker (which is, by the way, broken on sparc64).

Without the patch:

time g++ -DHAVE_CONFIG_H -I. -I../binutils-gdb/gold
-I../binutils-gdb/gold -I../binutils-gdb/gold/../include
-I../binutils-gdb/gold/../elfcpp
-DLOCALEDIR="\"/usr/local/share/locale\""
-DBINDIR="\"/usr/local/bin\"" -DTOOLBINDIR="\"/usr/local//bin\""
-DTOOLLIBDIR="\"/usr/local//lib\"" -W -Wall -Werror
-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -frandom-seed=tilegx.o
-I../binutils-gdb/gold/../zlib -g -O2 -MT tilegx.o -MD -MP -MF
.deps/tilegx.Tpo -c -o tilegx.o ../binutils-gdb/gold/tilegx.cc
real 18m31.407s
user 18m23.661s
sys 0m6.784s

The patch definitely improves the situation: tcg_optimize now takes
~7% in perf top (instead of ~12%), and the only function perf top
still marks red is init_temp_info(). So with the patch:

real 17m46.380s
user 17m37.522s
sys 0m7.120s

And if I completely disable the optimizer (// #define
USE_TCG_OPTIMIZATIONS in tcg.c), it's considerably faster still:

real 14m17.668s
user 14m10.241s
sys 0m6.060s
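
To put the three "real" times above side by side, here is a quick sketch in Python that converts them to seconds and prints the reduction relative to the unpatched run. The helper name to_seconds and the labels in the dictionary are only illustrative; nothing here comes from qemu or perf.

```python
def to_seconds(stamp: str) -> float:
    """Convert a `time`-style stamp such as '18m31.407s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


# Wall-clock ("real") times copied from the three runs reported above.
runs = {
    "without patch": "18m31.407s",
    "with patch": "17m46.380s",
    "optimizer disabled": "14m17.668s",
}

baseline = to_seconds(runs["without patch"])
reductions = {}
for name, stamp in runs.items():
    elapsed = to_seconds(stamp)
    # Percentage of wall-clock time saved compared to the unpatched run.
    reductions[name] = 100.0 * (baseline - elapsed) / baseline
    print(f"{name:20s} {elapsed:9.3f} s  {reductions[name]:5.1f}% saved")
```

On these numbers the patch saves roughly 4% of the wall-clock time, while disabling the optimizer saves roughly 23%. These are single runs, so treat the figures as indicative only.
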
Artyom
--
Regards,
Artyom Tarasenko
SPARC and PPC PReP under qemu blog: http://tyom.blogspot.com/search/label/qemu