[Qemu-devel] profiling qemu
From: Artyom Tarasenko
Subject: [Qemu-devel] profiling qemu
Date: Tue, 14 Feb 2012 11:33:18 +0100
On an x86_64 host, sparc64 emulation feels considerably slower than sparc32.
I tried to find out what can be optimized, and here are some questions.
First of all, it's not clear how to enable profiling in the current git:
build-prof $ ../qemu/configure --target-list=sparc64-softmmu \
    --enable-gprof --enable-profiler
[...]
host CPU x86_64
host big endian no
target list sparc64-softmmu
tcg debug enabled no
Mon debug enabled no
gprof enabled yes
sparse enabled no
strip binaries yes
profiler yes
[...]
build-prof $ sparc64-softmmu/qemu-system-sparc64 -nographic -profile
-profile: invalid option
If I launch qemu without the -profile option, it starts, but the monitor
doesn't offer much either:
QEMU 1.0.50 monitor - type 'help' for more information
(qemu) profile
unknown command: 'profile'
(qemu) info profile
async time 38505498320 (38.505)
qemu time 35947093161 (35.947)
Is there a way to find out more?
Next I tried gprof:
build-prof $ gprof sparc64-softmmu/qemu-system-sparc64 gmon.out
Flat profile:
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ts/call  Ts/call  name
100.00      5.06     5.06                             main
Hmm. Not very informative. Is there a way to find out more details?
A pre-glib version used to give more information:
$ gprof sparc64-softmmu/qemu-system-sparc64 gmon.out
Flat profile:
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ts/call  Ts/call  name
 14.78     24.68    24.68                             cpu_sparc_exec
  7.84     37.76    13.08                             compute_all_sub_xcc
  7.56     50.38    12.62                             compute_all_sub
  7.44     62.80    12.42                             helper_compute_psr
  6.41     73.50    10.70                             get_physical_address
  5.09     82.00     8.50                             compute_all_logic_xcc
  3.64     88.07     6.07                             tcg_optimize
  3.27     93.53     5.46                             temp_save
  2.43     97.59     4.06                             tcg_reg_alloc_op
  2.37    101.54     3.95                             compute_all_taddtv
  2.24    105.27     3.74                             compute_C_sub_xcc
  2.22    108.98     3.71                             tcg_liveness_analysis
  2.00    112.32     3.34                             compute_all_flags
  1.68    115.13     2.81                             tlb_flush
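
To compare entries in a flat profile like the one above, it can help to
total the percentages per group of functions. The following is a minimal
Python sketch, not part of QEMU or gprof: the rows are copied from the
listing above, and the grouping in group_of() is an assumption made only
for this illustration.

```python
import re
from collections import defaultdict

# Rows copied from the pre-glib flat profile:
# percent time, cumulative seconds, self seconds, function name.
FLAT_PROFILE = """\
14.78 24.68 24.68 cpu_sparc_exec
7.84 37.76 13.08 compute_all_sub_xcc
7.56 50.38 12.62 compute_all_sub
7.44 62.80 12.42 helper_compute_psr
6.41 73.50 10.70 get_physical_address
5.09 82.00 8.50 compute_all_logic_xcc
3.64 88.07 6.07 tcg_optimize
3.27 93.53 5.46 temp_save
2.43 97.59 4.06 tcg_reg_alloc_op
2.37 101.54 3.95 compute_all_taddtv
2.24 105.27 3.74 compute_C_sub_xcc
2.22 108.98 3.71 tcg_liveness_analysis
2.00 112.32 3.34 compute_all_flags
1.68 115.13 2.81 tlb_flush
"""

# Matches rows that have no calls / per-call columns, as in this profile.
ROW = re.compile(r"^\s*(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\S+)\s*$")


def parse_flat_profile(text):
    """Return a list of (name, percent, self_seconds) tuples."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            pct, _cumulative, self_s, name = m.groups()
            rows.append((name, float(pct), float(self_s)))
    return rows


def group_of(name):
    """Hypothetical grouping, chosen for this illustration only."""
    if name.startswith("compute_") or name == "helper_compute_psr":
        return "condition codes"
    if name.startswith("tcg_") or name == "temp_save":
        return "TCG translation"
    return "other"


def summarize(rows):
    """Sum the percent-time column per group."""
    totals = defaultdict(float)
    for name, pct, _self_s in rows:
        totals[group_of(name)] += pct
    return dict(totals)


rows = parse_flat_profile(FLAT_PROFILE)
by_name = {name: pct for name, pct, _ in rows}
sub_helpers = by_name["compute_all_sub"] + by_name["compute_all_sub_xcc"]

print(f"compute_all_sub + compute_all_sub_xcc: {sub_helpers:.2f}%")
print(f"cpu_sparc_exec:                        {by_name['cpu_sparc_exec']:.2f}%")
for group, pct in sorted(summarize(rows).items(), key=lambda kv: -kv[1]):
    print(f"{group:16s} {pct:6.2f}%")
```

With these rows, the two sub helpers add up to 15.40%, against 14.78% for
cpu_sparc_exec. The listing is only the top of the profile, so the group
totals do not add up to 100%.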
In the pre-glib profile, "compute_all_sub" and "compute_all_sub_xcc" look
like good candidates for optimization: together they account for 15.40% of
the samples, slightly more than cpu_sparc_exec (14.78%). I guess both
operations would be trivial in x86_64 assembler. What would be the best
strategy to make TCG take advantage of running on an x86_64 host?
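
As a rough illustration of why these operations look cheap on the host,
here is a simplified Python model of the flags a SPARC subtraction
produces. It is not QEMU's helper code: the function names and the
FLAG_* constants are hypothetical, and the flag layout is local to this
sketch. It assumes the usual definitions (N = sign of the result, Z =
result is zero, V = signed overflow, C = unsigned borrow), evaluated on
the low 32 bits for icc and on all 64 bits for xcc.

```python
# Flag bits within a 4-bit N Z V C field (names local to this sketch).
FLAG_N, FLAG_Z, FLAG_V, FLAG_C = 8, 4, 2, 1


def sub_flags(src1, src2, bits):
    """Return the N/Z/V/C flags of src1 - src2 at the given operand width."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a, b = src1 & mask, src2 & mask
    dst = (a - b) & mask

    flags = 0
    if dst & sign:                      # negative result
        flags |= FLAG_N
    if dst == 0:                        # zero result
        flags |= FLAG_Z
    if (a ^ b) & (a ^ dst) & sign:      # operands differ in sign and the
        flags |= FLAG_V                 # result's sign differs from src1
    if a < b:                           # unsigned borrow
        flags |= FLAG_C
    return flags


def compute_all_sub_model(src1, src2):
    """Return (icc, xcc): flags for the 32-bit and 64-bit views."""
    return sub_flags(src1, src2, 32), sub_flags(src1, src2, 64)


# The two flag sets can differ, so both have to be derived.
icc, xcc = compute_all_sub_model(1 << 32, 1)
print(f"icc={icc:04b} xcc={xcc:04b}")
```

An x86_64 sub instruction sets SF, ZF, OF and CF by the same rules for
its operand size, which is why a single host instruction per width could
in principle produce each flag set.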
--
Regards,
Artyom Tarasenko
solaris/sparc under qemu blog: http://tyom.blogspot.com/search/label/qemu