Re: [Qemu-devel] profiling qemu
From: Laurent Desnogues
Subject: Re: [Qemu-devel] profiling qemu
Date: Tue, 14 Feb 2012 15:47:58 +0100
2012/2/14 Lluís Vilanova <address@hidden>:
> Artyom Tarasenko writes:
[...]
>> Here it looks like "compute_all_sub" and "compute_all_sub_xcc" are
>> good candidates for optimizing: together they take the same amount of
>> time as cpu_sparc_exec. I guess both operations would be trivial in
>> x86_64 assembler. What would be the best strategy to make TCG take
>> advantage of running on an x86_64 host?
>
> A quick look into the code reveals that these two are called from a TCG helper
> (helper_compute_psr), so I see two approaches here applicable to the most
> frequently used "sub-operations" in helper_compute_psr:
>
> * Define new simpler helpers for those sub-operations that can be declared
>   with TCG_CALL_CONST and generate the new psr/xcc values in temporary
>   registers. You must make sure any other code will still be able to use the
>   new psr/xcc values.
>
> * Reimplement these sub-operations in pure TCG code.
>
>
> But first, make sure you run a proper benchmark to establish where the
> hotspots are in the sparc code for QEMU. The problem here is to establish what a
> proper benchmark is :)
Similar helpers are used in the ARM translation, so I'm not surprised
they show up (sub/flag-setting instructions are typically used in loops).
A good strategy is indeed to generate TCG code for the flag computation
and keep N, Z, C, etc. in TCG global temps, like the other CPU registers.
This gains a few percent of speed.
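To make the idea concrete, here is a minimal sketch in plain C (not QEMU
code) of the flag computation a 32-bit flag-setting subtract needs. The
names flags32 and sub_set_flags are made up for illustration, and the carry
convention (C set on borrow, as on SPARC) is an assumption; ARM uses the
opposite sense for subtraction. Each flag lives in its own variable, which
is roughly what keeping N/Z/C/V in separate global temps looks like, and
every line maps onto a handful of simple ops rather than a helper call.

```c
#include <stdint.h>

/* Hypothetical model: one variable per flag, analogous to keeping each
 * flag in its own TCG global temp. This is not a QEMU data structure. */
typedef struct {
    uint32_t n; /* negative: bit 31 of the result */
    uint32_t z; /* zero: result is 0 */
    uint32_t c; /* carry: borrow out of the subtraction (assumed SPARC-like) */
    uint32_t v; /* overflow: signed overflow of the subtraction */
} flags32;

/* Compute a - b and update all four flags. */
static uint32_t sub_set_flags(uint32_t a, uint32_t b, flags32 *f)
{
    uint32_t r = a - b;               /* wraps modulo 2^32 */

    f->n = r >> 31;                   /* sign bit of the result */
    f->z = (r == 0);                  /* 1 if the result is zero */
    f->c = (a < b);                   /* unsigned borrow */
    /* Signed overflow: operands differ in sign and the result's sign
     * differs from the first operand's sign. */
    f->v = ((a ^ b) & (a ^ r)) >> 31;
    return r;
}
```

A guest branch that only tests Z then only needs to read one variable
instead of unpacking a full psr value.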
HTH,
Laurent