qemu-devel

Re: [Qemu-devel] save compiled qemu traces.


From: Xin Tong
Subject: Re: [Qemu-devel] save compiled qemu traces.
Date: Thu, 12 Dec 2013 13:51:24 +0900

On Thu, Dec 12, 2013 at 1:07 PM, Xin Tong <address@hidden> wrote:
> see questions below.
>
> On Tue, Dec 10, 2013 at 12:25 AM, Alex Bennée <address@hidden> wrote:
>>
>> address@hidden writes:
>>
>>> Does anyone have profiles of how much time QEMU spends translating
>>> instructions? QEMU has neither a baseline interpreter nor
>>> trace-granularity translation, so I imagine it must spend quite a bit
>>> of time translating instructions.
>>
>> Not as much as you'd think. The translation stage isn't very complex and
>> blocks only get translated once (modulo exceptions and self-modifying
>> code). If you run perf on your task you should see that most of the time
>> is spent in the generated code; if not, please send the test case to the
>> list.
>
> I took a profile running SPEC CPU2006 403.gcc with the test input on an
> Intel Xeon machine. We spent only 44.76% of the time in the code cache
> (i.e. 1.34M ticks in the code cache), while 40.97% of the time was spent
> in qemu-system-x86_64 itself. Some of the hot functions in
> qemu-system-x86_64 are listed below.
>
> *You are right*: we do not spend much time in the translation routines.
> Instead, we spend a significant amount of time in address translation
> code.
>
> CPU_CLK_UNHALTED       %  Symbol/Functions
>          1340512  100.00  anon (tgid:7106 range:0x7f97815ca000-0x7f979a692000)
>
>
> CPU_CLK_UNHALTED       %  Symbol/Functions
>           314655   25.64  address_space_translate_internal
>           308942   25.18  cpu_x86_exec
>           128922   10.51  ldq_phys
>            92345    7.53  cpu_x86_handle_mmu_fault
>            62456    5.09  tlb_set_page
>            49332    4.02  memory_region_is_ram
>            31055    2.53  helper_le_ldq_mmu
>            22048    1.80  memory_region_get_ram_addr
>            19223    1.57  memory_region_section_get_iotlb
>            15873    1.29  tcg_optimize
>            14526    1.18  get_page_addr_code
>            12601    1.03  memory_region_get_ram_ptr
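
To make the claim about address translation concrete, the listed samples can be grouped and summed. The sketch below is only a rough illustration: the sample counts are copied from the profile above, but the grouping of symbols into "memory/address translation", "execution loop" and "translation" is an assumption made here from the function names, not something stated in the profile. The image total is estimated from the first row (314655 samples reported as 25.64%).

```python
# Samples per symbol, copied from the qemu-system-x86_64 profile above.
PROFILE = {
    "address_space_translate_internal": 314655,
    "cpu_x86_exec": 308942,
    "ldq_phys": 128922,
    "cpu_x86_handle_mmu_fault": 92345,
    "tlb_set_page": 62456,
    "memory_region_is_ram": 49332,
    "helper_le_ldq_mmu": 31055,
    "memory_region_get_ram_addr": 22048,
    "memory_region_section_get_iotlb": 19223,
    "tcg_optimize": 15873,
    "get_page_addr_code": 14526,
    "memory_region_get_ram_ptr": 12601,
}

# ASSUMPTION: this classification is guessed from the symbol names.
# Anything not listed here is counted as memory/address translation.
OTHER_CATEGORIES = {
    "cpu_x86_exec": "execution loop",
    "tcg_optimize": "translation",
}

# Estimate the total samples in the image from the first row (25.64%).
IMAGE_TOTAL = round(PROFILE["address_space_translate_internal"] / 0.2564)


def share_by_category(profile, total):
    """Return {category: percent of total samples} for the listed symbols."""
    sums = {}
    for symbol, samples in profile.items():
        category = OTHER_CATEGORIES.get(symbol, "memory/address translation")
        sums[category] = sums.get(category, 0) + samples
    return {cat: 100.0 * n / total for cat, n in sums.items()}


shares = share_by_category(PROFILE, IMAGE_TOTAL)
for category, percent in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{category:28s} {percent:6.2f}%")
```

Under that grouping, the memory and address translation helpers account for about 61% of the samples in qemu-system-x86_64, while tcg_optimize, the only listed symbol that clearly belongs to translation, stays a little above 1%.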

However, being able to reuse cached blocks based on their content within
QEMU may be a step towards reusing translated blocks across multiple
invocations of QEMU.
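
As a rough sketch of what content-based reuse could look like, the toy cache below keys translated blocks on a hash of the guest code bytes. This is not QEMU code: the class name, the `translate` callback and `fake_translate` are all hypothetical, and a real translated block also depends on state beyond the raw bytes (for example CPU mode flags and the guest address), which this sketch ignores.

```python
import hashlib
from typing import Callable, Dict


class ContentKeyedCache:
    """Toy cache mapping a hash of guest code bytes to translated output."""

    def __init__(self, translate: Callable[[bytes], bytes]) -> None:
        self._translate = translate          # hypothetical translator callback
        self._blocks: Dict[bytes, bytes] = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, guest_code: bytes) -> bytes:
        # The signature is computed from block content only, so identical
        # code found at different addresses or in different runs shares a key.
        key = hashlib.sha256(guest_code).digest()
        cached = self._blocks.get(key)
        if cached is not None:
            self.hits += 1
            return cached
        self.misses += 1
        host_code = self._translate(guest_code)
        self._blocks[key] = host_code
        return host_code


def fake_translate(guest_code: bytes) -> bytes:
    """Stand-in for a real translator; it just reverses the bytes."""
    return guest_code[::-1]


cache = ContentKeyedCache(fake_translate)
cache.lookup(b"\x55\x48\x89\xe5")   # first sight of this block: miss
cache.lookup(b"\x55\x48\x89\xe5")   # same content again: hit
cache.lookup(b"\x90\xc3")           # different content: miss
print(cache.hits, cache.misses)
```

Persisting the dictionary to disk would be the step from reuse inside one run to reuse across invocations; the hash then has to be computed for every candidate block, which is the cost Alex raises below.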
>
> Xin
>
>
>>
>> I suspect the more useful statistic would be a breakdown of the
>> translation blocks, to see which ones are the most heavily used and
>> to examine whether QEMU has done as good a job as it can of translating
>> them.
>>
>>> Is it possible for QEMU to avoid some of the translations by attaching a
>>> signature (e.g. a hash) to every translated basic block and reusing
>>> translated basic blocks based on that signature as much as possible?
>>> Reuse can result from rerunning programs or from the same libraries
>>> being statically linked into several programs.
>>
>> You're right, a translation cache *could* save some translation time,
>> especially if you end up translating the same program over and over
>> again. Having said that, you might find the cost of computing the
>> checksum obviates any speed-up from skipping the translation. After all,
>> QEMU normally only needs to look at each subject instruction once.
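
The trade-off described here can be written as a simple break-even check. The model below is an assumption introduced for illustration, not a measurement: every block pays the hashing cost, and only cache misses pay the translation cost. All three parameters are hypothetical inputs that would have to come from profiling.

```python
def cache_pays_off(hash_cost: float, translate_cost: float, hit_rate: float) -> bool:
    """Return True if the expected per-block cost with a cache is lower.

    hash_cost      -- cost of computing the block signature (any time unit)
    translate_cost -- cost of translating one block (same unit)
    hit_rate       -- fraction of lookups that find a cached block, 0.0..1.0
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    # With a cache: always hash, translate only on a miss.
    with_cache = hash_cost + (1.0 - hit_rate) * translate_cost
    # Without a cache: always translate.
    return with_cache < translate_cost


# Cheap hash, half of the blocks reused: 1 + 0.5 * 10 = 6 < 10.
print(cache_pays_off(hash_cost=1.0, translate_cost=10.0, hit_rate=0.5))
# Expensive hash with the same reuse: 6 + 0.5 * 10 = 11 > 10.
print(cache_pays_off(hash_cost=6.0, translate_cost=10.0, hit_rate=0.5))
```

In this model the cache only helps when hash_cost is smaller than hit_rate * translate_cost, so with a cheap translator the hit rate has to be high before hashing is worth it.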
>>
>> Using QEMU linux-user for cross building would be the obvious pain
>> point. However, as the usual use case is building for embedded platforms,
>> most users are just happy to fully utilise their 80-core build machines
>> in preference to having a farm of slow embedded processors.
>>
>>> This could end up saving some translation time.
>>
>> I think you would need to do some performance analysis and come up with
>> some numbers before you made that assumption.
>>
>> Cheers,
>>
>> --
>> Alex Bennée
>> QEMU/KVM Hacker for Linaro
>>


