qemu-devel

Re: [Qemu-devel] [Discuss] Qemu TCG-IR VS LLVM IR


From: Alex Bennée
Subject: Re: [Qemu-devel] [Discuss] Qemu TCG-IR VS LLVM IR
Date: Fri, 13 Jun 2014 15:56:12 +0100

Chaos Shu writes:

> Hi all
>
> Recently I have been investigating whether there is a better BT (binary
> translation) solution. I found two popular approaches.
<snip>
> According to their final tests [1][2], the LLVM IR method seems to be
> slower than QEMU's TCG IR. But according to the last reply from a Linaro
> engineer who once worked at Transitive, QuickTransit performs much better,
> and it uses an IR and a DAG just as LLVM does.

I think you're focusing too much on one aspect of the design differences
between the two translators. While IR-based approaches do make some
things easier, they introduce other problems you need to solve.

Typically when you build a DAG you get automatic dead code elimination
if, say, a register is defined with a new value without the previous
value having been used for anything else. Suddenly you need a mechanism
to deal with resolving exception state for signals that arrive between
the two definitions. In QEMU the block is just re-translated without any
optimisation.
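
To make the problem concrete, here is a minimal sketch in Python of a
backward liveness pass over a toy straight-line block. It is not QEMU,
LLVM or QuickTransit code: the Op type, the register names and the
"precise" flag are all invented for illustration. With precise=False the
first write to r0 is dropped as dead, so a fault in the load would see a
stale r0. With precise=True every guest register is treated as live at
each op that can fault, which is one conservative way (assumed here, not
taken from any of the translators discussed) to keep the state correct.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Op:
    """One toy IR operation (hypothetical, for illustration only)."""
    name: str
    dest: Optional[str] = None
    srcs: Tuple[str, ...] = ()
    may_fault: bool = False  # e.g. a guest load that can raise an exception


def eliminate_dead_defs(block: List[Op], live_out: Set[str],
                        guest_regs: Set[str], precise: bool = False) -> List[Op]:
    """Walk the block backwards and drop defs that are overwritten unused."""
    live = set(live_out)
    kept = []
    for op in reversed(block):
        dead = (op.dest is not None and op.dest not in live
                and not op.may_fault)
        if not dead:
            if op.dest is not None:
                live.discard(op.dest)
            live.update(op.srcs)
            kept.append(op)
        if precise and op.may_fault:
            # A handler may inspect any register if this op faults, so
            # everything must hold its architectural value before it runs.
            live |= guest_regs
    kept.reverse()
    return kept


BLOCK = [
    Op("mov r0, #1", dest="r0"),
    Op("ldr r1, [r2]", dest="r1", srcs=("r2",), may_fault=True),
    Op("mov r0, #2", dest="r0"),
]
REGS = {"r0", "r1", "r2"}

fast = eliminate_dead_defs(BLOCK, {"r0", "r1"}, REGS)
safe = eliminate_dead_defs(BLOCK, {"r0", "r1"}, REGS, precise=True)
print([op.name for op in fast])
print([op.name for op in safe])
```

The trade-off is visible even at this size: the precise variant keeps
the state right at the cost of the optimisation, which is why a
translator may instead optimise freely and recover the state some other
way when an exception actually arrives.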

Just because QEMU doesn't use a DAG-based IR doesn't mean it can't
optimise the operations - the result might be a little less elegant but
it can do it. Even then you need to ask yourself whether changing the
entire TCG engine gains you enough. Looking at a quick perf dump from my
current work:

37.72%  perf-28173.map               [.] 0x00007fd52184181a
18.25%  qemu-system-aarch64          [.] cpu_arm_exec
 7.80%  qemu-system-aarch64          [.] phys_page_find
 4.54%  qemu-system-aarch64          [.] get_phys_addr_lpae
 3.72%  qemu-system-aarch64          [.] address_space_translate_internal
 3.35%  qemu-system-aarch64          [.] address_space_translate
 2.11%  qemu-system-aarch64          [.] tlb_set_page
 ...

So less than 50% of the time is spent in translated code (the
perf-28173.map line above). This suggests there are plenty of other
places we could look for performance improvements, and that's before we
talk about tackling things like safely using threads and utilising more
than one core in TCG-based system emulation. That 37% figure isn't
overly helpful on its own either. We need to look at what the breakdown
is for hot blocks (the 80/20 rule) and whether the current TCG can be
improved there.
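
As a rough sketch of that kind of breakdown, the Python below groups the
perf lines quoted above by object and then picks the hottest blocks from
a per-block execution count. The helper names (share_by_object, hot_set)
are made up for this example, and how you would collect per-block counts
from QEMU is left open. Because the "..." rows are omitted from the
dump, the totals are lower bounds.

```python
from collections import defaultdict
from typing import Dict, List

PERF_REPORT = """\
37.72%  perf-28173.map               [.] 0x00007fd52184181a
18.25%  qemu-system-aarch64          [.] cpu_arm_exec
 7.80%  qemu-system-aarch64          [.] phys_page_find
 4.54%  qemu-system-aarch64          [.] get_phys_addr_lpae
 3.72%  qemu-system-aarch64          [.] address_space_translate_internal
 3.35%  qemu-system-aarch64          [.] address_space_translate
 2.11%  qemu-system-aarch64          [.] tlb_set_page
 ...
"""


def share_by_object(report: str) -> Dict[str, float]:
    """Sum the percentage column per object (second column)."""
    totals: Dict[str, float] = defaultdict(float)
    for line in report.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].endswith("%"):
            continue  # skip the elided "..." row and anything malformed
        totals[fields[1]] += float(fields[0].rstrip("%"))
    return dict(totals)


def hot_set(counts: Dict[str, int], coverage: float = 0.8) -> List[str]:
    """Return the hottest blocks that together reach `coverage` of all
    executions, taken in descending order of execution count."""
    total = sum(counts.values())
    picked: List[str] = []
    running = 0
    for block, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        if running >= coverage * total:
            break
        picked.append(block)
        running += n
    return picked


shares = share_by_object(PERF_REPORT)
for obj, pct in shares.items():
    print(f"{obj}: {pct:.2f}%")
```

On the dump above this attributes 37.72% to the JIT map and 39.77% to
the listed qemu-system-aarch64 functions. hot_set() would be fed
something like {"tb_a": 700, "tb_b": 150, ...} (invented numbers) to see
how few blocks account for most of the executed guest code.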

> What's more, I found results from ICT/Loongson: they worked on QEMU's TCG
> for years, optimised the IR and put a lot of effort into hardware register
> mapping and peephole-like optimisation of the code generated by TCG, and
> in the end they seem to have got good results.

Don't misunderstand me, these LLVM experiments are very interesting and
offer potential avenues to explore. But if you really want to compare
the approaches I suspect it would be better to build an IR-based
translator from scratch with some thought to design rather than trying
to bolt it on to a different system.

> Of those two directions, which one is better? I mean, which one could
> become the final product-level solution in the future ARM/x86 competition?
>
> [1]: https://code.google.com/p/llvm-qemu/wiki/Status
>
> [2]: http://infoscience.epfl.ch/record/149975/files/x86-llvm-translator-chipounov_2.pdf

-- 
Alex Bennée


