Re: [Qemu-devel] [PATCH 0/5] tcg conditional set, round 4
From: Laurent Desnogues
Date: Wed, 23 Dec 2009 11:28:58 +0100
On Tue, Dec 22, 2009 at 3:46 PM, Laurent Desnogues
<address@hidden> wrote:
> On Tue, Dec 22, 2009 at 1:02 AM, Richard Henderson <address@hidden> wrote:
>> On 12/21/2009 03:08 PM, Laurent Desnogues wrote:
>>>
>>> If you wanted to use movcond, you'd have to make
>>> cond + move a special case...
>>
>> You'd certainly want the ARM front-end to use movcond more often than that.
>> For instance:
>>
>> addeq r1,r2,r3
>> -->
>> add_i32 tmp,r2,r3
>> movcond_i32 r1,ZF,0,tmp,r1,eq
>>
>> You'd want to continue to use a branch around if the instruction has side
>> effects like cpu fault (e.g. load, store) or updating flags.
>>
>> It ought not be very hard to arrange for something like
>>
>> if (cond != 0xe) {
>>     if (may_use_movcond(insn)) {
>>         s->condlabel = -1;
>>         /* Save the true destination register. */
>>         s->conddest = cpu_R[dest];
>>         /* Implement the instruction into a temporary. */
>>         cpu_R[dest] = tcg_temp_new();
>>     } else {
>>         s->condlabel = gen_new_label();
>>         ArmConditional cmp = gen_test_cc(cond ^ 1);
>>         tcg_gen_brcondi_i32(cmp.cond, cmp.reg, 0, s->condlabel);
>>     }
>>     s->condjmp = 1;
>> }
>>
>> // ... implement the instruction as we currently do.
>>
>> if (s->condjmp) {
>>     if (s->condlabel == -1) {
>>         /* Conditionally move the temporary result into the
>>            true destination register. */
>>         ArmConditional cmp = gen_test_cc(cond);
>>         tcg_gen_movcond_i32(cmp.cond, s->conddest, cmp.reg, 0,
>>                             cpu_R[dest], s->conddest);
>>         tcg_temp_free(cpu_R[dest]);
>>         /* Restore the true destination register. */
>>         cpu_R[dest] = s->conddest;
>>     } else {
>>         tcg_set_label(s->condlabel);
>>     }
>> }
>
> I agree, that looks nice. But I'll let you dig into ARM instruction
> encoding to see how to implement may_use_movcond; getting the
> correct dest to save is not that cheap (and before you get back
> to me, yes, you could consider only a small subset of the
> instructions for which you want to do that :-).
>
> There's a point I keep insisting on that you keep not
> answering :-) How does all of that perform in practice?
> We can discuss forever, but as long as it isn't measured, we
> are just guessing.
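So I did measure it. Your code isn't correct: you can't replace
the dest reg with a new temp, since that would break instructions
that also read their destination register, such as:
addeq r0,r0,#1
Here the instruction body would read r0 from the fresh temp
instead of from the guest register. In the version I measured,
all conditional data processing instructions use movcond.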
Note that my version of QEMU is ARM-specific and contains
several changes that aren't in mainline:
- Aurelien's TCG optimizations (constant propagation and
copy analysis)
- lazy block context flags update
- no temp flush on ld/st
- most helpers for non-SIMD/VFP instructions are
replaced with TCG code (using setcond for flag setting)
- no signal handling.
This version of qemu-arm is about 2x faster than mainline.
Env:
- HW: E6400
- OS: CentOS 5.4 64-bit
- gcc: 4.1.2
- bench: SPEC2k gcc with expr.i input set
With movcond:
Translation buffer state:
gen code size 4262752/33449984
TB count 35084/524288
TB avg target size 18 max=592 bytes
TB avg host size 121 bytes (expansion ratio: 6.7)
cross page TB count 0 (0%)
direct jump count 20388 (58%) (2 jumps=17211 49%)
Statistics:
TB invalidate count 0
JIT cycles 628700508 (0.262 s at 2.4 GHz)
translated TBs 35084
avg ops/TB 28.2 max=554
deleted ops/TB 5.09 (178672)
avg temps/TB 28.88 max=54
total in TB size 639924 avg 18.2
total out TB size 4009329 avg 114.3
cycles/op 635.1
cycles/in byte 982.5
cycles/out byte 156.8
gen_interm time 13.2%
gen_code time 86.8%
const/code time 9.9%
liveness/code time 13.9%
real 0m15.944s
user 0m15.512s
sys 0m0.070s
Without movcond:
Translation buffer state:
gen code size 4308640/33449984
TB count 35093/524288
TB avg target size 18 max=592 bytes
TB avg host size 122 bytes (expansion ratio: 6.7)
cross page TB count 0 (0%)
direct jump count 20388 (58%) (2 jumps=17211 49%)
Statistics:
TB invalidate count 0
JIT cycles 673085430 (0.280 s at 2.4 GHz)
translated TBs 35093
avg ops/TB 27.9 max=556
deleted ops/TB 4.79 (168080)
avg temps/TB 28.77 max=34
total in TB size 640804 avg 18.3
total out TB size 4056125 avg 115.6
cycles/op 686.8
cycles/in byte 1050.4
cycles/out byte 165.9
gen_interm time 12.8%
gen_code time 87.2%
const/code time 9.3%
liveness/code time 11.5%
real 0m15.974s
user 0m15.586s
sys 0m0.060s
Notes:
- the small change in the number of TBs is expected for gcc,
since it outputs timing stats at the end, which vary from
run to run
- the cycles spent in TCG are very inaccurate (I never
found them very useful...)
- I slightly changed the x86_64 movcond code generation so
that it doesn't emit a useless mov when dest = vtrue = vfalse,
which happened at least once (due to copy analysis).
All in all, not much gain: real time went from 15.974s
without movcond to 15.944s with it.
For the sake of completeness, here are some stats about movcond usage:
number of TB with movcond : 2919
max number of movcond in a TB : 24
total number of movcond generated: 5738
total number of movcond executed : 163892368
Of course that's a single data point, but one that spends a rather
large percentage of its time in generated code; oprofile output:
21403 60.4570 anon (tgid:17348 range:0x601be000-0x621bf000) qemu-arm-32 anon (tgid:17348 range:0x601be000-0x621bf000)
6751  19.0695 qemu-arm-32       qemu-arm-32 cpu_arm_exec
2230   6.2991 anon (tgid:17348 range:0x6224a000-0x6224b000) qemu-arm-32 anon (tgid:17348 range:0x6224a000-0x6224b000)
303    0.8559 qemu-arm-32       qemu-arm-32 tcg_gen_code
65     0.1836 qemu-arm-32       qemu-arm-32 cpu_loop
61     0.1723 qemu-arm-32       qemu-arm-32 page_check_range
53     0.1497 libpthread-2.5.so qemu-arm-32 __pthread_cleanup_upto
49     0.1384 qemu-arm-32       qemu-arm-32 temp_save
36     0.1017 qemu-arm-32       qemu-arm-32 gen_intermediate_code
That doesn't look encouraging, but I like the reduction in
generated code size.
Of course, before drawing any conclusions we need more
measurements, especially with one of the QEMU system emulators.
Laurent