From: Aurelien Jarno
Subject: Re: [Qemu-devel] [PATCH 7/7] tcg: Streamline movcond_i64 using movcond_i32
Date: Fri, 21 Sep 2012 23:23:20 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

On Fri, Sep 21, 2012 at 10:13:40AM -0700, Richard Henderson wrote:
> When movcond_i32 is available we can further reduce the generated
> op count from 12 to 6, and the generated code size on i686 from
> 88 to 74 bytes.
>
> Signed-off-by: Richard Henderson <address@hidden>
> ---
> tcg/tcg-op.h | 22 +++++++++++++++-------
> 1 file changed, 15 insertions(+), 7 deletions(-)
>
> diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
> index 3e375ea..0145a09 100644
> --- a/tcg/tcg-op.h
> +++ b/tcg/tcg-op.h
> @@ -2147,16 +2147,24 @@ static inline void tcg_gen_movcond_i64(TCGCond cond,
> TCGv_i64 ret,
> tcg_gen_op6i_i32(INDEX_op_setcond2_i32, t0,
> TCGV_LOW(c1), TCGV_HIGH(c1),
> TCGV_LOW(c2), TCGV_HIGH(c2), cond);
> - tcg_gen_neg_i32(t0, t0);
>
> - tcg_gen_and_i32(t1, TCGV_LOW(v1), t0);
> - tcg_gen_andc_i32(TCGV_LOW(ret), TCGV_LOW(v2), t0);
> - tcg_gen_or_i32(TCGV_LOW(ret), TCGV_LOW(ret), t1);
> + if (TCG_TARGET_HAS_movcond_i32) {
> + tcg_gen_movi_i32(t1, 0);
> + tcg_gen_movcond_i32(TCG_COND_NE, TCGV_LOW(ret), t0, t1,
> + TCGV_LOW(v1), TCGV_LOW(v2));
> + tcg_gen_movcond_i32(TCG_COND_NE, TCGV_HIGH(ret), t0, t1,
> + TCGV_HIGH(v1), TCGV_HIGH(v2));
> + } else {
> + tcg_gen_neg_i32(t0, t0);
>
> - tcg_gen_and_i32(t1, TCGV_HIGH(v1), t0);
> - tcg_gen_andc_i32(TCGV_HIGH(ret), TCGV_HIGH(v2), t0);
> - tcg_gen_or_i32(TCGV_HIGH(ret), TCGV_HIGH(ret), t1);
> + tcg_gen_and_i32(t1, TCGV_LOW(v1), t0);
> + tcg_gen_andc_i32(TCGV_LOW(ret), TCGV_LOW(v2), t0);
> + tcg_gen_or_i32(TCGV_LOW(ret), TCGV_LOW(ret), t1);
>
> + tcg_gen_and_i32(t1, TCGV_HIGH(v1), t0);
> + tcg_gen_andc_i32(TCGV_HIGH(ret), TCGV_HIGH(v2), t0);
> + tcg_gen_or_i32(TCGV_HIGH(ret), TCGV_HIGH(ret), t1);
> + }
> tcg_temp_free_i32(t0);
> tcg_temp_free_i32(t1);
> } else {
At some point I tried to work out how to implement movcond_i64 for
MIPS directly in the backend. I just tried your patch, and it
generates this kind of code:
    0x2bb2ae58: sltu at,zero,s4
    0x2bb2ae5c: sltu t0,zero,s3
    0x2bb2ae60: or s3,at,t0
    0x2bb2ae64: movz s1,s5,s3
    0x2bb2ae68: movz s2,s6,s3

(In some cases loads of constants/globals appear in the middle, but
that's not due to movcond.)
It's basically the kind of code I would have written. It's clearly
better to implement it directly in TCG.
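
For reference, here is a minimal C sketch of what the two code paths in
the quoted patch compute on a 32-bit host. It is only a model: the
helper names are hypothetical and are not TCG API, and the comparison is
fixed to unsigned less-than for brevity, whereas the real setcond2 op
takes an arbitrary TCGCond.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for setcond2_i32 with the condition fixed to LTU:
 * returns 1 if (ah:al) < (bh:bl) as unsigned 64-bit values, else 0. */
static uint32_t model_setcond2_ltu(uint32_t al, uint32_t ah,
                                   uint32_t bl, uint32_t bh)
{
    return (ah < bh) || (ah == bh && al < bl);
}

/* Hypothetical stand-in for movcond_i32 with TCG_COND_NE:
 * ret = (c1 != c2) ? v1 : v2. */
static uint32_t model_movcond_ne(uint32_t c1, uint32_t c2,
                                 uint32_t v1, uint32_t v2)
{
    return c1 != c2 ? v1 : v2;
}

/* Path taken when movcond_i32 is available: one setcond2, then one
 * movcond per 32-bit half, both testing the setcond2 result against 0. */
static uint64_t movcond64_via_movcond(uint64_t c1, uint64_t c2,
                                      uint64_t v1, uint64_t v2)
{
    uint32_t t0 = model_setcond2_ltu((uint32_t)c1, (uint32_t)(c1 >> 32),
                                     (uint32_t)c2, (uint32_t)(c2 >> 32));
    uint32_t t1 = 0;
    uint32_t lo = model_movcond_ne(t0, t1, (uint32_t)v1, (uint32_t)v2);
    uint32_t hi = model_movcond_ne(t0, t1, (uint32_t)(v1 >> 32),
                                   (uint32_t)(v2 >> 32));
    return ((uint64_t)hi << 32) | lo;
}

/* Fallback path: negate the 0/1 result into an all-zeros/all-ones mask,
 * then select each half with and/andc/or. */
static uint64_t movcond64_via_mask(uint64_t c1, uint64_t c2,
                                   uint64_t v1, uint64_t v2)
{
    uint32_t t0 = model_setcond2_ltu((uint32_t)c1, (uint32_t)(c1 >> 32),
                                     (uint32_t)c2, (uint32_t)(c2 >> 32));
    uint32_t mask = 0u - t0;                      /* neg */
    uint32_t lo = ((uint32_t)v2 & ~mask)          /* andc */
                | ((uint32_t)v1 & mask);          /* and, or */
    uint32_t hi = ((uint32_t)(v2 >> 32) & ~mask)
                | ((uint32_t)(v1 >> 32) & mask);
    return ((uint64_t)hi << 32) | lo;
}

/* Both paths must agree with a plain 64-bit conditional select. */
void movcond64_self_check(void)
{
    const uint64_t a = 0x1111111122222222ULL, b = 0x3333333344444444ULL;
    assert(movcond64_via_movcond(1, 2, a, b) == a);
    assert(movcond64_via_mask(1, 2, a, b) == a);
    assert(movcond64_via_movcond(2, 1, a, b) == b);
    assert(movcond64_via_mask(2, 1, a, b) == b);
}
```

The movcond path needs no temporary mask and maps naturally onto
conditional-move instructions such as the MIPS movz shown above.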
Now I wonder whether it would be better to write brcond2 as setcond2 +
brcond, and even setcond2 as a pair of setcond ops in TCG, which would
allow some optimizations when both high parts are zero.
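
To make that idea concrete, here is a rough C model of one possible
decomposition, again fixed to unsigned less-than. All function names are
hypothetical, and this particular split uses an extra equality test on
the high parts in addition to the pair of ordered comparisons; it is a
sketch of the idea, not how TCG implements it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical models of 32-bit setcond ops (result is 0 or 1). */
static uint32_t model_setcond_ltu(uint32_t a, uint32_t b) { return a < b; }
static uint32_t model_setcond_eq(uint32_t a, uint32_t b)  { return a == b; }

/* setcond2 (LTU) expressed with 32-bit setcond ops on the halves:
 * the high parts decide, unless they are equal, in which case the
 * low parts decide. */
static uint32_t setcond2_ltu_split(uint32_t al, uint32_t ah,
                                   uint32_t bl, uint32_t bh)
{
    uint32_t hi_lt = model_setcond_ltu(ah, bh);
    uint32_t hi_eq = model_setcond_eq(ah, bh);
    uint32_t lo_lt = model_setcond_ltu(al, bl);
    return hi_lt | (hi_eq & lo_lt);
}

/* If both high parts are known to be zero, hi_lt folds to 0 and hi_eq
 * folds to 1, so only the low-part comparison remains. */
static uint32_t setcond2_ltu_zero_high(uint32_t al, uint32_t bl)
{
    return model_setcond_ltu(al, bl);
}

/* brcond2 modeled as setcond2 followed by a branch on "result != 0";
 * returns 1 when the branch would be taken. */
static int brcond2_ltu_taken(uint64_t a, uint64_t b)
{
    uint32_t t = setcond2_ltu_split((uint32_t)a, (uint32_t)(a >> 32),
                                    (uint32_t)b, (uint32_t)(b >> 32));
    return t != 0;
}

/* The split form must match a native 64-bit comparison. */
void setcond2_self_check(void)
{
    assert(brcond2_ltu_taken(5, 7) == (5ULL < 7ULL));
    assert(brcond2_ltu_taken(0x100000000ULL, 0xffffffffULL) == 0);
    assert(setcond2_ltu_zero_high(5, 7) == setcond2_ltu_split(5, 0, 7, 0));
}
```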
Tested-by: Aurelien Jarno <address@hidden>
Reviewed-by: Aurelien Jarno <address@hidden>
--
Aurelien Jarno GPG: 1024D/F1BCDB73
address@hidden http://www.aurel32.net