qemu-devel

From: Richard Henderson
Subject: Re: [Qemu-devel] [PATCH 3/3] tcg/aarch64: implement new TCG target for aarch64
Date: Mon, 13 May 2013 12:49:12 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130402 Thunderbird/17.0.5

On 05/13/2013 06:33 AM, Claudio Fontana wrote:
> +enum aarch64_cond_code {
> +    COND_EQ = 0x0,
> +    COND_NE = 0x1,
> +    COND_CS = 0x2,   /* Unsigned greater or equal */
> +    COND_HS = 0x2,      /* ALIAS greater or equal */

Clearer to define aliases as COND_HS = COND_CS.
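
E.g.

    COND_CS = 0x2,       /* unsigned greater or equal */
    COND_HS = COND_CS,   /* alias */
    COND_CC = 0x3,       /* unsigned less than */
    COND_LO = COND_CC,   /* alias */

so the duplicated encodings are visible at a glance.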

> +static inline void tcg_out_movi64(TCGContext *s, int rd, uint64_t value)
> +{
> +    uint32_t half, base, movk = 0, shift = 0;
> +    if (!value) {
> +        tcg_out_movr(s, 1, rd, TCG_REG_XZR);
> +        return;
> +    }
> +    /* construct halfwords of the immediate with MOVZ with LSL */
> +    /* using MOVZ 0x52800000 | extended reg.. */
> +    base = 0xd2800000;
> +
> +    while (value) {
> +        half = value & 0xffff;
> +        if (half) {
> +            /* Op can be MOVZ or MOVK */
> +            tcg_out32(s, base | movk | shift | half << 5 | rd);
> +            if (!movk)
> +                movk = 0x20000000; /* morph next MOVZs into MOVKs */
> +        }
> +        value >>= 16;
> +        shift += 0x00200000;

You'll almost certainly want to try ADRP+ADD before decomposing into 3-4
mov[zk] instructions.
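
Untested sketch of what I have in mind (helper name made up; assumes the
target lies within +-4GB of the code pointer, which holds for anything
inside the code_gen_buffer):

    static void tcg_out_adrp_add(TCGContext *s, int rd, uint64_t addr)
    {
        intptr_t pc = (intptr_t)s->code_ptr;
        intptr_t disp = ((intptr_t)addr >> 12) - (pc >> 12);

        /* ADRP rd, <page>: 0x90000000 | immlo << 29 | immhi << 5 | rd */
        tcg_out32(s, 0x90000000 | (disp & 3) << 29
                  | ((disp >> 2) & 0x7ffff) << 5 | rd);
        /* ADD rd, rd, #lo12: 0x91000000 | imm12 << 10 | rn << 5 | rd */
        tcg_out32(s, 0x91000000 | (addr & 0xfff) << 10 | rd << 5 | rd);
    }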

> +void aarch64_tb_set_jmp_target(uintptr_t jmp_addr, uintptr_t addr)
> +{
> +    tcg_target_long target, offset;
> +    target = (tcg_target_long)addr;
> +    offset = (target - (tcg_target_long)jmp_addr) / 4;
> +
> +    if (offset <= -0x02000000 || offset >= 0x02000000) {
> +        /* out of 26bit range */
> +        tcg_abort();
> +    }

See MAX_CODE_GEN_BUFFER_SIZE in translate-all.c.  Set this value to 128MB and
then all cross-TB branches will be in range, and the abort won't trigger.
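
I.e. something like this next to the existing per-host definitions:

    #elif defined(__aarch64__)
    # define MAX_CODE_GEN_BUFFER_SIZE  (128ul * 1024 * 1024)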

> +static inline void tcg_out_goto_label_cond(TCGContext *s, TCGCond c,
> +                                           int label_index)
> +{
> +    tcg_target_long offset;
> +    /* backward conditional jump never seems to happen in practice,
> +       so just always use the branch trampoline */
> +    c = tcg_invert_cond(c);
> +    offset = 2; /* skip current instr and the next */
> +    tcg_out_goto_cond(s, c, offset);
> +    tcg_out_goto_label(s, label_index); /* emit 26bit jump */
> +}

Conditional branch range is +-1MB.  You'll never see a TB that large.  You
don't need to emit a branch-across-branch.
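
I.e. emit the B.cond directly; rough sketch, with the reloc type being
whatever constant you define for patching the 19-bit offset field
(R_AARCH64_CONDBR19 here):

    static inline void tcg_out_goto_label_cond(TCGContext *s, TCGCond c,
                                               int label_index)
    {
        TCGLabel *l = &s->labels[label_index];
        intptr_t offset = 0;

        if (l->has_value) {
            offset = ((intptr_t)l->u.value - (intptr_t)s->code_ptr) / 4;
        } else {
            /* field is patched later, when the label is bound */
            tcg_out_reloc(s, s->code_ptr, R_AARCH64_CONDBR19,
                          label_index, 0);
        }
        tcg_out_goto_cond(s, c, offset);   /* emits the B.cond */
    }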

> +    /* Should generate something like the following:
> +     *  shr x8, addr_reg, #TARGET_PAGE_BITS
> +     *  and x0, x8, #(CPU_TLB_SIZE - 1)   @ Assumption: CPU_TLB_BITS <= 8
> +     *  add x0, env, x0 lsl #CPU_TLB_ENTRY_BITS
> +     */
> +#  if CPU_TLB_BITS > 8
> +#   error "CPU_TLB_BITS too large"
> +#  endif

I wonder if using UBFM to extract the TLB bits and BFM with XZR to clear the
middle bits wouldn't be better, as you wouldn't be restricted on the size of
CPU_TLB_BITS.  AFAICS it would be the same number of instructions.
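
For reference, the index extraction would be something like (untested;
UBFX is the relevant UBFM alias):

    /* UBFX x0, addr_reg, #TARGET_PAGE_BITS, #CPU_TLB_BITS, i.e.
       UBFM x0, addr_reg, #immr, #imms with immr = TARGET_PAGE_BITS
       and imms = TARGET_PAGE_BITS + CPU_TLB_BITS - 1 */
    tcg_out32(s, 0xd3400000 | TARGET_PAGE_BITS << 16
              | (TARGET_PAGE_BITS + CPU_TLB_BITS - 1) << 10
              | addr_reg << 5 | TCG_REG_X0);

with no 8-bit ceiling on CPU_TLB_BITS.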

> +    case INDEX_op_mov_i64: ext = 1;
> +    case INDEX_op_mov_i32:
> +        tcg_out_movr(s, ext, args[0], args[1]);
> +        break;

See how the i386 backend uses macros to reduce the typing with these sorts of
paired opcodes.
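
Something along the lines of (untested):

    #define OP_32_64(x)                              \
            case glue(glue(INDEX_op_, x), _i64):     \
                ext = 1; /* fall through */          \
            case glue(glue(INDEX_op_, x), _i32)

    OP_32_64(mov):
        tcg_out_movr(s, ext, args[0], args[1]);
        break;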

> +    case INDEX_op_rotl_i64: ext = 1;
> +    case INDEX_op_rotl_i32:     /* same as rotate right by (32 - m) */
> +        if (const_args[2])      /* ROR / EXTR Wd, Wm, Wm, 32 - m */
> +            tcg_out_rotl(s, ext, args[0], args[1], args[2]);
> +        else { /* no RSB in aarch64 unfortunately. */
> +            /* XXX UNTESTED */
> +            tcg_out_movi32(s, ext, TCG_REG_X8, ext ? 64 : 32);

But A64 does have shift counts that truncate to the width of the operation,
which means the high bits of the count may contain garbage, and therefore
that you can compute this merely as ROR = -ROL, ignoring the 32/64
distinction.
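
I.e. (untested):

    /* ROL(n, m) == ROR(n, -m): the variable shift count is taken
       modulo the operation width, so the NEG result's high bits
       are irrelevant.  */
    tcg_out32(s, (ext ? 0xcb0003e0 : 0x4b0003e0)          /* NEG x8, m */
              | args[2] << 16 | TCG_REG_X8);
    tcg_out32(s, (ext ? 0x9ac02c00 : 0x1ac02c00)          /* RORV d, n, x8 */
              | TCG_REG_X8 << 16 | args[1] << 5 | args[0]);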

> +    case INDEX_op_setcond_i64: ext = 1;
> +    case INDEX_op_setcond_i32:
> +        tcg_out_movi32(s, ext, TCG_REG_X8, 0x01);
> +        tcg_out_cmp(s, ext, args[1], args[2]);
> +        tcg_out_csel(s, ext, args[0], TCG_REG_X8, TCG_REG_XZR,
> +                     tcg_cond_to_aarch64_cond[args[3]]);

See CSINC Wd,Wzr,Wzr,cond.  No need for the initial movi.
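
Something like (untested; note the condition must be inverted, since
CSINC produces Rm+1 on the false path -- CSET d, cond is the alias for
CSINC d, zr, zr, inv(cond)):

    tcg_out_cmp(s, ext, args[1], args[2]);
    tcg_out32(s, (ext ? 0x9a800400 : 0x1a800400)
              | TCG_REG_XZR << 16
              | tcg_cond_to_aarch64_cond[tcg_invert_cond(args[3])] << 12
              | TCG_REG_XZR << 5 | args[0]);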

> +    tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_I32], 0, 0xffff);
> +    tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_I64], 0, 0xffff);

Only half of your registers are marked available.
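
A64 has 32 general registers, so, modulo whatever you then reserve
(sp, TCG_AREG0, scratch), this wants the full mask:

    tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_I32], 0, 0xffffffff);
    tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_I64], 0, 0xffffffff);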


r~


