Re: [Qemu-devel] [PATCH 4/4] tcg/aarch64: implement tlb lookup fast path
From: Richard Henderson
Subject: Re: [Qemu-devel] [PATCH 4/4] tcg/aarch64: implement tlb lookup fast path
Date: Fri, 31 May 2013 13:25:26 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130514 Thunderbird/17.0.6
On 05/31/2013 11:07 AM, Jani Kokkonen wrote:
> +/* Load and compare a TLB entry, leaving the flags set. Leaves X2 pointing
> + to the tlb entry. Clobbers X0,X1,X2,X3 and TMP. */
> +
> +static void tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg,
> + int s_bits, uint8_t **label_ptr, int tlb_offset)
> +{
You copied the comment from the ARM backend, and it isn't correct here: this
function generates branches rather than merely leaving the flags set.
> + TCGReg base = TCG_AREG0;
> +
> + tcg_out_shr(s, 1, TCG_REG_TMP, addr_reg, TARGET_PAGE_BITS);
> + tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_X1, tlb_offset);
> + tcg_out_arith(s, ARITH_ADD, 1, TCG_REG_X2, base, TCG_REG_X1, 0);
> + tcg_out_andi(s, 1, TCG_REG_X0, TCG_REG_TMP, CPU_TLB_BITS, 0);
> + tcg_out_arith(s, ARITH_ADD, 1, TCG_REG_X2, TCG_REG_X2,
> + TCG_REG_X0, -CPU_TLB_ENTRY_BITS);
> +#if TARGET_LONG_BITS == 64
> + tcg_out_ldst(s, LDST_64, LDST_LD, TCG_REG_X3, TCG_REG_X2, 0);
> +#else
> + tcg_out_ldst(s, LDST_32, LDST_LD, TCG_REG_X3, TCG_REG_X2, 0);
> +#endif
> + /* check alignment */
> + if (s_bits) {
> + tcg_out_tst(s, 1, addr_reg, s_bits, 0);
> + label_ptr[0] = s->code_ptr;
> + tcg_out_goto_cond_noaddr(s, TCG_COND_NE);
> + }
> + tcg_out_cmp(s, 1, TCG_REG_X3, TCG_REG_TMP, -TARGET_PAGE_BITS);
> + label_ptr[1] = s->code_ptr;
> + tcg_out_goto_cond_noaddr(s, TCG_COND_NE);
I'm positive that the branch predictor would be happier with a single branch
rather than the two you generate here. It ought to be possible to use a
different set of insns to do this in one go.
How about something like:

    @ extract the tlb index from the address
    ubfm    w0, addr_reg, TARGET_PAGE_BITS, CPU_TLB_BITS

    @ add any "high bits" from the tlb offset,
    @ noting that env will be much smaller than 24 bits.
    add     x1, env, tlb_offset & 0xfff000

    @ zap the tlb index from the address for compare;
    @ this is all high bits plus 0-3 low bits set, so this
    @ should match a logical immediate.
    and     w/x2, addr_reg, TARGET_PAGE_MASK | ((1 << s_bits) - 1)

    @ merge the tlb index into env+tlb_offset
    add     x1, x1, x0, lsl #3

    @ load the tlb comparator. the 12-bit scaled offset
    @ form will fit the bits remaining from above, given that
    @ we're loading an aligned object, and so the low 2/3 bits
    @ will be clear.
    ldr     w/x0, [x1, tlb_offset & 0xfff]

    @ load the tlb addend. do this early to avoid stalling.
    @ the addend_offset differs from tlb_offset by 1-3 words.
    @ given that we've got overlap between the scaled 12-bit
    @ value and the 12-bit shifted value above, this also ought
    @ to always be representable.
    ldr     x3, [x1, (tlb_offset & 0xfff) + (addend_offset - tlb_offset)]

    @ perform the comparison
    cmp     w/x0, w/x2

    @ generate the complete host address in parallel with the cmp.
    add     x3, x3, addr_reg        @ 64-bit guest
    add     x3, x3, addr_reg, uxtw  @ 32-bit guest

    bne     miss_label
Note that the w/x above indicates the ext setting that ought to be used,
depending on the address size of the guest.
This is at least 2 insns shorter than your sequence.
Have you looked at doing the out-of-line tlb miss sequence right from the
very beginning? It's not that much more difficult to accomplish than the
inline tlb miss.
See CONFIG_QEMU_LDST_OPTIMIZATION, and the implementation in tcg/arm.
You won't need two nops after the call; aarch64 can do all the required
extensions and data movement operations in a single insn.
r~