Re: [Qemu-devel] [PATCH 4/4] tcg/aarch64: implement tlb lookup fast path
From: Jani Kokkonen
Subject: Re: [Qemu-devel] [PATCH 4/4] tcg/aarch64: implement tlb lookup fast path
Date: Mon, 3 Jun 2013 13:21:55 +0200
User-agent: Mozilla/5.0 (Windows NT 6.1; rv:17.0) Gecko/20130509 Thunderbird/17.0.6
On 5/31/2013 10:25 PM, Richard Henderson wrote:
> On 05/31/2013 11:07 AM, Jani Kokkonen wrote:
>> +/* Load and compare a TLB entry, leaving the flags set. Leaves X2 pointing
>> + to the tlb entry. Clobbers X0,X1,X2,X3 and TMP. */
>> +
>> +static void tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg,
>> +                             int s_bits, uint8_t **label_ptr,
>> +                             int tlb_offset)
>> +{
>
> You copied the comment from ARM, and it isn't correct. You generate branches.
I will fix the comment.
>
>> + TCGReg base = TCG_AREG0;
>> +
>> + tcg_out_shr(s, 1, TCG_REG_TMP, addr_reg, TARGET_PAGE_BITS);
>> + tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_X1, tlb_offset);
>> + tcg_out_arith(s, ARITH_ADD, 1, TCG_REG_X2, base, TCG_REG_X1, 0);
>> + tcg_out_andi(s, 1, TCG_REG_X0, TCG_REG_TMP, CPU_TLB_BITS, 0);
>> + tcg_out_arith(s, ARITH_ADD, 1, TCG_REG_X2, TCG_REG_X2,
>> + TCG_REG_X0, -CPU_TLB_ENTRY_BITS);
>> +#if TARGET_LONG_BITS == 64
>> + tcg_out_ldst(s, LDST_64, LDST_LD, TCG_REG_X3, TCG_REG_X2, 0);
>> +#else
>> + tcg_out_ldst(s, LDST_32, LDST_LD, TCG_REG_X3, TCG_REG_X2, 0);
>> +#endif
>> + /* check alignment */
>> + if (s_bits) {
>> + tcg_out_tst(s, 1, addr_reg, s_bits, 0);
>> + label_ptr[0] = s->code_ptr;
>> + tcg_out_goto_cond_noaddr(s, TCG_COND_NE);
>> + }
>> + tcg_out_cmp(s, 1, TCG_REG_X3, TCG_REG_TMP, -TARGET_PAGE_BITS);
>> + label_ptr[1] = s->code_ptr;
>> + tcg_out_goto_cond_noaddr(s, TCG_COND_NE);
>
> I'm positive that the branch predictor would be happier with a single branch
> rather than the two you generate here. It ought to be possible to use a
> different set of insns to do this in one go.
>
> How about something like
>
> @ extract the tlb index from the address
> ubfm w0, addr_reg, TARGET_PAGE_BITS, CPU_TLB_BITS
>
> @ add any "high bits" from the tlb offset
> @ noting that env will be much smaller than 24 bits.
> add x1, env, tlb_offset & 0xfff000
>
> @ zap the tlb index from the address for compare
> @ this is all high bits plus 0-3 low bits set, so this
> @ should match a logical immediate.
> and w/x2, addr_reg, TARGET_PAGE_MASK | ((1 << s_bits) - 1)
>
> @ merge the tlb index into the env+tlb_offset
> add x1, x1, x0, lsl #3
>
> @ load the tlb comparator. the 12-bit scaled offset
> @ form will fit the bits remaining from above, given that
> @ we're loading an aligned object, and so the low 2/3 bits
> @ will be clear.
> ldr w/x0, [x1, tlb_offset & 0xfff]
>
> @ load the tlb addend. do this early to avoid stalling.
> @ the addend_offset differs from tlb_offset by 1-3 words.
> @ given that we've got overlap between the scaled 12-bit
> @ value and the 12-bit shifted value above, this also ought
> @ to always be representable.
> ldr x3, [x1, (tlb_offset & 0xfff) + (addend_offset - tlb_offset)]
>
> @ perform the comparison
> cmp w/x0, w/x2
>
> @ generate the complete host address in parallel with the cmp.
> add x3, x3, addr_reg @ 64-bit guest
> add x3, x3, addr_reg, uxtw @ 32-bit guest
>
> bne miss_label
>
> Note that the w/x above indicates the ext setting that ought to be used,
> depending on the address size of the guest.
>
> This is at least 2 insns shorter than your sequence.
OK, thanks. I will add the ubfm instruction and modify the implementation
based on your comments.
>
> Have you looked at doing the out-of-line tlb miss sequence right from the
> very beginning? It's not that much more difficult to accomplish than the
> inline tlb miss.
I will look into this.
>
> See CONFIG_QEMU_LDST_OPTIMIZATION, and the implementation in tcg/arm.
> You won't need two nops after the call; aarch64 can do all the required
> extensions and data movement operations in a single insn.
>
>
I will take this into account as well.
> r~
>
-Jani