[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Qemu-devel] [PATCH v4 26/33] tcg-aarch64: Avoid add with zero in tlb lo
From: |
Richard Henderson |
Subject: |
[Qemu-devel] [PATCH v4 26/33] tcg-aarch64: Avoid add with zero in tlb load |
Date: |
Sat, 14 Sep 2013 14:54:43 -0700 |
Some guest env are small enough to reach the tlb with only a 12-bit addition.
Signed-off-by: Richard Henderson <address@hidden>
---
tcg/aarch64/tcg-target.c | 24 ++++++++++++++++++------
1 file changed, 18 insertions(+), 6 deletions(-)
diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index 5691cc3..1905271 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -1136,46 +1136,58 @@ static void add_qemu_ldst_label(TCGContext *s, int
is_ld, int opc,
slow path for the failure case, which will be patched later when finalizing
the slow path. Generated code returns the host addend in X1,
clobbers X0,X2,X3,TMP. */
-static void tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg,
- int s_bits, uint8_t **label_ptr, int mem_index, int is_read)
+static void tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg, int s_bits,
+ uint8_t **label_ptr, int mem_index, int is_read)
{
TCGReg base = TCG_AREG0;
int tlb_offset = is_read ?
offsetof(CPUArchState, tlb_table[mem_index][0].addr_read)
: offsetof(CPUArchState, tlb_table[mem_index][0].addr_write);
+
/* Extract the TLB index from the address into X0.
X0<CPU_TLB_BITS:0> =
addr_reg<TARGET_PAGE_BITS+CPU_TLB_BITS:TARGET_PAGE_BITS> */
tcg_out_ubfm(s, (TARGET_LONG_BITS == 64), TCG_REG_X0, addr_reg,
TARGET_PAGE_BITS, TARGET_PAGE_BITS + CPU_TLB_BITS);
+
/* Store the page mask part of the address and the low s_bits into X3.
Later this allows checking for equality and alignment at the same time.
X3 = addr_reg & (PAGE_MASK | ((1 << s_bits) - 1)) */
tcg_fmt_Rdn_limm(s, INSN_ANDI, TARGET_LONG_BITS == 64, TCG_REG_X3,
addr_reg, TARGET_PAGE_MASK | ((1 << s_bits) - 1));
+
/* Add any "high bits" from the tlb offset to the env address into X2,
to take advantage of the LSL12 form of the ADDI instruction.
X2 = env + (tlb_offset & 0xfff000) */
- tcg_fmt_Rdn_aimm(s, INSN_ADDI, 1, TCG_REG_X2, base, tlb_offset & 0xfff000);
+ if (tlb_offset & 0xfff000) {
+ tcg_fmt_Rdn_aimm(s, INSN_ADDI, 1, TCG_REG_X2, base,
+ tlb_offset & 0xfff000);
+ base = TCG_REG_X2;
+ }
+
/* Merge the tlb index contribution into X2.
- X2 = X2 + (X0 << CPU_TLB_ENTRY_BITS) */
- tcg_fmt_Rdnm_lsl(s, INSN_ADD, 1, TCG_REG_X2, TCG_REG_X2,
+ X2 = base + (X0 << CPU_TLB_ENTRY_BITS) */
+ tcg_fmt_Rdnm_lsl(s, INSN_ADD, 1, TCG_REG_X2, base,
TCG_REG_X0, CPU_TLB_ENTRY_BITS);
+
/* Merge "low bits" from tlb offset, load the tlb comparator into X0.
X0 = load [X2 + (tlb_offset & 0x000fff)] */
tcg_out_ldst(s, TARGET_LONG_BITS == 64 ? LDST_64 : LDST_32,
LDST_LD, TCG_REG_X0, TCG_REG_X2,
(tlb_offset & 0xfff));
+
/* Load the tlb addend. Do that early to avoid stalling.
X1 = load [X2 + (tlb_offset & 0xfff) + offsetof(addend)] */
tcg_out_ldst(s, LDST_64, LDST_LD, TCG_REG_X1, TCG_REG_X2,
(tlb_offset & 0xfff) + (offsetof(CPUTLBEntry, addend)) -
(is_read ? offsetof(CPUTLBEntry, addr_read)
: offsetof(CPUTLBEntry, addr_write)));
+
/* Perform the address comparison. */
tcg_out_cmp(s, (TARGET_LONG_BITS == 64), TCG_REG_X0, TCG_REG_X3, 0);
- *label_ptr = s->code_ptr;
+
/* If not equal, we jump to the slow path. */
+ *label_ptr = s->code_ptr;
tcg_out_goto_cond_noaddr(s, TCG_COND_NE);
}
--
1.8.3.1
- [Qemu-devel] [PATCH v4 17/33] tcg-aarch64: Support deposit, (continued)
- [Qemu-devel] [PATCH v4 17/33] tcg-aarch64: Support deposit, Richard Henderson, 2013/09/14
- [Qemu-devel] [PATCH v4 18/33] tcg-aarch64: Support add2, sub2, Richard Henderson, 2013/09/14
- [Qemu-devel] [PATCH v4 19/33] tcg-aarch64: Support muluh, mulsh, Richard Henderson, 2013/09/14
- [Qemu-devel] [PATCH v4 20/33] tcg-aarch64: Support div, rem, Richard Henderson, 2013/09/14
- [Qemu-devel] [PATCH v4 21/33] tcg-aarch64: Introduce tcg_fmt_Rd_uimm, Richard Henderson, 2013/09/14
- [Qemu-devel] [PATCH v4 22/33] tcg-aarch64: Use MOVN in tcg_out_movi, Richard Henderson, 2013/09/14
- [Qemu-devel] [PATCH v4 23/33] tcg-aarch64: Use ORRI in tcg_out_movi, Richard Henderson, 2013/09/14
- [Qemu-devel] [PATCH v4 24/33] tcg-aarch64: Special case small constants in tcg_out_movi, Richard Henderson, 2013/09/14
- [Qemu-devel] [PATCH v4 25/33] tcg-aarch64: Use adrp in tcg_out_movi, Richard Henderson, 2013/09/14
- [Qemu-devel] [PATCH v4 26/33] tcg-aarch64: Avoid add with zero in tlb load,
Richard Henderson <=
- [Qemu-devel] [PATCH v4 27/33] tcg-aarch64: Pass return address to load/store helpers directly., Richard Henderson, 2013/09/14
- [Qemu-devel] [PATCH v4 28/33] tcg-aarch64: Use tcg_out_call for qemu_ld/st, Richard Henderson, 2013/09/14
- [Qemu-devel] [PATCH v4 30/33] tcg-aarch64: Implement tcg_register_jit, Richard Henderson, 2013/09/14
- [Qemu-devel] [PATCH v4 29/33] tcg-aarch64: Use symbolic names for branches, Richard Henderson, 2013/09/14
- [Qemu-devel] [PATCH v4 31/33] tcg-aarch64: Reuse FP and LR in translated code, Richard Henderson, 2013/09/14
- [Qemu-devel] [PATCH v4 32/33] tcg-aarch64: Introduce tcg_out_ldst_pair, Richard Henderson, 2013/09/14
- [Qemu-devel] [PATCH v4 33/33] tcg-aarch64: Remove redundant CPU_TLB_ENTRY_BITS check, Richard Henderson, 2013/09/14