
From: Richard Henderson
Subject: Re: [Qemu-devel] [PATCH v2 2/3] tcg/aarch64: Use ADRP+ADD to compute target address
Date: Thu, 29 Jun 2017 09:36:47 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.0

On 06/29/2017 12:52 AM, Pranith Kumar wrote:
We use ADRP+ADD to compute the target address for goto_tb. This patch
introduces the NOP instruction which is used to align the above
instruction pair so that we can use one atomic instruction to patch
the destination offsets.

CC: Richard Henderson <address@hidden>
CC: Alex Bennée <address@hidden>
Signed-off-by: Pranith Kumar <address@hidden>
---
  accel/tcg/translate-all.c    |  2 +-
  tcg/aarch64/tcg-target.inc.c | 26 +++++++++++++++++++++-----
  2 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index f6ad46b613..b6d122e087 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -522,7 +522,7 @@ static inline PageDesc *page_find(tb_page_addr_t index)
  #elif defined(__powerpc__)
  # define MAX_CODE_GEN_BUFFER_SIZE  (32u * 1024 * 1024)
  #elif defined(__aarch64__)
-# define MAX_CODE_GEN_BUFFER_SIZE  (128ul * 1024 * 1024)
+# define MAX_CODE_GEN_BUFFER_SIZE  (3ul * 1024 * 1024 * 1024)

The max is 2GB, because the 4GB range of ADRP is signed.
The end of the buffer must be able to address the beginning of the buffer.

  void aarch64_tb_set_jmp_target(uintptr_t jmp_addr, uintptr_t addr)
  {
      tcg_insn_unit *code_ptr = (tcg_insn_unit *)jmp_addr;
-    tcg_insn_unit *target = (tcg_insn_unit *)addr;
+    tcg_insn_unit adrp_insn = *code_ptr++;
+    tcg_insn_unit addi_insn = *code_ptr;
-    reloc_pc26_atomic(code_ptr, target);
-    flush_icache_range(jmp_addr, jmp_addr + 4);
+    ptrdiff_t offset = (addr >> 12) - (jmp_addr >> 12);
+
+    /* patch ADRP */
+    adrp_insn = deposit32(adrp_insn, 29, 2, offset & 0x3);
+    adrp_insn = deposit32(adrp_insn, 5, 19, offset >> 2);
+    /* patch ADDI */
+    addi_insn = deposit32(addi_insn, 10, 12, addr & 0xfff);
+    atomic_set((uint64_t *)jmp_addr, adrp_insn | ((uint64_t)addi_insn << 32));
+    flush_icache_range(jmp_addr, jmp_addr + 8);

(1) You don't need to load the ADRP and ADDI insns, because you know exactly what they are.

(2) You should check whether the branch is within 128MB and use a direct branch in that case; it will happen quite often. See the ppc64 ppc_tb_set_jmp_target for an example.

          /* actual branch destination will be patched by
+           aarch64_tb_set_jmp_target later, beware of retranslation */

We can actually drop the retranslation comment now; that's a cleanup that should be applied to all of the backends...

+        tcg_out_insn(s, 3406, ADRP, TCG_REG_TMP, 0);
+        tcg_out_insn(s, 3401, ADDI, TCG_TYPE_I64, TCG_REG_TMP, TCG_REG_TMP, 0);
+        tcg_out_callr(s, TCG_REG_TMP);

Don't use callr, use BR like you did for goto_long in the previous patch.


r~


