[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[RFC PATCH v3 02/34] Hexagon (target/hexagon) README
From: |
Taylor Simpson |
Subject: |
[RFC PATCH v3 02/34] Hexagon (target/hexagon) README |
Date: |
Tue, 18 Aug 2020 10:50:15 -0500 |
Gives an introduction and overview to the Hexagon target
Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
target/hexagon/README | 254 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 254 insertions(+)
create mode 100644 target/hexagon/README
diff --git a/target/hexagon/README b/target/hexagon/README
new file mode 100644
index 0000000..3b65688
--- /dev/null
+++ b/target/hexagon/README
@@ -0,0 +1,254 @@
+Hexagon is Qualcomm's very long instruction word (VLIW) digital signal
+processor(DSP).
+
+The following versions of the Hexagon core are supported
+ Scalar core: v67
+
https://developer.qualcomm.com/downloads/qualcomm-hexagon-v67-programmer-s-reference-manual
+
+We presented an overview of the project at the 2019 KVM Forum.
+
https://kvmforum2019.sched.com/event/Tmwc/qemu-hexagon-automatic-translation-of-the-isa-manual-pseudcode-to-tiny-code-instructions-of-a-vliw-architecture-niccolo-izzo-revng-taylor-simpson-qualcomm-innovation-center
+
+*** Tour of the code ***
+
+The qemu-hexagon implementation is a combination of qemu and the Hexagon
+architecture library (aka archlib). The three primary directories with
+Hexagon-specific code are
+
+ qemu/target/hexagon
+ This has all the instruction and packet semantics
+ qemu/target/hexagon/imported
+ These files are imported with very little modification from archlib
+ *.idef Instruction semantics definition
+ macros.def Mapping of macros to instruction attributes
+ encode*.def Encoding patterns for each instruction
+ iclass.def Instruction class definitions used to determine
+ legal VLIW slots for each instruction
+ qemu/linux-user/hexagon
+ Helpers for loading the ELF file and making Linux system calls,
+ signals, etc
+
+We start with scripts that generate a bunch of include files. This
+is a two step process. The first step is to use the C preprocessor to expand
+macros inside the architecture definition files. This is done in
+target/hexagon/gen_semantics.c. This step produces
+ <BUILD_DIR>/hexagon-linux-user/semantics_generated.pyinc.
+That file is consumed by the following python scripts to produce the indicated
+header files in <BUILD_DIR>/hexagon-linux-user
+ gen_shortcode.py -> shortcode_generated.h
+ gen_helper_protos.py -> helper_protos_generated.h
+ gen_tcg_funcs.py -> tcg_funcs_generated.h
+ gen_helper_funcs.py -> helper_funcs_generated.h
+
+Qemu helper functions have 3 parts
+ DEF_HELPER declaration indicates the signature of the helper
+ gen_helper_<NAME> will generate a TCG call to the helper function
+ The helper implementation
+
+Here's an example of the A2_add instruction.
+ Instruction tag A2_add
+ Assembly syntax "Rd32=add(Rs32,Rt32)"
+ Instruction semantics "{ RdV=RsV+RtV;}"
+
+By convention, the operands are identified by letter
+ RdV is the destination register
+ RsV, RtV are source registers
+
+The generator uses the operand naming conventions (see large comment in
+hex_common.py) to determine the signature of the helper function. Here are the
+results for A2_add
+
+helper_protos_generated.h
+ #ifndef fGEN_TCG_A2_add
+ DEF_HELPER_3(A2_add, s32, env, s32, s32)
+ #endif
+
+tcg_funcs_generated.h
+ DEF_TCG_FUNC(A2_add, /* { RdV=RsV+RtV;} */
+ {
+ /* A2_add */
+ DECL_RREG_d(RdV, RdN, 0, 0);
+ DECL_RREG_s(RsV, RsN, 1, 0);
+ DECL_RREG_t(RtV, RtN, 2, 0);
+ READ_RREG_s(RsV, RsN);
+ READ_RREG_t(RtV, RtN);
+ #ifdef fGEN_TCG_A2_add
+ fGEN_TCG_A2_add({ RdV=RsV+RtV;});
+ #else
+ do {
+ gen_helper_A2_add(RdV, cpu_env, RsV, RtV);
+ } while (0);
+ #endif
+ WRITE_RREG_d(RdN, RdV);
+ FREE_RREG_d(RdV);
+ FREE_RREG_s(RsV);
+ FREE_RREG_t(RtV);
+ /* A2_add */
+ })
+
+helper_funcs_generated.h
+ #ifndef fGEN_TCG_A2_add
+ int32_t HELPER(A2_add)(CPUHexagonState *env, int32_t RsV, int32_t RtV)
+ {
+ uint32_t slot __attribute__((unused)) = 4;
+ int32_t RdV = 0;
+ { RdV=RsV+RtV;}
+ return RdV;
+ }
+ #endif
+
+For each operand, there are macros for DECL, FREE, READ, WRITE. These are
+defined in macros.h. Note that we append the operand type to the macro name,
+which allows us to specialize the TCG code tenerated. For read-only operands,
+DECL simply declares the TCGv variable (no need for tcg_temp_local_new()),
+and READ will assign from the TCGv corresponding to the GPR, and FREE doesn't
+have to do anything. Also, note that the WRITE macros update the disassembly
+context to be processed when the packet commits (see "Packet Semantics" below).
+
+Note the fGEN_TCG_A2_add macro. This macro allows us to generate TCG code
+instead of a call to the helper. If defined, the macro takes 1 argument.
+ C semantics (aka short code)
+
+This allows the code generator to override the auto-generated code. In some
+cases this is necessary for correct execution. We can also override for
+faster emulation. For example, calling a helper for add is more expensive
+than generating a TCG add operation.
+
+The gen_tcg.h file has any overrides. For example,
+ #define fGEN_TCG_A2_add(GENHLPR, SHORTCODE) \
+ tcg_gen_add_tl(RdV, RsV, RtV)
+
+The gen_tcg.h file is included twice
+1) In genptr.c, it overrides the semantics from tcg_funcs_generated.h
+2) In helper.h, it prevents the generation of helpers for overridden
+ instructions. Notice the #ifndef fGEN_TCG_A2_add above in both
+ helper_protos_generated.h and helper_functions.h
+
+The instruction semantics C code relies heavily on macros. In cases where the
+C semantics are specified only with macros, we can override the default with
+the short semantics option and #define the macros to generate TCG code. One
+example is L2_loadw_locked:
+ Instruction tag L2_loadw_locked
+ Assembly syntax "Rd32=memw_locked(Rs32)"
+ Instruction semantics "{ fEA_REG(RsV); fLOAD_LOCKED(1,4,u,EA,RdV) }"
+
+In gen_tcg.h, we use the shortcode
+#define fGEN_TCG_L2_loadw_locked(SHORTCODE) \
+ SHORTCODE
+
+There are also cases where we brute force the TCG code generation.
+Instructions with multiple definitions are examples. These require special
+handling because qemu helpers can only return a single value.
+
+In addition to instruction semantics, we use a generator to create the decode
+tree. This generation is also a two step process. The first step is to run
+target/hexagon/gen_dectree_import.c to produce
+ <BUILD_DIR>/hexagon-linux-user/iset.py
+This file is imported by target/hexagon/dectree.py to produce
+ <BUILD_DIR>/hexagon-linux-user/dectree_generated.h
+
+*** Key Files ***
+
+cpu.h
+
+This file contains the definition of the CPUHexagonState struct. It is the
+runtime information for each thread and contains stuff like the GPR and
+predicate registers.
+
+macros.h
+
+The Hexagon arch lib relies heavily on macros for the instruction semantics.
+This is a great advantage for qemu because we can override them for different
+purposes. You will also notice there are sometimes two definitions of a macro.
+The QEMU_GENERATE variable determines whether we want the macro to generate TCG
+code. If QEMU_GENERATE is not defined, we want the macro to generate vanilla
+C code that will work in the helper implementation.
+
+translate.c
+
+The functions in this file generate TCG code for a translation block. Some
+important functions in this file are
+
+ gen_start_packet - initialize the data structures for packet semantics
+ gen_commit_packet - commit the register writes, stores, etc for a packet
+ decode_packet - disassemble a packet and generate code
+
+genptr.c
+genptr_helpers.h
+gen_tcg.h
+
+These files create a function for each instruction. It is mostly composed of
+fGEN_TCG_<tag> definitions followed by including qemu_def_generated.h. The
+genptr_helpers.h file contains helper functions that are invoked by the macros
+in gen_tcg.h and macros.h
+
+op_helper.c
+
+This file contains the implementations of all the helpers. There are a few
+general purpose helpers, but most of them are generated by including
+qemu_def_generated.h. There are also several helpers used for debugging.
+
+
+*** Packet Semantics ***
+
+VLIW packet semantics differ from serial semantics in that all input operands
+are read, then the operations are performed, then all the results are written.
+For exmaple, this packet performs a swap of registers r0 and r1
+ { r0 = r1; r1 = r0 }
+Note that the result is different if the instructions are executed serially.
+
+Packet semantics dictate that we defer any changes of state until the entire
+packet is committed. We record the results of each instruction in a side data
+structure, and update the visible processor state when we commit the packet.
+
+The data structures are divided between the runtime state and the translation
+context.
+
+During the TCG generation (see translate.[ch]), we use the DisasContext to
+track what needs to be done during packet commit. Here are the relevant
+fields
+
+ reg_log list of registers written
+ reg_log_idx index into ctx_reg_log
+ pred_log list of predicates written
+ pred_log_idx index into ctx_pred_log
+ store_width width of stores (indexed by slot)
+
+During runtime, the following fields in CPUHexagonState (see cpu.h) are used
+
+ new_value new value of a given register
+ reg_written boolean indicating if register was written
+ new_pred_value new value of a predicate register
+ pred_written boolean indicating if predicate was written
+ mem_log_stores record of the stores (indexed by slot)
+
+*** Debugging ***
+
+You can turn on a lot of debugging by changing the HEX_DEBUG macro to 1 in
+internal.h. This will stream a lot of information as it generates TCG and
+executes the code.
+
+To track down nasty issues with Hexagon->TCG generation, we compare the
+execution results with actual hardware running on a Hexagon Linux target.
+Run qemu with the "-d cpu" option. Then, we can diff the results and figure
+out where qemu and hardware behave differently.
+
+The stacks are located at different locations. We handle this by changing
+env->stack_adjust in translate.c. First, set this to zero and run qemu.
+Then, change env->stack_adjust to the difference between the two stack
+locations. Then rebuild qemu and run again. That will produce a very
+clean diff.
+
+Here are some handy places to set breakpoints
+
+ At the call to gen_start_packet for a given PC (note that the line number
+ might change in the future)
+ br translate.c:602 if ctx->base.pc_next == 0xdeadbeef
+ The helper function for each instruction is named helper_<TAG>, so here's
+ an example that will set a breakpoint at the start
+ br helper_A2_add
+ If you have the HEX_DEBUG macro set, the following will be useful
+ At the start of execution of a packet for a given PC
+ br helper_debug_start_packet if env->gpr[41] == 0xdeadbeef
+ At the end of execution of a packet for a given PC
+ br helper_debug_commit_end if env->this_PC == 0xdeadbeef
+
--
2.7.4
- [RFC PATCH v3 15/34] Hexagon (target/hexagon) instruction printing, (continued)
- [RFC PATCH v3 15/34] Hexagon (target/hexagon) instruction printing, Taylor Simpson, 2020/08/18
- [RFC PATCH v3 05/34] Hexagon (target/hexagon) register names, Taylor Simpson, 2020/08/18
- [RFC PATCH v3 06/34] Hexagon (disas) disassembler, Taylor Simpson, 2020/08/18
- [RFC PATCH v3 11/34] Hexagon (target/hexagon) register fields, Taylor Simpson, 2020/08/18
- [RFC PATCH v3 02/34] Hexagon (target/hexagon) README,
Taylor Simpson <=
- [RFC PATCH v3 03/34] Hexagon (include/elf.h) ELF machine definition, Taylor Simpson, 2020/08/18
- [RFC PATCH v3 13/34] Hexagon (target/hexagon) register map, Taylor Simpson, 2020/08/18
- [RFC PATCH v3 14/34] Hexagon (target/hexagon) instruction/packet decode, Taylor Simpson, 2020/08/18
- [RFC PATCH v3 07/34] Hexagon (target/hexagon) scalar core helpers, Taylor Simpson, 2020/08/18