
[Qemu-devel] [4581] update


From: Fabrice Bellard
Subject: [Qemu-devel] [4581] update
Date: Sun, 25 May 2008 18:24:40 +0000

Revision: 4581
          http://svn.sv.gnu.org/viewvc/?view=rev&root=qemu&revision=4581
Author:   bellard
Date:     2008-05-25 18:24:40 +0000 (Sun, 25 May 2008)

Log Message:
-----------
update

Modified Paths:
--------------
    trunk/tcg/README
    trunk/tcg/TODO

Modified: trunk/tcg/README
===================================================================
--- trunk/tcg/README    2008-05-25 18:21:31 UTC (rev 4580)
+++ trunk/tcg/README    2008-05-25 18:24:40 UTC (rev 4581)
@@ -16,15 +16,19 @@
 
 A TCG "function" corresponds to a QEMU Translated Block (TB).
 
-A TCG "temporary" is a variable only live in a given
-function. Temporaries are allocated explicitly in each function.
+A TCG "temporary" is a variable only live in a basic
+block. Temporaries are allocated explicitly in each function.
 
-A TCG "global" is a variable which is live in all the functions. They
-are defined before the functions defined. A TCG global can be a memory
-location (e.g. a QEMU CPU register), a fixed host register (e.g. the
-QEMU CPU state pointer) or a memory location which is stored in a
-register outside QEMU TBs (not implemented yet).
+A TCG "local temporary" is a variable only live in a function. Local
+temporaries are allocated explicitly in each function.
 
+A TCG "global" is a variable which is live in all the functions
+(equivalent of a C global variable). They are defined before the
+functions are defined. A TCG global can be a memory location (e.g. a QEMU
+CPU register), a fixed host register (e.g. the QEMU CPU state pointer)
+or a memory location which is stored in a register outside QEMU TBs
+(not implemented yet).
+
 A TCG "basic block" corresponds to a list of instructions terminated
 by a branch instruction. 
 
@@ -32,11 +36,11 @@
 
 3.1) Introduction
 
-TCG instructions operate on variables which are temporaries or
-globals. TCG instructions and variables are strongly typed. Two types
-are supported: 32 bit integers and 64 bit integers. Pointers are
-defined as an alias to 32 bit or 64 bit integers depending on the TCG
-target word size.
+TCG instructions operate on variables which are temporaries, local
+temporaries or globals. TCG instructions and variables are strongly
+typed. Two types are supported: 32 bit integers and 64 bit
+integers. Pointers are defined as an alias to 32 bit or 64 bit
+integers depending on the TCG target word size.
 
 Each instruction has a fixed number of output variable operands, input
 variable operands and always constant operands.
@@ -44,14 +48,12 @@
 The notable exception is the call instruction which has a variable
 number of outputs and inputs.
 
-In the textual form, output operands come first, followed by input
-operands, followed by constant operands. The output type is included
-in the instruction name. Constants are prefixed with a '$'.
+In the textual form, output operands usually come first, followed by
+input operands, followed by constant operands. The output type is
+included in the instruction name. Constants are prefixed with a '$'.
 
 add_i32 t0, t1, t2  (t0 <- t1 + t2)
 
-sub_i64 t2, t3, $4  (t2 <- t3 - 4)
-
 3.2) Assumptions
 
 * Basic blocks
@@ -62,9 +64,8 @@
 - Basic blocks start after the end of a previous basic block, at a
   set_label instruction or after a legacy dyngen operation.
 
-After the end of a basic block, temporaries at destroyed and globals
-are stored at their initial storage (register or memory place
-depending on their declarations).
+After the end of a basic block, the content of temporaries is
+destroyed, but local temporaries and globals are preserved.
 
 * Floating point types are not supported yet
 
@@ -100,7 +101,7 @@
   is suppressed.
 
 - A liveness analysis is done at the basic block level. The
-  information is used to suppress moves from a dead temporary to
+  information is used to suppress moves from a dead variable to
   another one. It is also used to remove instructions which compute
   dead results. The latter is especially useful for condition code
   optimization in QEMU.
@@ -113,47 +114,6 @@
 
   only the last instruction is kept.
 
-- A macro system is supported (may get closer to function inlining
-  some day). It is useful if the liveness analysis is likely to prove
-  that some results of a computation are indeed not useful. With the
-  macro system, the user can provide several alternative
-  implementations which are used depending on the used results. It is
-  especially useful for condition code optimization in QEMU.
-
-  Here is an example:
-
-  macro_2 t0, t1, $1
-  mov_i32 t0, $0x1234
-
-  The macro identified by the ID "$1" normally returns the values t0
-  and t1. Suppose its implementation is:
-
-  macro_start
-  brcond_i32  t2, $0, $TCG_COND_EQ, $1
-  mov_i32 t0, $2
-  br $2
-  set_label $1
-  mov_i32 t0, $3
-  set_label $2
-  add_i32 t1, t3, t4
-  macro_end
-  
-  If t0 is not used after the macro, the user can provide a simpler
-  implementation:
-
-  macro_start
-  add_i32 t1, t2, t4
-  macro_end
-
-  TCG automatically chooses the right implementation depending on
-  which macro outputs are used after it.
-
-  Note that if TCG did more expensive optimizations, macros would be
-  less useful. In the previous example a macro is useful because the
-  liveness analysis is done on each basic block separately. Hence TCG
-  cannot remove the code computing 't0' even if it is not used after
-  the first macro implementation.
-
 3.4) Instruction Reference
 
 ********* Function call
@@ -241,6 +201,10 @@
 
 t0=t1^t2
 
+* not_i32/i64 t0, t1
+
+t0=~t1
+
 ********* Shifts
 
 * shl_i32/i64 t0, t1, t2
@@ -428,3 +392,34 @@
 the generated code.
 
 The exception model is the same as the dyngen one.
+
+6) Recommended coding rules for best performance
+
+- Use globals to represent the parts of the QEMU CPU state which are
+  often modified, e.g. the integer registers and the condition
+  codes. TCG will be able to use host registers to store them.
+
+- Avoid globals stored in fixed registers. They must be used only to
+  store the pointer to the CPU state and possibly to store a pointer
+  to a register window. The other uses are to ensure backward
+  compatibility with dyngen while porting a new target to TCG.
+
+- Use temporaries. Use local temporaries only when really needed,
+  e.g. when you need to use a value after a jump. Local temporaries
+  introduce a performance hit in the current TCG implementation: their
+  content is saved to memory at the end of each basic block.
+
+- Free temporaries and local temporaries when they are no longer used
+  (tcg_temp_free). Since tcg_const_x() also creates a temporary, you
+  should free it after it is used. Freeing temporaries does not yield
+  better generated code, but it reduces the memory usage of TCG and
+  improves the speed of the translation.
+
+- Don't hesitate to use helpers for complicated or seldom used target
+  instructions. There is little performance advantage in using TCG to
+  implement target instructions taking more than about twenty TCG
+  instructions.
+
+- Use the 'discard' instruction if you know that TCG won't be able to
+  prove that a given global is "dead" at a given program point. The
+  x86 target uses it to improve the condition codes optimisation.
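The basic block rules in this update work together with the liveness analysis and the 'discard' instruction. Plain temporaries are dead at the end of a basic block, while local temporaries and globals are preserved. An instruction whose result is dead can be removed, and 'discard' declares a global dead. The C sketch below models this as a backward pass over one basic block. It is a hypothetical model and not QEMU's tcg.c. The var/op structures, the enum names and liveness_pass() are invented for illustration. The model also assumes single-output instructions with no side effects, so calls and memory stores are not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of TCG variables and instructions (not QEMU code). */
enum var_kind { VAR_TEMP, VAR_LOCAL_TEMP, VAR_GLOBAL };
enum op_kind { OP_MOV, OP_ADD, OP_NOT, OP_DISCARD };

struct op {
    enum op_kind kind;
    int out;       /* output variable, or the variable named by discard */
    int in1, in2;  /* input variables, -1 if unused */
    bool removed;  /* set by the pass when the result is dead */
};

#define MAX_VARS 16

/* Backward liveness pass over a single basic block. */
static void liveness_pass(struct op *ops, int n_ops,
                          const enum var_kind *kinds, int n_vars)
{
    bool live[MAX_VARS];

    assert(n_vars <= MAX_VARS);
    /* At the end of the block, globals and local temporaries keep their
       value (so they are treated as live); plain temporaries are dead. */
    for (int v = 0; v < n_vars; v++)
        live[v] = (kinds[v] != VAR_TEMP);

    for (int i = n_ops - 1; i >= 0; i--) {
        struct op *op = &ops[i];

        if (op->kind == OP_DISCARD) {
            /* The value is declared dead from this point backwards. */
            live[op->out] = false;
            continue;
        }
        if (!live[op->out]) {
            /* Dead result: drop the instruction. Its inputs are not
               marked live, so producers feeding only it die too. */
            op->removed = true;
            continue;
        }
        live[op->out] = false;  /* defined here, so dead above this op */
        if (op->in1 >= 0)
            live[op->in1] = true;
        if (op->in2 >= 0)
            live[op->in2] = true;
    }
}
```

For example, take variables {reg (global), cc_src (global), t0, t1 (temporaries), loc (local temporary)} and the block `add t0, reg, reg; mov cc_src, t0; not t1, reg; mov loc, reg; discard cc_src`. The pass then removes the first three instructions and keeps `mov loc, reg`. Without the trailing `discard`, only `not t1, reg` is removed, because cc_src is a global and must be assumed live. This matches the condition code use case given for 'discard'.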

Modified: trunk/tcg/TODO
===================================================================
--- trunk/tcg/TODO      2008-05-25 18:21:31 UTC (rev 4580)
+++ trunk/tcg/TODO      2008-05-25 18:24:40 UTC (rev 4581)
@@ -1,32 +1,15 @@
-- test macro system
+- Add new instructions such as: andnot, ror, rol, setcond, clz, ctz,
+  popcnt.
 
-- test conditional jumps
+- See if it is worth exporting mul2, mulu2, div2, divu2. 
 
-- test mul, div, ext8s, ext16s, bswap
+- Support for globals saved in fixed registers between TBs.
 
-- generate a global TB prologue and epilogue to save/restore registers
-  to/from the CPU state and to reserve a stack frame to optimize
-  helper calls. Modify cpu-exec.c so that it does not use global
-  register variables (except maybe for 'env').
-
-- fully convert the x86 target. The minimal amount of work includes:
-  - add cc_src, cc_dst and cc_op as globals
-  - disable its eflags optimization (the liveness analysis should
-    suffice)
-  - move complicated operations to helpers (in particular FPU, SSE, MMX).
-
-- optimize the x86 target:
-  - move some or all the registers as globals
-  - use the TB prologue and epilogue to have QEMU target registers in
-    pre assigned host registers.
-
 Ideas:
 
 - Move the slow part of the qemu_ld/st ops after the end of the TB.
 
-- Experiment: change instruction storage to simplify macro handling
-  and to handle dynamic allocation and see if the translation speed is
-  OK.
-
-- change exception syntax to get closer to QOP system (exception
+- Change exception syntax to get closer to QOP system (exception
   parameters given with a specific instruction).
+
+- Add float and vector support.
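The TODO proposes new instructions (andnot, ror, rol, setcond, clz, ctz, popcnt) and possibly exporting mul2/mulu2, but this revision does not define their semantics. The C sketch below gives one plausible set of 32-bit reference semantics. It could serve as a helper fallback or as a test oracle when adding a backend implementation. All function and enum names are hypothetical. Several details are assumptions rather than anything stated in the TODO: the operand order of andnot (t1 & ~t2), rotation counts taken modulo 32, a result of 32 for clz/ctz of zero, and mulu2 returning the low and high halves of the 64-bit product.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference semantics for the proposed 32-bit instructions. */

static uint32_t ref_andnot_i32(uint32_t a, uint32_t b) { return a & ~b; }

/* Rotations: the count is assumed to be taken modulo 32. */
static uint32_t ref_rotl_i32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;  /* avoid a shift by 32 */
}

static uint32_t ref_rotr_i32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x >> n) | (x << (32 - n)) : x;
}

/* clz/ctz: returning 32 for a zero input is an assumed convention. */
static uint32_t ref_clz_i32(uint32_t x)
{
    uint32_t n = 0;
    if (x == 0)
        return 32;
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

static uint32_t ref_ctz_i32(uint32_t x)
{
    uint32_t n = 0;
    if (x == 0)
        return 32;
    while (!(x & 1u)) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Population count: clear the lowest set bit until none remain. */
static uint32_t ref_popcnt_i32(uint32_t x)
{
    uint32_t n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* setcond: t0 = (t1 cond t2) ? 1 : 0; only a few conditions modelled. */
enum ref_cond { REF_COND_EQ, REF_COND_NE, REF_COND_LT, REF_COND_LTU };

static uint32_t ref_setcond_i32(enum ref_cond c, uint32_t a, uint32_t b)
{
    switch (c) {
    case REF_COND_EQ:  return a == b;
    case REF_COND_NE:  return a != b;
    case REF_COND_LT:  return (int32_t)a < (int32_t)b;  /* signed compare */
    case REF_COND_LTU: return a < b;                    /* unsigned compare */
    }
    assert(0 && "unhandled condition");
    return 0;
}

/* mulu2: full 64-bit unsigned product split into low and high words. */
static void ref_mulu2_i32(uint32_t *lo, uint32_t *hi, uint32_t a, uint32_t b)
{
    uint64_t p = (uint64_t)a * b;
    *lo = (uint32_t)p;
    *hi = (uint32_t)(p >> 32);
}
```

The signed comparison relies on the usual two's complement conversion from uint32_t to int32_t. C leaves that conversion implementation-defined, but every host QEMU supports behaves this way.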





