qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v2 2/6] Add copy and constant propagation.


From: Richard Henderson
Subject: Re: [Qemu-devel] [PATCH v2 2/6] Add copy and constant propagation.
Date: Fri, 10 Jun 2011 10:44:50 -0700
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.17) Gecko/20110428 Fedora/3.1.10-1.fc15 Thunderbird/3.1.10

On 06/09/2011 03:45 AM, Kirill Batuzov wrote:
> Make tcg_constant_folding do copy and constant propagation. It is a
> preparational work before actual constant folding.
> 
> Signed-off-by: Kirill Batuzov <address@hidden>
> ---
>  tcg/optimize.c |  161 
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 161 insertions(+), 0 deletions(-)
> 
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index 5daf131..7996798 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -31,23 +31,178 @@
>  #include "qemu-common.h"
>  #include "tcg-op.h"
>  
> +typedef enum {
> +    TCG_TEMP_UNDEF = 0,
> +    TCG_TEMP_CONST,
> +    TCG_TEMP_COPY,
> +    TCG_TEMP_ANY
> +} tcg_temp_state;

Coding conventions request StudlyCaps, fwiw.

> +
> +struct tcg_temp_info {
> +    tcg_temp_state state;
> +    uint16_t num_copies;
> +    tcg_target_ulong val;
> +};
> +
> +/* Reset TEMP's state to TCG_TEMP_ANY.  If TEMP was a representative of some
> +   class of equivalent temp's, a new representative should be chosen in this
> +   class. */
> +static void reset_temp(struct tcg_temp_info *temps, TCGArg temp, int 
> nb_temps,
> +                       int nb_globals)
> +{
> +    int i;
> +    TCGArg new_base;
> +    if (temps[temp].state == TCG_TEMP_COPY) {
> +        temps[temps[temp].val].num_copies--;
> +    }
> +    if (temps[temp].num_copies > 0) {
> +        new_base = (TCGArg)-1;
> +        for (i = nb_globals; i < nb_temps; i++) {

I think it's probably better to place the elements of the equiv class into
a double-linked circular list, rather than a mere num_copies that forces
you to iterate over nb_temps to remove an element.  E.g.

  struct tcg_temp_info {
    tcg_temp_state state;
    uint16_t prev_copy;
    uint16_t next_copy;
    tcg_target_ulong val;
  };

This makes elements easy to remove, and easy to choose a new representative.

> +static int op_bits(int op)
> +{
> +    switch (op) {
> +    case INDEX_op_mov_i32:
> +        return 32;
> +#if TCG_TARGET_REG_BITS == 64
> +    case INDEX_op_mov_i64:
> +        return 64;
> +#endif
> +    default:
> +        fprintf(stderr, "Unrecognized operation %d in op_bits.\n", op);
> +        tcg_abort();
> +    }
> +}

I think we would be well-served to move this to a property of the opcode,
much the same way as TCG_OPF_CALL_CLOBBER et al.  I would not, of course,
suggest to block this patch series on that cleanup.

> +static int op_to_movi(int op)
> +{
> +    if (op_bits(op) == 32) {
> +        return INDEX_op_movi_i32;
> +    }
> +#if TCG_TARGET_REG_BITS == 64
> +    if (op_bits(op) == 64) {
> +        return INDEX_op_movi_i64;
> +    }
> +#endif
> +    tcg_abort();
> +}

This should use a switch, not two calls to a function.

> +    struct tcg_temp_info temps[TCG_MAX_TEMPS];

Given that gen_opc_buf is static, I see no reason not to make this a
static variable as well.  Put it at file scope so you don't have to
pass it as arguments to subroutines.

> +        /* Do copy propagation */
> +        if (!(def->flags & (TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS))) {

Why are you suppressing copy propagation in this way?  I see no reason for it.

> +            assert(op != INDEX_op_call);
> +            for (i = def->nb_oargs; i < def->nb_oargs + def->nb_iargs; i++) {
> +                if (temps[args[i]].state == TCG_TEMP_COPY
> +                    && !(def->args_ct[i].ct & TCG_CT_IALIAS)

>From looking at only ARM translator output, you might suppose that we've
already performed matching constraint substitution.  But you'd be wrong.

I realize that at present you get a smidge smaller code by suppressing
substitution around matching constraints, but that's only because our
copy propagation is incomplete.  From a pure theoretic stance, I think
this line is wrong.  E.g.

 With IALIAS suppression                 Without
 ---- 0x40802e50                         ---- 0x40802e50
 mov_i32 tmp5,r10                    |   nopn $0x2,$0x2
 movi_i32 tmp6,$0x40802e58               movi_i32 tmp6,$0x40802e58
 add_i32 tmp6,tmp5,tmp6              |   add_i32 tmp6,r10,tmp6
 mov_i32 r10,tmp6                        mov_i32 r10,tmp6

I'll claim that that the right-hand column is better, even though it
currently forces the generation of an LEA insn instead of an ADD.

What's needed to fix this is either (1) a much better register allocator
or (2) a reverse copy-propagation pass that pushes global variables up
into outputs containing temporaries.

> +                    && (def->args_ct[i].ct & TCG_CT_REG)) {

This line looks redundant.  Don't we assert this property
in tcg_add_target_add_op_defs?  Certainly elsewhere we expect all
arguments to be able to accept a register...

> +        case INDEX_op_mov_i32:
> +#if TCG_TARGET_REG_BITS == 64
> +        case INDEX_op_mov_i64:
> +#endif

I suggest we take a page from tcg/sparc/tcg-target.c:

#if TCG_TARGET_REG_BITS == 64
#define CASE_OP_32_64(x)                        \
        glue(glue(case INDEX_op_, x), _i32):    \
        glue(glue(case INDEX_op_, x), _i64)
#else
#define CASE_OP_32_64(x)                        \
        glue(glue(case INDEX_op_, x), _i32)
#endif

Used as

        CASE_OP_32_64(mov):

in order to avoid rampant ifdefs like this.

Unfortunately that also raises the spectre of the fact that almost
all of the logical operations are optional.

Again, probably not something to tackle within this patch series,
but I suggest that we no longer conditionally define the opcodes.
Unconditionally define them, but mark them with TCG_OPF_UNSUPPORTED,
do not generate them, and abort in final code generation if they're
seen.  I think that would clean up a lot of if-deffery in and 
around tcg/*.c.

>          case INDEX_op_call:
> +            for (i = 0; i < nb_globals; i++) {
> +                reset_temp(temps, i, nb_temps, nb_globals);
> +            }

No need to reset globals if the call is PURE or CONST.

>          case INDEX_op_brcond_i64:
>  #endif
> +            memset(temps, 0, nb_temps * sizeof(struct tcg_temp_info));

Perhaps to pedantic, but we can do better in extended basic blocks.
A full flush of all temps is only needed at labels.  



r~



reply via email to

[Prev in Thread] Current Thread [Next in Thread]