[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [RFC v2 PATCH] tcg: Optimize fence instructions
From: |
Pranith Kumar |
Subject: |
Re: [Qemu-devel] [RFC v2 PATCH] tcg: Optimize fence instructions |
Date: |
Wed, 10 Aug 2016 13:58:29 -0400 |
Hi Richard,
On Wed, Aug 10, 2016 at 6:44 AM, Richard Henderson <address@hidden> wrote:
> On 08/10/2016 04:42 AM, Pranith Kumar wrote:
>>
>> +/* Eliminate duplicate and unnecessary fence instructions */
>> +void tcg_optimize_mb(TCGContext *s)
>> +{
>> + int oi, oi_next;
>> + TCGArg prev_op_mb = -1;
>> + TCGOp *prev_op = NULL;
>> +
>> + for (oi = s->gen_op_buf[0].next; oi != 0; oi = oi_next) {
>> + TCGOp *op = &s->gen_op_buf[oi];
>> + TCGArg *args = &s->gen_opparam_buf[op->args];
>> + TCGOpcode opc = op->opc;
>> +
>> + switch (opc) {
>> + case INDEX_op_mb:
>> + {
>> + TCGBar curr_mb_type = args[0] & 0xF0;
>> + TCGBar prev_mb_type = prev_op_mb & 0xF0;
>> +
>> + if (curr_mb_type == prev_mb_type ||
>> + (curr_mb_type == TCG_BAR_STRL && prev_mb_type ==
>> TCG_BAR_SC)) {
>
>
> As I said before, I don't think you should be doing any of this if
> prev_op_mb == -1. I think that's the first thing you should test within
> INDEX_op_mb.
If prev_op_mb is -1, none of the conditions evaluate to true and we
will fall through. Did you want me to check this as an optimization or
am I missing your point?
>
>> + /* Remove the current weaker barrier op. The previous
>> + * barrier is stronger and sufficient.
>> + * mb; strl => mb; st
>> + */
>> + tcg_op_remove(s, op);
>> + op = prev_op;
>
>
> As I said before, surely you have to merge the low 4 bits as well.
Since we are removing the weaker barrier and leaving the stronger
barrier, I think just updating args[0] to have the stronger barrier
semantics should be enough.
tcg_op_remove(s, op);
op = prev_op;
args[0] = prev_op_mb;
break;
Is there something I am missing when you say merge?
>
>> + case INDEX_op_ld8u_i32:
>> + case INDEX_op_ld8u_i64:
>> + case INDEX_op_ld8s_i32:
>> + case INDEX_op_ld8s_i64:
>> + case INDEX_op_ld16u_i32:
>> + case INDEX_op_ld16u_i64:
>> + case INDEX_op_ld16s_i32:
>> + case INDEX_op_ld16s_i64:
>> + case INDEX_op_ld_i32:
>> + case INDEX_op_ld32u_i64:
>> + case INDEX_op_ld32s_i64:
>> + case INDEX_op_ld_i64:
>> + case INDEX_op_st8_i32:
>> + case INDEX_op_st8_i64:
>> + case INDEX_op_st16_i32:
>> + case INDEX_op_st16_i64:
>> + case INDEX_op_st_i32:
>> + case INDEX_op_st32_i64:
>> + case INDEX_op_st_i64:
>
>
> None of these are guest memory operations. These are host access to the ENV
> structure, primarily for cpu registers that aren't common enough to warrant
> a TCG temporary.
>
> You wanted to be checking for
>
> INDEX_op_qemu_ld_i32
> INDEX_op_qemu_ld_i64
> INDEX_op_qemu_st_i32
> INDEX_op_qemu_st_i64
>
> which are the guest memory operations.
>
Yes, I got confused. I will fix this.
>> + case INDEX_op_call:
>> + prev_op_mb = -1;
>> + prev_op = NULL;
>> + break;
>
>
> Probably you could skip the calls to "pure" helpers. I.e. those with
> TCG_CALL_NO_RWG_SE. But it's probably not that important and we can leave
> it for another day.
>
>
> r~
--
Pranith