Re: [Qemu-devel] [RFC PATCH] tcg: Optimize fence instructions
From: Pranith Kumar
Subject: Re: [Qemu-devel] [RFC PATCH] tcg: Optimize fence instructions
Date: Tue, 19 Jul 2016 14:28:03 -0400
Richard Henderson writes:
> On 07/15/2016 01:59 AM, Pranith Kumar wrote:
>> This patch applies on top of the fence generation patch series.
>>
>> This commit optimizes fence instructions. Two optimizations are
>> currently implemented. These are:
>>
>> 1. Unnecessary duplicate fence instructions
>>
>> If the same fence instruction is detected consecutively, we remove
>> one instance of it.
>>
>> ex: mb; mb => mb, strl; strl => strl
>>
>> 2. Merging weaker fence with subsequent/previous stronger fence
>>
>> load-acquire/store-release fence can be combined with a full fence
>> without relaxing the ordering constraint.
>>
>> ex: a) ld; ldaq; mb => ld; mb
>> b) mb; strl; st => mb; st
>>
>> Signed-off-by: Pranith Kumar <address@hidden>
>> ---
>> tcg/optimize.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> tcg/tcg.h      |  1 +
>> 2 files changed, 60 insertions(+)
>>
>> diff --git a/tcg/optimize.c b/tcg/optimize.c
>> index c0d975b..a655829 100644
>> --- a/tcg/optimize.c
>> +++ b/tcg/optimize.c
>> @@ -569,6 +569,63 @@ static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
>> return false;
>> }
>>
>> +/* Eliminate duplicate and unnecessary fence instructions */
>> +void tcg_optimize_mb(TCGContext *s)
>> +{
>> + int oi, oi_next;
>> + TCGArg prev_op_mb = -1;
>> + TCGOp *prev_op;
>> +
>> + for (oi = s->gen_first_op_idx; oi >= 0; oi = oi_next) {
>> + TCGOp *op = &s->gen_op_buf[oi];
>> + TCGArg *args = &s->gen_opparam_buf[op->args];
>> + TCGOpcode opc = op->opc;
>> +
>> + switch (opc) {
>> + case INDEX_op_mb:
>> + {
>> + TCGBar curr_mb_type = args[0] & 0xF0;
>> + TCGBar prev_mb_type = prev_op_mb & 0xF0;
>
> No check to make sure that prev_op_mb != -1?
>
I don't think that is necessary: if prev_op_mb is -1, then prev_mb_type will be
0xF0, which does not match any of the barrier types we compare against below.
>> +
>> + if (curr_mb_type == prev_mb_type ||
>> + (curr_mb_type == TCG_BAR_STRL && prev_mb_type ==
>> TCG_BAR_SC)) {
>> + /* Remove the current weaker barrier op. The previous
>> + * barrier is stronger and sufficient.
>> + * mb; strl => mb; st
>> + */
>> + tcg_op_remove(s, op);
>> + } else if (curr_mb_type == TCG_BAR_SC &&
>> + prev_mb_type == TCG_BAR_LDAQ) {
>> + /* Remove the previous weaker barrier op. The current
>> + * barrier is stronger and sufficient.
>> + * ldaq; mb => ld; mb
>> + */
>> + tcg_op_remove(s, prev_op);
>> + } else if (curr_mb_type == TCG_BAR_STRL &&
>> + prev_mb_type == TCG_BAR_LDAQ) {
>> + /* Consecutive load-acquire and store-release barriers
>> + * can be merged into one stronger SC barrier
>> + * ldaq; strl => ld; mb; st
>> + */
>> + args[0] = (args[0] & 0x0F) | TCG_BAR_SC;
>> + tcg_op_remove(s, prev_op);
>
> Don't you need to merge the bottom 4 bits as well?
>
That is an interesting point. Ideally, the TCG_MO_* flags of both barriers
should be the same. At least for now, for LDAQ and STRL barriers this flag is
TCG_MO_ALL, since we do not have any other variant.
>> + default:
>> + prev_op_mb = -1;
>> + prev_op = NULL;
>> + break;
>
> A reasonable start, I suppose, but there's nothing stopping you from
> considering barriers that aren't exactly adjacent.
>
> You only need to reset prev_op_mb when you see qemu_ld, qemu_st, call, or
> tcg_op_defs[opc].flags & TCG_OPF_BB_END.
>
That sounds like a good way to extend this. I will work on it.
Thanks!
--
Pranith