qemu-devel

Re: [Qemu-devel] [RFC PATCH] tcg: Optimize fence instructions


From: Paolo Bonzini
Subject: Re: [Qemu-devel] [RFC PATCH] tcg: Optimize fence instructions
Date: Tue, 19 Jul 2016 19:16:07 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1


On 14/07/2016 22:29, Pranith Kumar wrote:
> +            } else if (curr_mb_type == TCG_BAR_STRL &&
> +                       prev_mb_type == TCG_BAR_LDAQ) {
> +                /* Consecutive load-acquire and store-release barriers
> +                 * can be merged into one stronger SC barrier
> +                 * ldaq; strl => ld; mb; st
> +                 */
> +                args[0] = (args[0] & 0x0F) | TCG_BAR_SC;
> +                tcg_op_remove(s, prev_op);

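Is this really an optimization?  For example, the processor could reorder
"st1; ldaq1; strl2; ld2" to "ldaq1; ld2; st1; strl2", because acquire and
release are each only one-way barriers.  It cannot do this if you change
ldaq1/strl2 to ld1/mb/st2, since nothing may cross the full barrier.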
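As a rough analogy only, the two sequences can be written with C11 atomics. This is a sketch, not TCG code: the variables a, b, c, d and both function names are hypothetical, and the C11 memory model is not identical to any particular guest or host architecture. The first function leaves the plain store and plain load free to move inward past the acquire and release; the second pins everything around a full fence.

```c
#include <stdatomic.h>

/* Hypothetical shared locations, zero-initialized (static storage). */
static atomic_int a, b, c, d;

/* st1; ldaq1; strl2; ld2
 * st1 may sink below ldaq1 and ld2 may hoist above strl2, so the
 * order "ldaq1; ld2; st1; strl2" is permitted. */
static int with_acq_rel(void)
{
    atomic_store_explicit(&a, 1, memory_order_relaxed);      /* st1   */
    int x = atomic_load_explicit(&b, memory_order_acquire);  /* ldaq1 */
    atomic_store_explicit(&c, 2, memory_order_release);      /* strl2 */
    int y = atomic_load_explicit(&d, memory_order_relaxed);  /* ld2   */
    return x + y;
}

/* st1; ld1; mb; st2; ld2
 * The full fence keeps st1 and ld1 before it and st2 and ld2 after it. */
static int with_full_fence(void)
{
    atomic_store_explicit(&a, 1, memory_order_relaxed);      /* st1 */
    int x = atomic_load_explicit(&b, memory_order_relaxed);  /* ld1 */
    atomic_thread_fence(memory_order_seq_cst);               /* mb  */
    atomic_store_explicit(&c, 2, memory_order_relaxed);      /* st2 */
    int y = atomic_load_explicit(&d, memory_order_relaxed);  /* ld2 */
    return x + y;
}
```

Single-threaded, both functions behave identically; the difference is only in which reorderings another thread is allowed to observe.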

On x86, for example, a memory fence costs ~50 clock cycles, while normal
loads and stores are of course much cheaper.

Of course this is useful if your target doesn't have ldaq/strl
instructions.  In this case, however, you probably want to lower ldaq to
"ld;mb" and strl to "mb;st"; the other optimizations then will remove
the unnecessary barrier.
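A minimal sketch of that lowering, assuming a toy op list rather than real TCG ops: the enum values and both function names below are hypothetical and exist only for illustration. The first pass rewrites ldaq to "ld; mb" and strl to "mb; st"; the second stands in for "the other optimizations" by collapsing adjacent full barriers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical op kinds for illustration; these are not TCG opcodes. */
enum op_kind { OP_LD, OP_ST, OP_MB, OP_LDAQ, OP_STRL };

/* Lower ldaq to "ld; mb" and strl to "mb; st".
 * 'out' needs room for 2 * n ops. Returns the number of ops written. */
static size_t lower_acq_rel(const enum op_kind *in, size_t n,
                            enum op_kind *out, size_t cap)
{
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        assert(j + 2 <= cap);  /* each input op expands to at most two */
        switch (in[i]) {
        case OP_LDAQ:
            out[j++] = OP_LD;
            out[j++] = OP_MB;
            break;
        case OP_STRL:
            out[j++] = OP_MB;
            out[j++] = OP_ST;
            break;
        default:
            out[j++] = in[i];
            break;
        }
    }
    return j;
}

/* Collapse runs of adjacent full barriers in place.
 * Returns the new number of ops. */
static size_t merge_adjacent_mb(enum op_kind *ops, size_t n)
{
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        if (ops[i] == OP_MB && j > 0 && ops[j - 1] == OP_MB) {
            continue;  /* the second barrier adds no ordering */
        }
        ops[j++] = ops[i];
    }
    return j;
}
```

With this sketch, "ldaq; strl" lowers to "ld; mb; mb; st" and then merges to "ld; mb; st", while an isolated ldaq or strl keeps its single barrier.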

Paolo


