qemu-devel

Re: [Qemu-devel] [RFC v3 PATCH 14/14] target-i386: Generate fences for x86


From: Sergey Fedorov
Subject: Re: [Qemu-devel] [RFC v3 PATCH 14/14] target-i386: Generate fences for x86
Date: Wed, 22 Jun 2016 14:18:52 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.8.0

On 21/06/16 21:03, Pranith Kumar wrote:
> On Tue, Jun 21, 2016 at 1:54 PM, Peter Maydell <address@hidden> wrote:
>> On 21 June 2016 at 18:28, Pranith Kumar <address@hidden> wrote:
>>> Regarding the second point, I did consider this situation of running x86 on
>>> ARM, where such barriers are necessary for correctness. But I am
>>> really apprehensive about the cost this will impose. I am not sure if there
>>> are any alternative solutions to avoid generating barriers for each
>>> memory operation, but it would be great if we could reduce them.
>> I vaguely recall an idea that you could avoid needing
>> explicit barriers by turning all the guest load/stores into
>> host load-acquire/store-release, but I have no idea whether
>> that's (a) actually true (b) any better than piles of
>> explicit barriers.
> Yes, this is true for ARMv8 (not sure about ia64). The
> load-acquire/store-release operations are sequentially consistent with
> respect to each other. But this does not work for ARMv7, and, as you said, I
> think the cost here is also really prohibitive.

As I understand it, there's no requirement for sequential consistency even
on systems with a fairly strong memory model such as x86. Because of the
store queue, an earlier regular store is allowed to complete after a
later regular load. If I understand correctly, this relaxation breaks
the sequential consistency requirement, since it allows a CPU to see its
own stores, relative to other CPUs' stores, in a different order than
the other CPUs see them. However, such a model fits acquire/release
semantics perfectly, even as they are defined by the Itanium memory
model. Let's break it down:
(1) a load-acquire must not be reordered with any subsequent loads
or stores;
(2) a store-release must not be reordered with any preceding loads
or stores;
(3) thus, if all loads are load-acquires and all stores are
store-releases, the only possible reordering is a store-release
being reordered after a subsequent load-acquire.

Considering this, I think that the semantics of a strongly ordered memory
model (such as the x86 memory model) can be translated directly into
the semantics of a more relaxed acquire/release memory model (such as
the Itanium memory model, or the slightly stronger ARMv8 one). And I
believe this can perform better than inserting separate memory barriers
on architectures that provide acquire/release semantics, since it is
more relaxed and permits certain hardware optimizations such as
store-load reordering (a later load completing before an earlier store).

Kind regards,
Sergey


