qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 13/13] iommu: Add a memory barrier to DMA RW fun


From: Paolo Bonzini
Subject: Re: [Qemu-devel] [PATCH 13/13] iommu: Add a memory barrier to DMA RW function
Date: Fri, 18 May 2012 08:53:49 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:12.0) Gecko/20120430 Thunderbird/12.0.1

Il 17/05/2012 02:24, Benjamin Herrenschmidt ha scritto:
>> > Also, should I make the barrier conditional to kvm_enabled() ? IE. It's
>> > pointless in full emulation and might actually be a performance hit on
>> > something already quite slow...
> Finally ... something like smp_mb() in qemu will turn into a lock op or
> an mfence on x86, ie not a nop.
> 
> That means overhead from today's implementation, which leads to the
> question ... is today implementation correct ? IE. Is a barrier needed
> on x86 as well or not ?

It depends on what semantics you attach to dma_mb.  In my opinion,
having a separate barrier for DMA is wrong, because you want the same
semantics on all architectures.

The x86 requirements are roughly as follows:

1) it never needs explicit rmb and wmb (as long as you don't use
non-temporal stores etc.);

2) synchronized operations have an implicit mb before and after (unlike
LL/SC on PowerPC).

3) everywhere else, you need an mb.

So, on x86 you have more or less an implicit wmb after each write and an
implicit rmb before a read.  This explains why these kind of bugs are
very hard to see on x86 (or often impossible to see).  Adding these in
cpu_physical_memory_rw has the advantage that x86 performance is not
affected, but it would not cover the case of a device model doing a DMA
read after a DMA write.  Then the device model would need to issue a
smp_mb manually, on all architectures.  I think this is too brittle.

> If not (I'm trying to figure out why exactly does x86 have a barrier in
> the first place and when it's in order), then I might add a new barrier
> type in qemu-barriers.h, something like dma_mb(), and define it as a nop
> on x86, a lwsync or sync (still thinking about it) on ppc, and
> __sync_synchronize() on unknown archs.

I don't think it is correct to think of this in terms of low-level
operations such as sync/lwsync.  Rather, I think what we want is
sequentially consistent accesses; it's heavyweight, but you cannot go
wrong.  http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html says this
is how you do it:

x86 -> Load Seq_Cst:    mov                or mfence; mov
       Store Seq Cst:   mov; mfence        or mov

ARM -> Load Seq Cst:    ldr; dmb           or dmb; ldr; dmb
       Store Seq Cst:   dmb; str; dmb      or dmb; str

PPC -> Load Seq Cst:    sync; ld; cmp; bc; isync
       Store Seq Cst:   sync; st

       where cmp; bc; isync can be replaced by sync.

and says "As far as the memory model is concerned, the ARM processor is
broadly similar to PowerPC, differing mainly in having a DMB barrier
(analogous to the PowerPC sync in its programmer-observable behaviour
for normal memory) and no analogue of the PowerPC lwsync".

So one of the two ARM mappings, with smp_mb instead of dmb, is what we
want in cpu_physical_memory_rw.  Device models that want to do better
can just use cpu_physical_memory_map, or we can add a
cpu_physical_memory_rw_direct for them.

Paolo



reply via email to

[Prev in Thread] Current Thread [Next in Thread]