From: Benjamin Herrenschmidt
Subject: Re: [Qemu-devel] [PATCH] Add a memory barrier to guest memory access function
Date: Tue, 22 May 2012 08:56:12 +1000

On Tue, 2012-05-22 at 01:22 +0300, Michael S. Tsirkin wrote:
> > Again, from which originator ? From a given initiator, nothing
> > bypasses anything, so the right thing to do here is a full mb().
> > However, I suspect what you are talking about here is read
> > -responses- not bypassing writes in the direction of the response
> > (ie, the "flushing" semantic of reads) which is a different matter.
> 
> No. My spec says:
> A3, A4
> A Posted Request must be able to pass Non-Posted Requests to avoid
> deadlocks.

Right, a read + write can become write + read at the target, I forgot
about that, or you can deadlock due to the flush semantics, but a write
+ read must remain in order, or am I missing something ?

And write + read afaik is typically the one that x86 can re-order
without a barrier, isn't it ?
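
(Purely as an illustration, with made-up names rather than anything in
the tree: the store-then-load case is the one where nothing short of a
full barrier helps, which is why I keep coming back to mb() rather than
wmb()/rmb().)

/* Sketch only, illustrative names: a device emulation writes a
 * descriptor into guest memory and then reads a status word back. */
#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint32_t guest_descriptor;  /* stand-in for guest RAM */
static _Atomic uint32_t guest_status;      /* stand-in for guest RAM */

uint32_t write_then_read(uint32_t desc)
{
    atomic_store_explicit(&guest_descriptor, desc, memory_order_relaxed);

    /* wmb() only orders store/store and rmb() only load/load; a store
     * followed by a load needs the full barrier, and that is exactly
     * the reordering x86 itself performs. */
    atomic_thread_fence(memory_order_seq_cst);

    return atomic_load_explicit(&guest_status, memory_order_relaxed);
}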

> > Also don't forget that this is only a semantic of PCI, not of the
> > system fabric, ie, a device DMA read doesn't flush a CPU write that
> > is still in that CPU store queue.
> 
> We need to emulate what hardware does IMO.
> So if usb has different rules it needs different barriers.

Who talks about USB here ? Whatever rules USB has only matter between
the USB device and the USB controller emulation, USB doesn't sit
directly on the memory bus doing DMA, it all goes through the HCI, which
adheres to the ordering rules of whatever bus it sits on.

Here, sanity must prevail :-) I suggest ordering by default...
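
(Something along these lines is what I mean by "ordering by default";
this is only a sketch over a fake guest RAM buffer, and none of the
names below are the actual QEMU API.)

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t dma_addr_t;

static uint8_t fake_guest_ram[4096];       /* stand-in for guest memory */

static void guest_memory_rw_unordered(dma_addr_t addr, void *buf,
                                      size_t len, bool is_write)
{
    if (is_write) {
        memcpy(fake_guest_ram + addr, buf, len);
    } else {
        memcpy(buf, fake_guest_ram + addr, len);
    }
}

void dma_memory_rw_ordered(dma_addr_t addr, void *buf,
                           size_t len, bool is_write)
{
    /* Full barrier before every access: every emulated device then gets
     * the ordering a real PCI bus master would see, and no individual
     * device author has to reason about barriers at all. */
    __sync_synchronize();
    guest_memory_rw_unordered(addr, buf, len, is_write);
}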

> > And it's not correct. With that setup, DMA writes can pass DMA reads
> > (and vice-versa) which doesn't correspond to the guarantees of the
> > PCI spec.
> 
> Cite the spec please. Express spec matches this at least.

Sure, see above. Yes, I did forget that a read + write could be
re-ordered on PCI, but that isn't the case for a write + read, or am I
reading the table sideways ?

> It's a lot of work. We measured the effect for virtio in
> the past. I don't think we need to redo it.

virtio is specifically our high performance case, and what I'm proposing
doesn't affect it.

> > > You said above x86 is unaffected. This is portability, not safety.
> > 
> > x86 is unaffected by the missing wmb/rmb, it might not be unaffected
> > by the missing ordering between loads and stores, I don't know, as I
> > said, I don't fully know the x86 memory model.
> 
> You don't need to understand it. Assume memory-barriers.h is correct.

In which case we still need a full mb() unless we can convince ourselves
that the ordering between a write and a subsequent read can be relaxed
safely, and I'm really not sure about it.
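
(If we ever do want to avoid the full barrier where it isn't needed, the
"remember the previous access direction" idea quoted further down could
look roughly like this; again just a sketch, none of these names exist
in QEMU.)

#include <stdbool.h>

static bool prev_was_write;     /* would be per-context in real code */

static void dma_barrier(bool is_write)
{
    if (prev_was_write && !is_write) {
        /* store followed by load: only a full barrier orders this pair */
        __atomic_thread_fence(__ATOMIC_SEQ_CST);
    } else if (is_write) {
        /* earlier accesses vs. a later store: a release fence suffices */
        __atomic_thread_fence(__ATOMIC_RELEASE);
    } else {
        /* load followed by load: an acquire fence suffices */
        __atomic_thread_fence(__ATOMIC_ACQUIRE);
    }
    prev_was_write = is_write;
}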

> > In any case, opposing "portability" to "safety" the way you do it
> > means you are making assumptions that basically "qemu is written for
> > x86 and nothing else matters".
> 
> No. But find a way to fix power without hurting working setups.

And ARM ;-)

Arguably x86 is wrong too anyway, at least from a strict interpretation
of the spec (and unless I missed something).

> > If that's your point of view, so be it and be clear about it, but I
> > will disagree :-) And while I can understand that powerpc might not
> > be considered as the most important arch around at this point in
> > time, these problems are going to affect ARM as well.
> > 
> > > > Almost all our devices were written without any thought given to
> > > > ordering, so they basically can and should be considered as all
> > > > broken.
> > > 
> > > Problem is, a lot of code is likely broken even after you sprinkle
> > > barriers around. For example qemu might write A then B where guest
> > > driver expects to see B written before A.
> > 
> > No, these are totally unrelated bugs, nothing to do with barriers.
> > You are mixing up two completely different problems and using one as
> > an excuse to not fix the other one :-)
> > 
> > A device with the above problem would be broken today on x86
> > regardless.
> > 
> > > > Since thinking about ordering is something that, by experience,
> > > > very few programmers can do and get right, the default should
> > > > absolutely be fully ordered.
> > > 
> > > Give it bus ordering. That is not fully ordered.
> > 
> > It pretty much is actually, look at your PCI spec :-)
> 
> I looked. 2.4.1.  Transaction Ordering Rules
> 
> > > > Performance regressions aren't a big deal to bisect in that
> > > > case: If there's a regression for a given driver and it points
> > > > to this specific patch adding the barriers then we know
> > > > precisely where the regression comes from, and we can get some
> > > > insight about how this specific driver can be improved to use
> > > > more relaxed accessors.
> > > >
> > > > I don't see the problem.
> > > >
> > > > One thing that might be worth looking at is if indeed mb() is so
> > > > much more costly than just wmb/rmb, in which circumstances we
> > > > could have some smarts in the accessors to make them skip the
> > > > full mb based on knowledge of previous access direction, though
> > > > here too I would be tempted to only do that if absolutely
> > > > necessary (ie if we can't instead just fix the sensitive driver
> > > > to use explicitly relaxed accessors).
> > > 
> > > We did this in virtio and yes it is measureable.
> > 
> > You did it in virtio in a very hot spot on a performance critical
> > driver. My argument is that:
> >
> >  - We can do it in a way that doesn't affect virtio at all (by using
> >    the dma accessors instead of cpu_*)
> >
> >  - Only few drivers have that kind of performance criticality and
> >    they can be easily hand fixed.
> > 
> > > branches are pretty cheap though.
> > 
> > Depends, not always but yes, cheaper than barriers in many cases.
> > 
> > > > > > So on that I will not compromise.
> > > > > > 
> > > > > > However, I think it might be better to leave the barrier in
> > > > > > the dma accessor since that's how you also get iommu
> > > > > > transparency etc... so it's not a bad place to put them, and
> > > > > > leave the cpu_physical_* for use by lower level device
> > > > > > drivers which are thus responsible also for dealing with
> > > > > > ordering if they have to.
> > > > > > 
> > > > > > Cheers,
> > > > > > Ben.
> > > > > 
> > > > > You claim to understand what matters for all devices. I doubt
> > > > > that.
> > > > 
> > > > It's pretty obvious that anything that does DMA using a classic
> > > > descriptor + buffers structure is broken without appropriate
> > > > ordering.
> > > > 
> > > > And yes, I claim to have a fairly good idea of the problem, but
> > > > I don't think throwing credentials around is going to be
> > > > helpful.
> > > >  
> > > > > Why don't we add safe APIs, then go over devices and switch
> > > > > over?
> > > > > I counted 97 pci_dma_ accesses.
> > > > > 33 in rtl, 32 in eepro100, 12 in lsi, 7 in e1000.
> > > > > 
> > > > > Let maintainers make a decision where does speed matter.
> > > > 
> > > > No. Let's fix the general bug first. Then let the people who
> > > > know the individual drivers intimately and understand their
> > > > access patterns make the call as to when things can/should be
> > > > relaxed.
> > > > 
> > > > Ben.
> > > 
> > > As a maintainer of a device, if you send me a patch I can review.
> > > If you change core APIs creating performance regressions
> > > I don't even know what to review without wasting time debugging
> > > and bisecting.
> > 
> > Well, if you don't know then you ask on the list and others (such
> > as myself or a certain mst) who happen to know those issues will
> > help that lone clueless maintainer; seriously it's not like it's
> > hard, and a lot easier than just keeping everything broken and
> > hoping we get to audit them all properly.
> > 
> > > According to what you said you want to fix kvm on powerpc.
> > > Good. Find a way that looks non intrusive on x86 please.
> > 
> > And ARM. And any other OO arch.
> > 
> > Ben.