qemu-devel

Re: [Qemu-devel] [PATCH] virtio: Make memory barriers be memory barriers


From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] [PATCH] virtio: Make memory barriers be memory barriers
Date: Mon, 5 Sep 2011 12:19:46 +0300
User-agent: Mutt/1.5.21 (2010-09-15)

On Mon, Sep 05, 2011 at 02:43:16PM +1000, David Gibson wrote:
> On Sun, Sep 04, 2011 at 12:16:43PM +0300, Michael S. Tsirkin wrote:
> > On Sun, Sep 04, 2011 at 12:46:35AM +1000, David Gibson wrote:
> > > On Fri, Sep 02, 2011 at 06:45:50PM +0300, Michael S. Tsirkin wrote:
> > > > On Thu, Sep 01, 2011 at 04:31:09PM -0400, Paolo Bonzini wrote:
> > > > > > > > Why not limit the change to ppc then?
> > > > > > >
> > > > > > > Because the bug is masked by the x86 memory model, but it is still
> > > > > > > there even there conceptually. It is not really true that x86 does
> > > > > > > not need memory barriers, though it doesn't in this case:
> > > > > > >
> > > > > > > http://bartoszmilewski.wordpress.com/2008/11/05/who-ordered-memory-fences-on-an-x86/
> > > > > > >
> > > > > > > Paolo
> > > > > > 
> > > > > > Right.
> > > > > > To summarize, on x86 we probably want wmb and rmb to be compiler
> > > > > > barrier only. Only mb might in theory need to be an mfence.
> > > > > 
> > > > > No, wmb needs to be sfence and rmb needs to be lfence.  GCC does
> > > > > not provide those, so they should become __sync_synchronize() too,
> > > > > or you should use inline assembly.
> > > > > 
> > > > > > But there might be reasons why that is not an issue either
> > > > > > if we look closely enough.
> > > > > 
> > > > > Since the ring buffers are not using locked instructions (no xchg
> > > > > or cmpxchg) the barriers simply must be there, even on x86.  Whether
> > > > > it works in practice is not interesting, only the formal model is
> > > > > interesting.
> > > > > 
> > > > > Paolo
> > > > 
> > > > Well, can you describe an issue in virtio that lfence/sfence help solve
> > > > in terms of a memory model please?
> > > > Pls note that guest uses smp_ variants for barriers.
> > > 
> > > Ok, so, I'm having a bit of trouble with the fact that I'm having to
> > > argue the case that things the protocol requires to be memory
> > > barriers actually *be* memory barriers on all platforms.
> > > 
> > > I mean argue for a richer set of barriers, with per-arch minimal
> > > implementations instead of the large but portable hammer of
> > > sync_synchronize, if you will.
> > 
> > That's what I'm saying really. On x86 the richer set of barriers
> > need not insert code at all for both wmb and rmb macros. All we
> > might need is an 'optimization barrier'- e.g. linux does
> >  __asm__ __volatile__("": : :"memory")
> > ppc needs something like sync_synchronize there.
> 
> But you're approaching this the wrong way around - correctness should
> come first.  That is, we should first ensure that there is a
> sufficient memory barrier to satisfy the protocol.  Then, *if* there
> is a measurable performance improvement and *if* we can show that a
> weaker barrier is sufficient on a given platform, then we can whittle
> it down to a lighter barrier.

You are only looking at ppc. But on x86 this code ships in
production. So changes should be made in a way to reduce
a potential for regressions, balancing risk versus potential benefit.
I'm trying to point out a way to do this.

> > > But just leaving them out on x86!?
> > > Seriously, wtf?  Do you enjoy having software that works chiefly by
> > > accident?
> > 
> > I'm surprised at the controversy too. People seem to argue, at the
> > same time, that the x86 CPU does not reorder stores and that we need
> > an sfence between stores to prevent the guest from seeing them out
> > of order.
> 
> I don't know the x86 storage model well enough to definitively say
> that the barrier is not necessary there - nor to say that it is
> necessary.  All I know is that the x86 model is quite strongly
> ordered, and I assume that is why the lack of barrier has not caused
> an observed problem on x86.

Please review Documentation/memory-barriers.txt as one reference,
then look at how SMP barriers are implemented on various systems.
In particular, note how it says 'Mandatory barriers should not be used
to control SMP effects'.

> Again, correctness first.  sync_synchronize should be a sufficient
> barrier for wmb() on all platforms.  If you really don't want it, the
> onus is on you

Just for fun, I did a quick hack replacing all barriers with mb()
in the userspace virtio test. This is on x86.

Before:
address@hidden virtio]$ sudo time ./virtio_test 
spurious wakeups: 0x1da
24.53user 14.63system 0:41.91elapsed 93%CPU (0avgtext+0avgdata
464maxresident)k
0inputs+0outputs (0major+154minor)pagefaults 0swaps

After:
address@hidden virtio]$ sudo time ./virtio_test 
spurious wakeups: 0x218
33.97user 6.22system 0:42.10elapsed 95%CPU (0avgtext+0avgdata
464maxresident)k
0inputs+0outputs (0major+154minor)pagefaults 0swaps

So user time went up significantly, as expected. Surprisingly, the
kernel side became more efficient - surprising because the kernel was
not changed - with the net effect close to evening out.

So the risk of performance regressions from unnecessary fencing
seems to be non-zero, assuming that time doesn't lie.
This might be worth investigating, but I'm out of time right now.


> to show that (a) it's safe to do so and
> (b) it's actually worth it.

Worth what? I'm asking you to minimise disruption to other platforms
while you fix ppc.

-- 
MST


