qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] aio: reg. smp_read_barrier_depends() in aio_bh_poll()


From: Pranith Kumar
Subject: Re: [Qemu-devel] aio: reg. smp_read_barrier_depends() in aio_bh_poll()
Date: Fri, 02 Sep 2016 14:23:12 -0400

Paolo Bonzini writes:

> On 02/09/2016 16:33, Pranith Kumar wrote:
>> 
>> Hi Paolo,
>> 
>> This is in reference to the discussion we had yesterday on IRC. I am trying 
>> to
>> understand the need for smp_read_barrier_depends() and how it prevents the
>> following race condition. I think a regular barrier() should suffice instead
>> of smp_read_barrier_depends(). Consider:
>> 
>>            P0                        P1
>>      ----------------------------------------
>>      bh = ctx->first_bh;        
>>      smp_read_barrier_depends();  // barrier() should be sufficient since bh
>>                                   // is local variable
>>      next = bh->next;
>>                                   lock(bh_lock);
>>                                   new_bh->next = ctx->first_bh;
>>                                   smp_wmb();
>>                                   ctx->first_bh = new_bh;
>>                                   unlock(bh_lock);
>>                                   
>>      if (bh) {
>>         // do something
>>      }
>> 
>> Why do you think smp_read_barrier_depends() is necessary here? If bh was a
>> shared variable I would understand, but here bh is local and a regular
>> barrier() would make sure that we are not optimizing the initial load into 
>> bh.
>
> Honestly, I don't think you understand why memory barriers exist...

Well, you are not entirely wrong :-). I think this is a difficult topic to
excel at. And I am trying to wrap my head around Alpha barrier semantics which
is an entirely different ball game.

> They are used to synchronize writes to shared *data*, not to shared
> variables.

My bad. Shared memory location is what I meant too. I will be more precise
when I write.

>
> It doesn't matter whether bh is a shared variable.  The *data that it
> points to* is shared with other threads.  ctx->first_bh and bh->next are
> both shared by P0 and P1.
>
> P1 must make sure that ctx->first_bh is written after all of its context
> (which in aio_bh_new includes new_bh->next) is ready.  It uses smp_wmb
> for that.  A "release store" for ctx->first_bh would be okay too.  This
> is easy.
>
> P0 must make sure that bh->next is read after ctx->first_bh.  The
> simplest way to ensure this is an "acquire load" for ctx->first_bh and
> bh->next place an smp_rmb where there is currently
> smp_read_barrier_depends().  This is easy too, but a bit overkill
> because bh->next is really ctx->first_bh->next and data dependent reads
> do not need full-blown acquire semantics.
>
> However, you still need to make sure that bh->next is read from
> _exactly_ the ctx->first_bh that was assigned to bh, and not for example
> a value that was changed in the meanwhile by another processor.  Most
> processors promise this (except the Alpha!) but compilers might reload
> values if they think it's useful.  For this reason Linux and QEMU have
> smp_read_barrier_depends(), and for this reason C11/C++11 introduce the
> "consume" memory order.  smp_read_barrier_depends() is the same as C11's
> atomic_thread_fence(MEMORDER_CONSUME).  We didn't make it up.

If I understand you correctly, this is what might happen without the
barrier():

            P0                        P1
      ----------------------------------------
      // bh = ctx->first_bh;  optimized       
      if (ctx->first_bh) {
      // next = ctx->first_bh->next;
                                   lock(bh_lock);
                                   new_bh->next = ctx->first_bh;
                                   smp_wmb();  // this alone is not sufficient
                                               // for Alpha
                                   ctx->first_bh = new_bh;
                                   unlock(bh_lock);
      // bh = next;
      bh = ctx->first_bh->next;

      if (bh) {do something}
      }

Is this what might happen? If so, inserting a barrier() after the first load
into bh will prevent the compiler from optimizing the load into bh since the
compiler cannot optimize away loads and stores past the barrier().

And on Alpha processors barrier() should really be smp_read_barrier_depends()
to prevent this from happening because of it's memory model(issue a barrier
after loading a pointer to shared memory and before dereferencing it).

>
> So instead of smp_read_barrier_depends() you could load ctx->first_bh
> and bh->next with the consume memory order, but you do need _something_.

OK, if the above situation is possible, then I think I understand the need for
this barrier.

-- 
Pranith



reply via email to

[Prev in Thread] Current Thread [Next in Thread]