From: Alex Bennée
Subject: Re: [Qemu-devel] [RFC 02/10] softmmu_llsc_template.h: Move to multi-threading
Date: Tue, 14 Jun 2016 14:14:02 +0100
User-agent: mu4e 0.9.17; emacs 25.0.95.1
alvise rigo <address@hidden> writes:
> 1. LL(x) // x requires a flush
> 2. query flush to all the n VCPUs
> 3. exit from the CPU loop and wait until all the flushes are done
> 4. enter the loop to re-execute LL(x). This time no flush is required
>
> Now, points 2. and 3. can be done either with n calls of
> async_safe_run_on_cpu() or with n calls of async_wait_run_on_cpu(). In my
> opinion the former is not really suited to this use case, since it would
> call cpu_exit() n^2 times and would not really ensure that the VCPU has
> exited from the guest code to make an iteration of the CPU loop.
async_safe_run_on_cpu() should perhaps be renamed async_safe_run_on_system(),
as all vCPUs are held. You only need one helper to do all the flushing and
bit-setting, along the lines of the sketch below.
>
> Regards,
> alvise
>
> On Tue, Jun 14, 2016 at 2:00 PM, Alex Bennée <address@hidden> wrote:
>>
>> alvise rigo <address@hidden> writes:
>>
>>> On Fri, Jun 10, 2016 at 5:21 PM, Sergey Fedorov <address@hidden> wrote:
>>>> On 26/05/16 19:35, Alvise Rigo wrote:
>>>>> Using tcg_exclusive_{lock,unlock}(), make the emulation of
>>>>> LoadLink/StoreConditional thread safe.
>>>>>
>>>>> During an LL access, this lock protects the load access itself, the
>>>>> update of the exclusive history and the update of the VCPU's protected
>>>>> range. In an SC access, the lock protects the store access itself, the
>>>>> possible reset of other VCPUs' protected range and the reset of the
>>>>> exclusive context of the calling VCPU.
>>>>>
>>>>> The lock is also taken when a normal store happens to access an
>>>>> exclusive page to reset other VCPUs' protected range in case of
>>>>> collision.
>>>>
>>>> I think the key problem here is that the load in the LL helper can race
>>>> with a concurrent regular fast-path store. It's probably easier to
>>>> annotate the source here:
>>>>
>>>>  1    WORD_TYPE helper_ldlink_name(CPUArchState *env, target_ulong addr,
>>>>  2                                 TCGMemOpIdx oi, uintptr_t retaddr)
>>>>  3    {
>>>>  4        WORD_TYPE ret;
>>>>  5        int index;
>>>>  6        CPUState *this_cpu = ENV_GET_CPU(env);
>>>>  7        CPUClass *cc = CPU_GET_CLASS(this_cpu);
>>>>  8        hwaddr hw_addr;
>>>>  9        unsigned mmu_idx = get_mmuidx(oi);
>>>>
>>>> 10        index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
>>>>
>>>> 11        tcg_exclusive_lock();
>>>>
>>>> 12        /* Use the proper load helper from cpu_ldst.h */
>>>> 13        ret = helper_ld(env, addr, oi, retaddr);
>>>>
>>>> 14        /* hw_addr = hwaddr of the page (i.e. section->mr->ram_addr + xlat)
>>>> 15         * plus the offset (i.e. addr & ~TARGET_PAGE_MASK) */
>>>> 16        hw_addr = (env->iotlb[mmu_idx][index].addr & TARGET_PAGE_MASK) + addr;
>>>> 17        if (likely(!(env->tlb_table[mmu_idx][index].addr_read & TLB_MMIO))) {
>>>> 18            /* If all the vCPUs have the EXCL bit set for this page there is no need
>>>> 19             * to request any flush. */
>>>> 20            if (cpu_physical_memory_set_excl(hw_addr)) {
>>>> 21                CPUState *cpu;
>>>>
>>>> 22                excl_history_put_addr(hw_addr);
>>>> 23                CPU_FOREACH(cpu) {
>>>> 24                    if (this_cpu != cpu) {
>>>> 25                        tlb_flush_other(this_cpu, cpu, 1);
>>>> 26                    }
>>>> 27                }
>>>> 28            }
>>>> 29            /* For this vCPU, just update the TLB entry, no need to flush. */
>>>> 30            env->tlb_table[mmu_idx][index].addr_write |= TLB_EXCL;
>>>> 31        } else {
>>>> 32            /* Set a pending exclusive access in the MemoryRegion */
>>>> 33            MemoryRegion *mr = iotlb_to_region(this_cpu,
>>>> 34                                               env->iotlb[mmu_idx][index].addr,
>>>> 35                                               env->iotlb[mmu_idx][index].attrs);
>>>> 36            mr->pending_excl_access = true;
>>>> 37        }
>>>>
>>>> 38        cc->cpu_set_excl_protected_range(this_cpu, hw_addr, DATA_SIZE);
>>>>
>>>> 39        tcg_exclusive_unlock();
>>>>
>>>> 40        /* From now on we are in LL/SC context */
>>>> 41        this_cpu->ll_sc_context = true;
>>>>
>>>> 42        return ret;
>>>> 43    }
>>>>
>>>>
>>>> The exclusive lock at line 11 doesn't help if a concurrent fast-path
>>>> store to this address occurs after we have finished the load at line 13
>>>> but before the TLB is flushed as a result of line 25. If we reorder the
>>>> load to happen after the TLB flush request, we still must be sure that
>>>> the flush is complete before we can do the load safely.
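To make the window concrete, one possible interleaving (line numbers as in
the annotated source above):

    vCPU0 (LL helper)                     vCPU1 (fast path)
    11: tcg_exclusive_lock()
    13: ret = helper_ld(...)
                                          store to the same address; its
                                          TLB entry has no TLB_EXCL yet,
                                          so the store goes unnoticed
    20: cpu_physical_memory_set_excl()
    25: tlb_flush_other(...)  [requested, not yet done]
    39: tcg_exclusive_unlock()

vCPU0 now holds a stale value and nothing has reset its protected range,
so a later SC can succeed despite the intervening store.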
>>>
>>> You are right, the risk actually exists. One solution to the problem
>>> could be to ignore the data acquired by the load and redo the LL after
>>> the flushes have been completed (basically, disas_ctx->pc points back to
>>> the LL instruction). This time the LL will happen without flush
>>> requests and the access will actually be protected by the lock.
>>
>> So is just punting the work off to an async safe task and restarting the
>> vCPU thread going to be enough?
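i.e. roughly something like this in the LL helper (a sketch only, untested:
cpu_loop_exit_restore() is the existing call for restarting the current
instruction, while queue_excl_flush_work() is a made-up name for however
the safe work gets queued):

    if (cpu_physical_memory_set_excl(hw_addr)) {
        /* First LL on this page: throw away the value we loaded,
         * schedule the flushes to run while all vCPUs are halted,
         * and re-execute the LL; the retry will see the EXCL bit
         * already set and skip this branch. */
        queue_excl_flush_work(this_cpu, hw_addr);  /* hypothetical */
        tcg_exclusive_unlock();
        cpu_loop_exit_restore(this_cpu, retaddr);
    }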
>>
>>>
>>> Regards,
>>> alvise
>>>
>>>>
>>>>>
>>>>> Moreover, adapt target-arm to also cope with the new multi-threaded
>>>>> execution.
>>>>>
>>>>> Signed-off-by: Alvise Rigo <address@hidden>
>>>>> ---
>>>>> softmmu_llsc_template.h | 11 +++++++++--
>>>>> softmmu_template.h | 6 ++++++
>>>>> target-arm/op_helper.c | 6 ++++++
>>>>> 3 files changed, 21 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/softmmu_llsc_template.h b/softmmu_llsc_template.h
>>>>> index 2c4a494..d3810c0 100644
>>>>> --- a/softmmu_llsc_template.h
>>>>> +++ b/softmmu_llsc_template.h
>>>>> @@ -62,11 +62,13 @@ WORD_TYPE helper_ldlink_name(CPUArchState *env, target_ulong addr,
>>>>> hwaddr hw_addr;
>>>>> unsigned mmu_idx = get_mmuidx(oi);
>>>>>
>>>>> + index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
>>>>> +
>>>>> + tcg_exclusive_lock();
>>>>> +
>>>>> /* Use the proper load helper from cpu_ldst.h */
>>>>> ret = helper_ld(env, addr, oi, retaddr);
>>>>>
>>>>> - index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
>>>>> -
>>>>> /* hw_addr = hwaddr of the page (i.e. section->mr->ram_addr + xlat)
>>>>> * plus the offset (i.e. addr & ~TARGET_PAGE_MASK) */
>>>>> hw_addr = (env->iotlb[mmu_idx][index].addr & TARGET_PAGE_MASK) + addr;
>>>>> @@ -95,6 +97,8 @@ WORD_TYPE helper_ldlink_name(CPUArchState *env, target_ulong addr,
>>>>>
>>>>> cc->cpu_set_excl_protected_range(this_cpu, hw_addr, DATA_SIZE);
>>>>>
>>>>> + tcg_exclusive_unlock();
>>>>> +
>>>>> /* From now on we are in LL/SC context */
>>>>> this_cpu->ll_sc_context = true;
>>>>>
>>>>> @@ -114,6 +118,8 @@ WORD_TYPE helper_stcond_name(CPUArchState *env, target_ulong addr,
>>>>> * access as one made by the store conditional wrapper. If the store
>>>>> * conditional does not succeed, the value will be set to 0.*/
>>>>> cpu->excl_succeeded = true;
>>>>> +
>>>>> + tcg_exclusive_lock();
>>>>> helper_st(env, addr, val, oi, retaddr);
>>>>>
>>>>> if (cpu->excl_succeeded) {
>>>>> @@ -123,6 +129,7 @@ WORD_TYPE helper_stcond_name(CPUArchState *env, target_ulong addr,
>>>>>
>>>>> /* Unset LL/SC context */
>>>>> cc->cpu_reset_excl_context(cpu);
>>>>> + tcg_exclusive_unlock();
>>>>>
>>>>> return ret;
>>>>> }
>>>>> diff --git a/softmmu_template.h b/softmmu_template.h
>>>>> index 76fe37e..9363a7b 100644
>>>>> --- a/softmmu_template.h
>>>>> +++ b/softmmu_template.h
>>>>> @@ -537,11 +537,16 @@ static inline void smmu_helper(do_excl_store)(CPUArchState *env,
>>>>> }
>>>>> }
>>>>>
>>>>> + /* Take the lock in case we are not coming from a SC */
>>>>> + tcg_exclusive_lock();
>>>>> +
>>>>> smmu_helper(do_ram_store)(env, little_endian, val, addr, oi,
>>>>> get_mmuidx(oi), index, retaddr);
>>>>>
>>>>> reset_other_cpus_colliding_ll_addr(hw_addr, DATA_SIZE);
>>>>>
>>>>> + tcg_exclusive_unlock();
>>>>> +
>>>>> return;
>>>>> }
>>>>>
>>>>> @@ -572,6 +577,7 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>>>>> /* Handle an IO access or exclusive access. */
>>>>> if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
>>>>> if (tlb_addr & TLB_EXCL) {
>>>>> +
>>>>> smmu_helper(do_excl_store)(env, true, val, addr, oi, index,
>>>>> retaddr);
>>>>> return;
>>>>> diff --git a/target-arm/op_helper.c b/target-arm/op_helper.c
>>>>> index e22afc5..19ea52d 100644
>>>>> --- a/target-arm/op_helper.c
>>>>> +++ b/target-arm/op_helper.c
>>>>> @@ -35,7 +35,9 @@ static void raise_exception(CPUARMState *env, uint32_t excp,
>>>>> cs->exception_index = excp;
>>>>> env->exception.syndrome = syndrome;
>>>>> env->exception.target_el = target_el;
>>>>> + tcg_exclusive_lock();
>>>>> cc->cpu_reset_excl_context(cs);
>>>>> + tcg_exclusive_unlock();
>>>>> cpu_loop_exit(cs);
>>>>> }
>>>>>
>>>>> @@ -58,7 +60,9 @@ void HELPER(atomic_clear)(CPUARMState *env)
>>>>> CPUState *cs = ENV_GET_CPU(env);
>>>>> CPUClass *cc = CPU_GET_CLASS(cs);
>>>>>
>>>>> + tcg_exclusive_lock();
>>>>> cc->cpu_reset_excl_context(cs);
>>>>> + tcg_exclusive_unlock();
>>>>> }
>>>>>
>>>>> uint32_t HELPER(neon_tbl)(CPUARMState *env, uint32_t ireg, uint32_t def,
>>>>> @@ -874,7 +878,9 @@ void HELPER(exception_return)(CPUARMState *env)
>>>>>
>>>>> aarch64_save_sp(env, cur_el);
>>>>>
>>>>> + tcg_exclusive_lock();
>>>>> cc->cpu_reset_excl_context(cs);
>>>>> + tcg_exclusive_unlock();
>>>>>
>>>>> /* We must squash the PSTATE.SS bit to zero unless both of the
>>>>> * following hold:
>>>>
>>
>>
>> --
>> Alex Bennée
--
Alex Bennée