From: Alex Bennée
Subject: Re: [Qemu-devel] [RFC 02/10] softmmu_llsc_template.h: Move to multi-threading
Date: Tue, 14 Jun 2016 09:37:47 +0100
User-agent: mu4e 0.9.17; emacs 25.0.95.1
Alex Bennée <address@hidden> writes:
> Sergey Fedorov <address@hidden> writes:
>
>> On 26/05/16 19:35, Alvise Rigo wrote:
>>> Using tcg_exclusive_{lock,unlock}(), make the emulation of
>>> LoadLink/StoreConditional thread safe.
>>>
>>> During an LL access, this lock protects the load access itself, the
>>> update of the exclusive history and the update of the VCPU's protected
>>> range. In an SC access, the lock protects the store access itself, the
>>> possible reset of other VCPUs' protected range and the reset of the
>>> exclusive context of the calling VCPU.
>>>
>>> The lock is also taken when a normal store happens to access an
>>> exclusive page to reset other VCPUs' protected range in case of
>>> collision.
>>
>> I think the key problem here is that the load in the LL helper can race
>> with a concurrent regular fast-path store. It's probably easier to
>> annotate the source here:
>>
>>  1 WORD_TYPE helper_ldlink_name(CPUArchState *env, target_ulong addr,
>>  2                              TCGMemOpIdx oi, uintptr_t retaddr)
>>  3 {
>>  4     WORD_TYPE ret;
>>  5     int index;
>>  6     CPUState *this_cpu = ENV_GET_CPU(env);
>>  7     CPUClass *cc = CPU_GET_CLASS(this_cpu);
>>  8     hwaddr hw_addr;
>>  9     unsigned mmu_idx = get_mmuidx(oi);
>>
>> 10     index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
>>
>> 11     tcg_exclusive_lock();
>>
>> 12     /* Use the proper load helper from cpu_ldst.h */
>> 13     ret = helper_ld(env, addr, oi, retaddr);
>>
>> 14     /* hw_addr = hwaddr of the page (i.e. section->mr->ram_addr + xlat)
>> 15      * plus the offset (i.e. addr & ~TARGET_PAGE_MASK) */
>> 16     hw_addr = (env->iotlb[mmu_idx][index].addr & TARGET_PAGE_MASK) + addr;
>> 17     if (likely(!(env->tlb_table[mmu_idx][index].addr_read & TLB_MMIO))) {
>> 18         /* If all the vCPUs have the EXCL bit set for this page there is
>> 19          * no need to request any flush. */
>> 20         if (cpu_physical_memory_set_excl(hw_addr)) {
>> 21             CPUState *cpu;
>>
>> 22             excl_history_put_addr(hw_addr);
>> 23             CPU_FOREACH(cpu) {
>> 24                 if (this_cpu != cpu) {
>> 25                     tlb_flush_other(this_cpu, cpu, 1);
>> 26                 }
>> 27             }
>> 28         }
>> 29         /* For this vCPU, just update the TLB entry, no need to flush. */
>> 30         env->tlb_table[mmu_idx][index].addr_write |= TLB_EXCL;
>> 31     } else {
>> 32         /* Set a pending exclusive access in the MemoryRegion */
>> 33         MemoryRegion *mr = iotlb_to_region(this_cpu,
>> 34                                            env->iotlb[mmu_idx][index].addr,
>> 35                                            env->iotlb[mmu_idx][index].attrs);
>> 36         mr->pending_excl_access = true;
>> 37     }
>>
>> 38     cc->cpu_set_excl_protected_range(this_cpu, hw_addr, DATA_SIZE);
>>
>> 39     tcg_exclusive_unlock();
>>
>> 40     /* From now on we are in LL/SC context */
>> 41     this_cpu->ll_sc_context = true;
>>
>> 42     return ret;
>> 43 }
>>
>>
>> The exclusive lock at line 11 doesn't help if a concurrent fast-path
>> store at this address occurs after we have finished the load at line 13
>> but before the TLB is flushed as a result of line 25. If we reorder the
>> load to happen after the TLB flush request, we still must be sure that
>> the flush is complete before we can do the load safely.
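
To make the window concrete, the racy interleaving looks something like
this (a hypothetical timeline, not code from the series; vCPU0 runs the
LL helper, vCPU1 does a regular fast-path store to the same address):

  vCPU0: tcg_exclusive_lock();              /* line 11 */
  vCPU0: ret = helper_ld(env, addr, ...);   /* line 13: value loaded */
  vCPU1: fast-path store to addr            /* vCPU1's TLB entry has no
                                               TLB_EXCL yet, so the store
                                               goes ahead */
  vCPU0: tlb_flush_other(this_cpu, cpu, 1); /* line 25: too late for the
                                               store that already hit */
  vCPU0: tcg_exclusive_unlock();

vCPU0 ends up returning a stale value in ret while believing it has an
exclusive view of addr.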
>
> I think this can be fixed using async_safe_run_on_cpu and tweaking the
> ldlink helper.
>
> * Change the helper_ldlink call
> - pass it offset-of(cpu->reg[n]) so it can store the result of the load
> - maybe pass it the next PC (unless there is some other way to know it)
>
> The vCPU runs until the ldlink instruction occurs and jumps to the helper
>
> * Once in the helper_ldlink
> - queue up an async helper function with the offset info
> - cpu_loop_exit_restore(with next PC)
>
> The vCPU that issued the ldlink exits immediately and waits until all
> vCPUs are out of generated code.
>
> * Once in the helper_ldlink async helper
> - Everything at this point is quiescent, no vCPU activity
> - Flush all TLBs/set flags
> - Do the load from memory, store directly into cpu->reg[n]
>
> The key thing is that once we are committed to the load in the async
> helper nothing else can get in the way. Any stores before we are in the
> helper happen as normal; once we exit the async helper all potentially
> conflicting stores will take the slow path.
>
> There is a little messing about in knowing the next PC, which is simple
> in the ARM case but gets more complicated for architectures that have
> branch delay slots. I haven't looked into this nit yet.
Thinking on it further, the messing about with offsets and PCs could be
solved just by having a simple flag (see the sketch below):
- First run: no flag set; queue the safe work, restart the loop at the PC
- Second run: flag set; do the load, clear the flag
As long as the flag is per-vCPU I think it's safe from races.
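
A minimal sketch of the flag version (ll_sc_pending is a hypothetical
per-vCPU flag, and I'm assuming the async_safe_run_on_cpu API from the
in-flight MTTCG work with a (CPUState *, void *) callback; the body of
the safe work is elided):

  static void ldlink_safe_work(CPUState *cpu, void *data)
  {
      /* All vCPUs are out of generated code here, so flushing TLBs and
       * setting the EXCL flags cannot race with a fast-path store. */
      /* ... flush TLBs, update the exclusive history, set TLB_EXCL ... */
  }

  WORD_TYPE helper_ldlink_name(CPUArchState *env, target_ulong addr,
                               TCGMemOpIdx oi, uintptr_t retaddr)
  {
      CPUState *cpu = ENV_GET_CPU(env);

      if (!cpu->ll_sc_pending) {
          /* First run: queue the safe work and restart the loop at the
           * current PC so the ldlink gets re-executed. */
          cpu->ll_sc_pending = true;
          async_safe_run_on_cpu(cpu, ldlink_safe_work, NULL);
          cpu_loop_exit_restore(cpu, retaddr);
      }

      /* Second run: the safe work has already run, so nothing can get
       * between the flush and this load. */
      cpu->ll_sc_pending = false;
      return helper_ld(env, addr, oi, retaddr);
  }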
>
>>
>>>
>>> Moreover, adapt target-arm to also cope with the new multi-threaded
>>> execution.
>>>
>>> Signed-off-by: Alvise Rigo <address@hidden>
>>> ---
>>>  softmmu_llsc_template.h | 11 +++++++++--
>>>  softmmu_template.h      |  6 ++++++
>>>  target-arm/op_helper.c  |  6 ++++++
>>>  3 files changed, 21 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/softmmu_llsc_template.h b/softmmu_llsc_template.h
>>> index 2c4a494..d3810c0 100644
>>> --- a/softmmu_llsc_template.h
>>> +++ b/softmmu_llsc_template.h
>>> @@ -62,11 +62,13 @@ WORD_TYPE helper_ldlink_name(CPUArchState *env, target_ulong addr,
>>> hwaddr hw_addr;
>>> unsigned mmu_idx = get_mmuidx(oi);
>>>
>>> + index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
>>> +
>>> + tcg_exclusive_lock();
>>> +
>>> /* Use the proper load helper from cpu_ldst.h */
>>> ret = helper_ld(env, addr, oi, retaddr);
>>>
>>> - index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
>>> -
>>> /* hw_addr = hwaddr of the page (i.e. section->mr->ram_addr + xlat)
>>> * plus the offset (i.e. addr & ~TARGET_PAGE_MASK) */
>>> hw_addr = (env->iotlb[mmu_idx][index].addr & TARGET_PAGE_MASK) + addr;
>>> @@ -95,6 +97,8 @@ WORD_TYPE helper_ldlink_name(CPUArchState *env, target_ulong addr,
>>>
>>> cc->cpu_set_excl_protected_range(this_cpu, hw_addr, DATA_SIZE);
>>>
>>> + tcg_exclusive_unlock();
>>> +
>>> /* From now on we are in LL/SC context */
>>> this_cpu->ll_sc_context = true;
>>>
>>> @@ -114,6 +118,8 @@ WORD_TYPE helper_stcond_name(CPUArchState *env, target_ulong addr,
>>> * access as one made by the store conditional wrapper. If the store
>>> * conditional does not succeed, the value will be set to 0.*/
>>> cpu->excl_succeeded = true;
>>> +
>>> + tcg_exclusive_lock();
>>> helper_st(env, addr, val, oi, retaddr);
>>>
>>> if (cpu->excl_succeeded) {
>>> @@ -123,6 +129,7 @@ WORD_TYPE helper_stcond_name(CPUArchState *env, target_ulong addr,
>>>
>>> /* Unset LL/SC context */
>>> cc->cpu_reset_excl_context(cpu);
>>> + tcg_exclusive_unlock();
>>>
>>> return ret;
>>> }
>>> diff --git a/softmmu_template.h b/softmmu_template.h
>>> index 76fe37e..9363a7b 100644
>>> --- a/softmmu_template.h
>>> +++ b/softmmu_template.h
>>> @@ -537,11 +537,16 @@ static inline void smmu_helper(do_excl_store)(CPUArchState *env,
>>> }
>>> }
>>>
>>> + /* Take the lock in case we are not coming from a SC */
>>> + tcg_exclusive_lock();
>>> +
>>> smmu_helper(do_ram_store)(env, little_endian, val, addr, oi,
>>> get_mmuidx(oi), index, retaddr);
>>>
>>> reset_other_cpus_colliding_ll_addr(hw_addr, DATA_SIZE);
>>>
>>> + tcg_exclusive_unlock();
>>> +
>>> return;
>>> }
>>>
>>> @@ -572,6 +577,7 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>>> /* Handle an IO access or exclusive access. */
>>> if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
>>> if (tlb_addr & TLB_EXCL) {
>>> +
>>> smmu_helper(do_excl_store)(env, true, val, addr, oi, index,
>>> retaddr);
>>> return;
>>> diff --git a/target-arm/op_helper.c b/target-arm/op_helper.c
>>> index e22afc5..19ea52d 100644
>>> --- a/target-arm/op_helper.c
>>> +++ b/target-arm/op_helper.c
>>> @@ -35,7 +35,9 @@ static void raise_exception(CPUARMState *env, uint32_t excp,
>>> cs->exception_index = excp;
>>> env->exception.syndrome = syndrome;
>>> env->exception.target_el = target_el;
>>> + tcg_exclusive_lock();
>>> cc->cpu_reset_excl_context(cs);
>>> + tcg_exclusive_unlock();
>>> cpu_loop_exit(cs);
>>> }
>>>
>>> @@ -58,7 +60,9 @@ void HELPER(atomic_clear)(CPUARMState *env)
>>> CPUState *cs = ENV_GET_CPU(env);
>>> CPUClass *cc = CPU_GET_CLASS(cs);
>>>
>>> + tcg_exclusive_lock();
>>> cc->cpu_reset_excl_context(cs);
>>> + tcg_exclusive_unlock();
>>> }
>>>
>>> uint32_t HELPER(neon_tbl)(CPUARMState *env, uint32_t ireg, uint32_t def,
>>> @@ -874,7 +878,9 @@ void HELPER(exception_return)(CPUARMState *env)
>>>
>>> aarch64_save_sp(env, cur_el);
>>>
>>> + tcg_exclusive_lock();
>>> cc->cpu_reset_excl_context(cs);
>>> + tcg_exclusive_unlock();
>>>
>>> /* We must squash the PSTATE.SS bit to zero unless both of the
>>> * following hold:
--
Alex Bennée