From: Alex Bennée
Subject: Re: [Qemu-devel] [RFC 02/10] softmmu_llsc_template.h: Move to multi-threading
Date: Tue, 14 Jun 2016 14:14:02 +0100
User-agent: mu4e 0.9.17; emacs 25.0.95.1
alvise rigo <address@hidden> writes:
> 1. LL(x) // x requires a flush
> 2. query flush to all the n VCPUs
> 3. exit from the CPU loop and wait until all the flushes are done
> 4. enter the loop to re-execute LL(x). This time no flush is required
>
> Now, points 2. and 3. can be done either with n calls of
> async_safe_run_on_cpu() or with n calls of async_wait_run_on_cpu(). In my
> opinion the former is not really suited to this use case, since it would
> call cpu_exit() n^2 times and would not really ensure that the VCPU has
> exited from the guest code to make an iteration of the CPU loop.
async_safe_run_on_cpu() should perhaps be renamed async_safe_run_on_system(),
as all vCPUs are held. You only need one helper to do all the flushing and
bit-setting, along the lines of the sketch below.
>
> Regards,
> alvise
>
> On Tue, Jun 14, 2016 at 2:00 PM, Alex Bennée <address@hidden> wrote:
>>
>> alvise rigo <address@hidden> writes:
>>
>>> On Fri, Jun 10, 2016 at 5:21 PM, Sergey Fedorov <address@hidden> wrote:
>>>> On 26/05/16 19:35, Alvise Rigo wrote:
>>>>> Using tcg_exclusive_{lock,unlock}(), make the emulation of
>>>>> LoadLink/StoreConditional thread safe.
>>>>>
>>>>> During an LL access, this lock protects the load access itself, the
>>>>> update of the exclusive history and the update of the VCPU's protected
>>>>> range. In an SC access, the lock protects the store access itself, the
>>>>> possible reset of other VCPUs' protected range and the reset of the
>>>>> exclusive context of the calling VCPU.
>>>>>
>>>>> The lock is also taken when a normal store happens to access an
>>>>> exclusive page to reset other VCPUs' protected range in case of
>>>>> collision.
>>>>
>>>> I think the key problem here is that the load in the LL helper can race
>>>> with a concurrent regular fast-path store. It's probably easier to
>>>> annotate the source here:
>>>>
>>>>  1    WORD_TYPE helper_ldlink_name(CPUArchState *env, target_ulong addr,
>>>>  2                                 TCGMemOpIdx oi, uintptr_t retaddr)
>>>>  3    {
>>>>  4        WORD_TYPE ret;
>>>>  5        int index;
>>>>  6        CPUState *this_cpu = ENV_GET_CPU(env);
>>>>  7        CPUClass *cc = CPU_GET_CLASS(this_cpu);
>>>>  8        hwaddr hw_addr;
>>>>  9        unsigned mmu_idx = get_mmuidx(oi);
>>>>
>>>> 10        index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
>>>>
>>>> 11        tcg_exclusive_lock();
>>>>
>>>> 12        /* Use the proper load helper from cpu_ldst.h */
>>>> 13        ret = helper_ld(env, addr, oi, retaddr);
>>>>
>>>> 14        /* hw_addr = hwaddr of the page (i.e. section->mr->ram_addr + xlat)
>>>> 15         * plus the offset (i.e. addr & ~TARGET_PAGE_MASK) */
>>>> 16        hw_addr = (env->iotlb[mmu_idx][index].addr & TARGET_PAGE_MASK) + addr;
>>>> 17        if (likely(!(env->tlb_table[mmu_idx][index].addr_read & TLB_MMIO))) {
>>>> 18            /* If all the vCPUs have the EXCL bit set for this page there is no need
>>>> 19             * to request any flush. */
>>>> 20            if (cpu_physical_memory_set_excl(hw_addr)) {
>>>> 21                CPUState *cpu;
>>>>
>>>> 22                excl_history_put_addr(hw_addr);
>>>> 23                CPU_FOREACH(cpu) {
>>>> 24                    if (this_cpu != cpu) {
>>>> 25                        tlb_flush_other(this_cpu, cpu, 1);
>>>> 26                    }
>>>> 27                }
>>>> 28            }
>>>> 29            /* For this vCPU, just update the TLB entry, no need to flush. */
>>>> 30            env->tlb_table[mmu_idx][index].addr_write |= TLB_EXCL;
>>>> 31        } else {
>>>> 32            /* Set a pending exclusive access in the MemoryRegion */
>>>> 33            MemoryRegion *mr = iotlb_to_region(this_cpu,
>>>> 34                                               env->iotlb[mmu_idx][index].addr,
>>>> 35                                               env->iotlb[mmu_idx][index].attrs);
>>>> 36            mr->pending_excl_access = true;
>>>> 37        }
>>>>
>>>> 38        cc->cpu_set_excl_protected_range(this_cpu, hw_addr, DATA_SIZE);
>>>>
>>>> 39        tcg_exclusive_unlock();
>>>>
>>>> 40        /* From now on we are in LL/SC context */
>>>> 41        this_cpu->ll_sc_context = true;
>>>>
>>>> 42        return ret;
>>>> 43    }
>>>>
>>>>
>>>> The exclusive lock at line 11 doesn't help if a concurrent fast-path
>>>> store to this address occurs after we have finished the load at line 13
>>>> but before the TLB is flushed as a result of line 25. If we reorder the
>>>> load to happen after the TLB flush request, we still must be sure that
>>>> the flush is complete before we can do the load safely.
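To make the window concrete, one possible interleaving (line numbers as in
the annotated source above):

    vCPU0 (LL helper)                     vCPU1 (fast path)
    11: tcg_exclusive_lock()
    13: ret = helper_ld(...)
                                          store to the same address; its
                                          TLB entry has no TLB_EXCL yet,
                                          so the store goes unnoticed
    20: cpu_physical_memory_set_excl()
    25: tlb_flush_other(...)  [requested, not yet done]
    39: tcg_exclusive_unlock()

vCPU0 now holds a stale value and nothing has reset its protected range,
so a later SC can succeed despite the intervening store.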
>>>
>>> You are right, the risk actually exists. One solution to the problem
>>> could be to ignore the data acquired by the load and redo the LL after
>>> the flushes have been completed (basically, disas_ctx->pc points back to
>>> the LL instruction). This time the LL will happen without flush
>>> requests and the access will actually be protected by the lock.
>>
>> So is just punting the work off to an async safe task and restarting the
>> vCPU thread going to be enough?
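i.e. roughly something like this in the LL helper (a sketch only, untested:
cpu_loop_exit_restore() is the existing call for restarting the current
instruction, while queue_excl_flush_work() is a made-up name for however
the safe work gets queued):

    if (cpu_physical_memory_set_excl(hw_addr)) {
        /* First LL on this page: throw away the value we loaded,
         * schedule the flushes to run while all vCPUs are halted,
         * and re-execute the LL; the retry will see the EXCL bit
         * already set and skip this branch. */
        queue_excl_flush_work(this_cpu, hw_addr);  /* hypothetical */
        tcg_exclusive_unlock();
        cpu_loop_exit_restore(this_cpu, retaddr);
    }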
>>
>>>
>>> Regards,
>>> alvise
>>>
>>>>
>>>>>
>>>>> Moreover, adapt target-arm to also cope with the new multi-threaded
>>>>> execution.
>>>>>
>>>>> Signed-off-by: Alvise Rigo <address@hidden>
>>>>> ---
>>>>> softmmu_llsc_template.h | 11 +++++++++--
>>>>> softmmu_template.h | 6 ++++++
>>>>> target-arm/op_helper.c | 6 ++++++
>>>>> 3 files changed, 21 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/softmmu_llsc_template.h b/softmmu_llsc_template.h
>>>>> index 2c4a494..d3810c0 100644
>>>>> --- a/softmmu_llsc_template.h
>>>>> +++ b/softmmu_llsc_template.h
>>>>> @@ -62,11 +62,13 @@ WORD_TYPE helper_ldlink_name(CPUArchState *env, target_ulong addr,
>>>>> hwaddr hw_addr;
>>>>> unsigned mmu_idx = get_mmuidx(oi);
>>>>>
>>>>> + index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
>>>>> +
>>>>> + tcg_exclusive_lock();
>>>>> +
>>>>> /* Use the proper load helper from cpu_ldst.h */
>>>>> ret = helper_ld(env, addr, oi, retaddr);
>>>>>
>>>>> - index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
>>>>> -
>>>>> /* hw_addr = hwaddr of the page (i.e. section->mr->ram_addr + xlat)
>>>>> * plus the offset (i.e. addr & ~TARGET_PAGE_MASK) */
>>>>> hw_addr = (env->iotlb[mmu_idx][index].addr & TARGET_PAGE_MASK) + addr;
>>>>> @@ -95,6 +97,8 @@ WORD_TYPE helper_ldlink_name(CPUArchState *env, target_ulong addr,
>>>>>
>>>>> cc->cpu_set_excl_protected_range(this_cpu, hw_addr, DATA_SIZE);
>>>>>
>>>>> + tcg_exclusive_unlock();
>>>>> +
>>>>> /* From now on we are in LL/SC context */
>>>>> this_cpu->ll_sc_context = true;
>>>>>
>>>>> @@ -114,6 +118,8 @@ WORD_TYPE helper_stcond_name(CPUArchState *env, target_ulong addr,
>>>>> * access as one made by the store conditional wrapper. If the store
>>>>> * conditional does not succeed, the value will be set to 0.*/
>>>>> cpu->excl_succeeded = true;
>>>>> +
>>>>> + tcg_exclusive_lock();
>>>>> helper_st(env, addr, val, oi, retaddr);
>>>>>
>>>>> if (cpu->excl_succeeded) {
>>>>> @@ -123,6 +129,7 @@ WORD_TYPE helper_stcond_name(CPUArchState *env, target_ulong addr,
>>>>>
>>>>> /* Unset LL/SC context */
>>>>> cc->cpu_reset_excl_context(cpu);
>>>>> + tcg_exclusive_unlock();
>>>>>
>>>>> return ret;
>>>>> }
>>>>> diff --git a/softmmu_template.h b/softmmu_template.h
>>>>> index 76fe37e..9363a7b 100644
>>>>> --- a/softmmu_template.h
>>>>> +++ b/softmmu_template.h
>>>>> @@ -537,11 +537,16 @@ static inline void smmu_helper(do_excl_store)(CPUArchState *env,
>>>>> }
>>>>> }
>>>>>
>>>>> + /* Take the lock in case we are not coming from a SC */
>>>>> + tcg_exclusive_lock();
>>>>> +
>>>>> smmu_helper(do_ram_store)(env, little_endian, val, addr, oi,
>>>>> get_mmuidx(oi), index, retaddr);
>>>>>
>>>>> reset_other_cpus_colliding_ll_addr(hw_addr, DATA_SIZE);
>>>>>
>>>>> + tcg_exclusive_unlock();
>>>>> +
>>>>> return;
>>>>> }
>>>>>
>>>>> @@ -572,6 +577,7 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>>>>> /* Handle an IO access or exclusive access. */
>>>>> if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
>>>>> if (tlb_addr & TLB_EXCL) {
>>>>> +
>>>>> smmu_helper(do_excl_store)(env, true, val, addr, oi, index,
>>>>> retaddr);
>>>>> return;
>>>>> diff --git a/target-arm/op_helper.c b/target-arm/op_helper.c
>>>>> index e22afc5..19ea52d 100644
>>>>> --- a/target-arm/op_helper.c
>>>>> +++ b/target-arm/op_helper.c
>>>>> @@ -35,7 +35,9 @@ static void raise_exception(CPUARMState *env, uint32_t excp,
>>>>> cs->exception_index = excp;
>>>>> env->exception.syndrome = syndrome;
>>>>> env->exception.target_el = target_el;
>>>>> + tcg_exclusive_lock();
>>>>> cc->cpu_reset_excl_context(cs);
>>>>> + tcg_exclusive_unlock();
>>>>> cpu_loop_exit(cs);
>>>>> }
>>>>>
>>>>> @@ -58,7 +60,9 @@ void HELPER(atomic_clear)(CPUARMState *env)
>>>>> CPUState *cs = ENV_GET_CPU(env);
>>>>> CPUClass *cc = CPU_GET_CLASS(cs);
>>>>>
>>>>> + tcg_exclusive_lock();
>>>>> cc->cpu_reset_excl_context(cs);
>>>>> + tcg_exclusive_unlock();
>>>>> }
>>>>>
>>>>> uint32_t HELPER(neon_tbl)(CPUARMState *env, uint32_t ireg, uint32_t def,
>>>>> @@ -874,7 +878,9 @@ void HELPER(exception_return)(CPUARMState *env)
>>>>>
>>>>> aarch64_save_sp(env, cur_el);
>>>>>
>>>>> + tcg_exclusive_lock();
>>>>> cc->cpu_reset_excl_context(cs);
>>>>> + tcg_exclusive_unlock();
>>>>>
>>>>> /* We must squash the PSTATE.SS bit to zero unless both of the
>>>>> * following hold:
>>>>
>>
>>
>> --
>> Alex Bennée
--
Alex Bennée