Re: [Qemu-devel] [RFC v3 02/13] cputlb: Add new TLB_EXCL flag
From: alvise rigo
Subject: Re: [Qemu-devel] [RFC v3 02/13] cputlb: Add new TLB_EXCL flag
Date: Thu, 16 Jul 2015 17:04:36 +0200
On Thu, Jul 16, 2015 at 4:32 PM, Alex Bennée <address@hidden> wrote:
>
>
> Alvise Rigo <address@hidden> writes:
>
> > Add a new flag for the TLB entries to force all the accesses made to a
> > page to follow the slow-path.
> >
> > In the case we remove a TLB entry marked as EXCL, we unset the
> > corresponding exclusive bit in the bitmap.
> >
> > Mark the accessed page as dirty to invalidate any pending operation of
> > LL/SC only if a vCPU writes to the protected address.
> >
> > Suggested-by: Jani Kokkonen <address@hidden>
> > Suggested-by: Claudio Fontana <address@hidden>
> > Signed-off-by: Alvise Rigo <address@hidden>
> > ---
> >  cputlb.c                |  18 ++++-
> >  include/exec/cpu-all.h  |   2 +
> >  include/exec/cpu-defs.h |   4 +
> >  softmmu_template.h      | 189 +++++++++++++++++++++++++++++++-----------------
> >  4 files changed, 144 insertions(+), 69 deletions(-)
> >
> > diff --git a/cputlb.c b/cputlb.c
> > index e5853fd..0aca407 100644
> > --- a/cputlb.c
> > +++ b/cputlb.c
> > @@ -380,6 +380,16 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
> >      env->tlb_v_table[mmu_idx][vidx] = *te;
> >      env->iotlb_v[mmu_idx][vidx] = env->iotlb[mmu_idx][index];
> >
> > +    if (!(te->addr_write & TLB_MMIO) && (te->addr_write & TLB_EXCL)) {
> > +        /* We are removing an exclusive entry, if the corresponding exclusive
> > +         * bit is set, unset it. */
> > +        hwaddr hw_addr = (env->iotlb[mmu_idx][index].addr & TARGET_PAGE_MASK) +
> > +                                             (te->addr_write & TARGET_PAGE_MASK);
> > +        if (cpu_physical_memory_excl_is_dirty(hw_addr)) {
> > +            cpu_physical_memory_set_excl_dirty(hw_addr);
> > +        }
>
> I'm confused. I'm reading that as "if the dirty exclusive bit is set
> then set the dirty exclusive bit", that doesn't seem right. The comment
> seems to imply that should be a: cpu_physical_memory_clear_excl_dirty?
Yes, you are right, I've already fixed this issue in the upcoming v4.
It should be:

    if (!cpu_physical_memory_excl_is_dirty(hw_addr)) {
        cpu_physical_memory_set_excl_dirty(hw_addr);
    }

I will also make the comment clearer.
The rationale is to restore the dirty state of a page when a vCPU is
deleting the EXCL TLB entry associated with that page.
This piece of code actually costs a lot in performance, since it then
forces all the other vCPUs to flush at the next stcond. This is why I'm
now testing a version of this patch series where each vCPU has its own
EXCL bit: the performance is much better, at the cost of a bigger bitmap.
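Roughly, the per-vCPU variant I'm testing looks like this (just a
sketch; the names and the smp_cpus scaling are hypothetical, not taken
from the posted series):

    /* Hypothetical sketch: one exclusive bit per (page, vCPU) pair
     * instead of one per page, so a stcond only invalidates the vCPUs
     * that actually need it. The bitmap grows by a factor of smp_cpus. */
    static inline void excl_set_dirty_for_vcpu(unsigned long *excl_bitmap,
                                               ram_addr_t page_nr,
                                               int cpu_index)
    {
        set_bit(page_nr * smp_cpus + cpu_index, excl_bitmap);
    }

    static inline bool excl_is_dirty_for_vcpu(unsigned long *excl_bitmap,
                                              ram_addr_t page_nr,
                                              int cpu_index)
    {
        return test_bit(page_nr * smp_cpus + cpu_index, excl_bitmap);
    }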
>
>
> > +    }
> > +
> >      /* refill the tlb */
> >      env->iotlb[mmu_idx][index].addr = iotlb - vaddr;
> >      env->iotlb[mmu_idx][index].attrs = attrs;
> > @@ -405,7 +415,13 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
> >                                                      + xlat)) {
> >              te->addr_write = address | TLB_NOTDIRTY;
> >          } else {
> > -            te->addr_write = address;
> > +            if (!(address & TLB_MMIO) &&
> > +                !cpu_physical_memory_excl_is_dirty(section->mr->ram_addr
> > +                                                   + xlat)) {
> > +                te->addr_write = address | TLB_EXCL;
> > +            } else {
> > +                te->addr_write = address;
> > +            }
> >          }
> >      } else {
> >          te->addr_write = -1;
> > diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
> > index ac06c67..632f6ce 100644
> > --- a/include/exec/cpu-all.h
> > +++ b/include/exec/cpu-all.h
> > @@ -311,6 +311,8 @@ extern RAMList ram_list;
> >  #define TLB_NOTDIRTY (1 << 4)
> >  /* Set if TLB entry is an IO callback. */
> >  #define TLB_MMIO (1 << 5)
> > +/* Set if TLB entry refers to a page that requires exclusive access. */
> > +#define TLB_EXCL (1 << 6)
>
> I wonder if a compile time assert should be added here to trap the case
> when TARGET_PAGE_MASK starts encroaching on the lower bits? It looks
> like the smallest at the moment gives us 10 bits to play with.
Yes, it absolutely makes sense. I will take this into account for the
next version.
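Something along these lines should be enough to catch it at build time
(a sketch using QEMU_BUILD_BUG_ON; it fails to compile if the flag
overlaps the page-number bits):

    /* TLB flag bits must stay below the smallest TARGET_PAGE_SIZE,
     * otherwise they would corrupt the page-aligned address. */
    QEMU_BUILD_BUG_ON(TLB_EXCL & TARGET_PAGE_MASK);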
>
> >
> > void dump_exec_info(FILE *f, fprintf_function cpu_fprintf);
> > void dump_opcount_info(FILE *f, fprintf_function cpu_fprintf);
> > diff --git a/include/exec/cpu-defs.h b/include/exec/cpu-defs.h
> > index d5aecaf..c73a75f 100644
> > --- a/include/exec/cpu-defs.h
> > +++ b/include/exec/cpu-defs.h
> > @@ -165,5 +165,9 @@ typedef struct CPUIOTLBEntry {
> >  #define CPU_COMMON                                                    \
> >      /* soft mmu support */                                            \
> >      CPU_COMMON_TLB                                                    \
> > +                                                                      \
> > +    /* Used for atomic instruction translation. */                    \
> > +    bool ll_sc_context;                                               \
> > +    hwaddr excl_protected_hwaddr;                                     \
> >
> >  #endif
> > diff --git a/softmmu_template.h b/softmmu_template.h
> > index 18871f5..0edd451 100644
> > --- a/softmmu_template.h
> > +++ b/softmmu_template.h
> > @@ -141,6 +141,23 @@
> >          vidx >= 0;                                                    \
> >      })
> >
> > +#define lookup_cpus_ll_addr(addr)                                     \
> > +({                                                                    \
> > +    CPUState *cpu;                                                    \
> > +    CPUArchState *acpu;                                               \
> > +    bool hit = false;                                                 \
> > +                                                                      \
> > +    CPU_FOREACH(cpu) {                                                \
> > +        acpu = (CPUArchState *)cpu->env_ptr;                          \
> > +        if (cpu != current_cpu && acpu->excl_protected_hwaddr == addr) { \
> > +            hit = true;                                               \
> > +            break;                                                    \
> > +        }                                                             \
> > +    }                                                                 \
> > +                                                                      \
> > +    hit;                                                              \
> > +})
> > +
>
> Is there a reason to abuse a #define like this instead of having an
> inline and letting the compiler sort it out?
No, I can also move it to cputlb.c if necessary.
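For reference, the inline version would be something like (untested
sketch, same semantics as the macro above):

    static inline bool lookup_cpus_ll_addr(hwaddr addr)
    {
        CPUState *cpu;

        /* Return true if any other vCPU holds an exclusive (LoadLink)
         * reservation on this physical address. */
        CPU_FOREACH(cpu) {
            CPUArchState *acpu = (CPUArchState *)cpu->env_ptr;
            if (cpu != current_cpu && acpu->excl_protected_hwaddr == addr) {
                return true;
            }
        }

        return false;
    }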
>
> >  #ifndef SOFTMMU_CODE_ACCESS
> >  static inline DATA_TYPE glue(io_read, SUFFIX)(CPUArchState *env,
> >                                                CPUIOTLBEntry *iotlbentry,
> > @@ -414,43 +431,61 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
> >          tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
> >      }
> >
> > -    /* Handle an IO access. */
> > +    /* Handle an IO access or exclusive access. */
> >      if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
> > -        CPUIOTLBEntry *iotlbentry;
> > -        if ((addr & (DATA_SIZE - 1)) != 0) {
> > -            goto do_unaligned_access;
> > -        }
> > -        iotlbentry = &env->iotlb[mmu_idx][index];
> > -
> > -        /* ??? Note that the io helpers always read data in the target
> > -           byte ordering. We should push the LE/BE request down into io. */
> > -        val = TGT_LE(val);
> > -        glue(io_write, SUFFIX)(env, iotlbentry, val, addr, retaddr);
> > -        return;
> > -    }
> > -
> > -    /* Handle slow unaligned access (it spans two pages or IO). */
> > -    if (DATA_SIZE > 1
> > -        && unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
> > -                     >= TARGET_PAGE_SIZE)) {
> > -        int i;
> > -    do_unaligned_access:
> > -        if ((get_memop(oi) & MO_AMASK) == MO_ALIGN) {
> > -            cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
> > -                                 mmu_idx, retaddr);
> > +        CPUIOTLBEntry *iotlbentry = &env->iotlb[mmu_idx][index];
> > +        if ((tlb_addr & ~TARGET_PAGE_MASK) == TLB_EXCL) {
> > +            /* The slow-path has been forced since we are writing to
> > +             * exclusive-protected memory. */
> > +            hwaddr hw_addr = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
> > +
> > +            bool set_to_dirty;
> > +
> > +            /* Two cases of invalidation: the current vCPU is writing to another
> > +             * vCPU's exclusive address or the vCPU that issued the LoadLink is
> > +             * writing to it, but not through a StoreCond. */
> > +            set_to_dirty = lookup_cpus_ll_addr(hw_addr);
> > +            set_to_dirty |= env->ll_sc_context &&
> > +                            (env->excl_protected_hwaddr == hw_addr);
> > +
> > +            if (set_to_dirty) {
> > +                cpu_physical_memory_set_excl_dirty(hw_addr);
> > +            } /* the vCPU is legitimately writing to the protected address */
> > +        } else {
> > +            if ((addr & (DATA_SIZE - 1)) != 0) {
> > +                goto do_unaligned_access;
> > +            }
> > +
> > +            /* ??? Note that the io helpers always read data in the target
> > +               byte ordering. We should push the LE/BE request down into
> > +               io. */
> > +            val = TGT_LE(val);
> > +            glue(io_write, SUFFIX)(env, iotlbentry, val, addr, retaddr);
> > +            return;
> >          }
> > -        /* XXX: not efficient, but simple */
> > -        /* Note: relies on the fact that tlb_fill() does not remove the
> > -         * previous page from the TLB cache. */
> > -        for (i = DATA_SIZE - 1; i >= 0; i--) {
> > -            /* Little-endian extract. */
> > -            uint8_t val8 = val >> (i * 8);
> > -            /* Note the adjustment at the beginning of the function.
> > -               Undo that for the recursion. */
> > -            glue(helper_ret_stb, MMUSUFFIX)(env, addr + i, val8,
> > -                                            oi, retaddr + GETPC_ADJ);
> > +    } else {
> > +        /* Handle slow unaligned access (it spans two pages or IO). */
> > +        if (DATA_SIZE > 1
> > +            && unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
> > +                        >= TARGET_PAGE_SIZE)) {
> > +            int i;
> > +        do_unaligned_access:
> > +            if ((get_memop(oi) & MO_AMASK) == MO_ALIGN) {
> > +                cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
> > +                                     mmu_idx, retaddr);
> > +            }
> > +            /* XXX: not efficient, but simple */
> > +            /* Note: relies on the fact that tlb_fill() does not remove the
> > +             * previous page from the TLB cache. */
> > +            for (i = DATA_SIZE - 1; i >= 0; i--) {
> > +                /* Little-endian extract. */
> > +                uint8_t val8 = val >> (i * 8);
> > +                /* Note the adjustment at the beginning of the function.
> > +                   Undo that for the recursion. */
> > +                glue(helper_ret_stb, MMUSUFFIX)(env, addr + i, val8,
> > +                                                oi, retaddr + GETPC_ADJ);
> > +            }
> > +            return;
> >          }
> > -        return;
> >      }
>
> OK I can just about follow what happened now with the 3 exit points and
> extra goto thrown in but this function is starting to smell. The changes
> seem reasonable but what happens to the next tweak to the function?
I agree with you... it's a bit messy, but it does not touch the likely
path at all.
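If it helps, the exclusive-write handling could be factored out into a
helper shared by the LE/BE store paths, e.g. moved to cputlb.c together
with lookup_cpus_ll_addr (rough sketch, hypothetical name):

    /* Invalidate any pending LL/SC reservation for a store that hits
     * an exclusive-protected page. */
    static inline void excl_store_invalidate(CPUArchState *env,
                                             CPUIOTLBEntry *iotlbentry,
                                             target_ulong addr)
    {
        hwaddr hw_addr = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
        bool set_to_dirty;

        /* Either another vCPU holds a reservation on this address, or
         * the vCPU that issued the LoadLink writes to it outside a
         * StoreCond. */
        set_to_dirty = lookup_cpus_ll_addr(hw_addr);
        set_to_dirty |= env->ll_sc_context &&
                        (env->excl_protected_hwaddr == hw_addr);

        if (set_to_dirty) {
            cpu_physical_memory_set_excl_dirty(hw_addr);
        }
    }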
Thank you,
alvise
>
> >
> >      /* Handle aligned access or unaligned access in the same page. */
> > @@ -494,43 +529,61 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
> >          tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
> >      }
> >
> > -    /* Handle an IO access. */
> > +    /* Handle an IO access or exclusive access. */
> >      if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
> > -        CPUIOTLBEntry *iotlbentry;
> > -        if ((addr & (DATA_SIZE - 1)) != 0) {
> > -            goto do_unaligned_access;
> > -        }
> > -        iotlbentry = &env->iotlb[mmu_idx][index];
> > -
> > -        /* ??? Note that the io helpers always read data in the target
> > -           byte ordering. We should push the LE/BE request down into io. */
> > -        val = TGT_BE(val);
> > -        glue(io_write, SUFFIX)(env, iotlbentry, val, addr, retaddr);
> > -        return;
> > -    }
> > -
> > -    /* Handle slow unaligned access (it spans two pages or IO). */
> > -    if (DATA_SIZE > 1
> > -        && unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
> > -                     >= TARGET_PAGE_SIZE)) {
> > -        int i;
> > -    do_unaligned_access:
> > -        if ((get_memop(oi) & MO_AMASK) == MO_ALIGN) {
> > -            cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
> > -                                 mmu_idx, retaddr);
> > +        CPUIOTLBEntry *iotlbentry = &env->iotlb[mmu_idx][index];
> > +        if ((tlb_addr & ~TARGET_PAGE_MASK) == TLB_EXCL) {
> > +            /* The slow-path has been forced since we are writing to
> > +             * exclusive-protected memory. */
> > +            hwaddr hw_addr = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
> > +
> > +            bool set_to_dirty;
> > +
> > +            /* Two cases of invalidation: the current vCPU is writing to another
> > +             * vCPU's exclusive address or the vCPU that issued the LoadLink is
> > +             * writing to it, but not through a StoreCond. */
> > +            set_to_dirty = lookup_cpus_ll_addr(hw_addr);
> > +            set_to_dirty |= env->ll_sc_context &&
> > +                            (env->excl_protected_hwaddr == hw_addr);
> > +
> > +            if (set_to_dirty) {
> > +                cpu_physical_memory_set_excl_dirty(hw_addr);
> > +            } /* the vCPU is legitimately writing to the protected address */
> > +        } else {
> > +            if ((addr & (DATA_SIZE - 1)) != 0) {
> > +                goto do_unaligned_access;
> > +            }
> > +
> > +            /* ??? Note that the io helpers always read data in the target
> > +               byte ordering. We should push the LE/BE request down into
> > +               io. */
> > +            val = TGT_BE(val);
> > +            glue(io_write, SUFFIX)(env, iotlbentry, val, addr, retaddr);
> > +            return;
> >          }
> > -        /* XXX: not efficient, but simple */
> > -        /* Note: relies on the fact that tlb_fill() does not remove the
> > -         * previous page from the TLB cache. */
> > -        for (i = DATA_SIZE - 1; i >= 0; i--) {
> > -            /* Big-endian extract. */
> > -            uint8_t val8 = val >> (((DATA_SIZE - 1) * 8) - (i * 8));
> > -            /* Note the adjustment at the beginning of the function.
> > -               Undo that for the recursion. */
> > -            glue(helper_ret_stb, MMUSUFFIX)(env, addr + i, val8,
> > -                                            oi, retaddr + GETPC_ADJ);
> > +    } else {
> > +        /* Handle slow unaligned access (it spans two pages or IO). */
> > +        if (DATA_SIZE > 1
> > +            && unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
> > +                        >= TARGET_PAGE_SIZE)) {
> > +            int i;
> > +        do_unaligned_access:
> > +            if ((get_memop(oi) & MO_AMASK) == MO_ALIGN) {
> > +                cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
> > +                                     mmu_idx, retaddr);
> > +            }
> > +            /* XXX: not efficient, but simple */
> > +            /* Note: relies on the fact that tlb_fill() does not remove the
> > +             * previous page from the TLB cache. */
> > +            for (i = DATA_SIZE - 1; i >= 0; i--) {
> > +                /* Big-endian extract. */
> > +                uint8_t val8 = val >> (((DATA_SIZE - 1) * 8) - (i * 8));
> > +                /* Note the adjustment at the beginning of the function.
> > +                   Undo that for the recursion. */
> > +                glue(helper_ret_stb, MMUSUFFIX)(env, addr + i, val8,
> > +                                                oi, retaddr + GETPC_ADJ);
> > +            }
> > +            return;
> >          }
> > -        return;
> >      }
> >
> >      /* Handle aligned access or unaligned access in the same page. */
>
> --
> Alex Bennée