Re: [Qemu-devel] [PATCH] Huge TLB performance improvement


From: Daniel Jacobowitz
Subject: Re: [Qemu-devel] [PATCH] Huge TLB performance improvement
Date: Sun, 5 Nov 2006 10:38:20 -0500
User-agent: Mutt/1.5.13 (2006-08-11)

On Mon, Mar 06, 2006 at 02:59:29PM +0000, Thiemo Seufer wrote:
> Hello All,
> 
> this patch vastly improves TLB performance on MIPS, and probably also
> on other architectures. I measured a Linux boot-shutdown cycle,
> including userland init.

Quoting the whole message since this is from March...

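I don't remember seeing any follow-up discussion of this patch, but I
may have missed it.  Thiemo's definitely right about "vastly": in his
numbers below, tlb_flush_page drops from 64% of the profile to outside
the top ten, and the boot-shutdown cycle goes from 11m43s to about
6m20s.  Is this patch appropriate, or would anyone care to suggest a
more sophisticated data structure to avoid the full jump cache
invalidation?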
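
To make the question concrete, here is one possible shape for such a
structure, as a rough sketch only: hash the jump cache so that all
blocks starting in the same page land in one small group of slots, and
flush just the groups that can hold blocks overlapping the page.  Every
name and size here (JumpCache, jc_hash, 64 slots per group, 4-byte
instruction alignment) is made up for illustration and is not existing
QEMU code.  It also assumes, as the removed loop does, that a block
spans at most two pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All names and sizes below are illustrative assumptions, not QEMU's. */
#define PAGE_BITS      12
#define PAGE_SIZE      (1u << PAGE_BITS)
#define JC_SIZE        4096u                     /* total slots */
#define JC_PAGE_SLOTS  64u                       /* slots per page group */
#define JC_GROUPS      (JC_SIZE / JC_PAGE_SLOTS) /* 64 groups */

typedef struct Block { uint32_t pc; uint32_t size; } Block;
typedef struct JumpCache { Block *slot[JC_SIZE]; } JumpCache;

/* First slot of the group that the page containing addr hashes to. */
static unsigned jc_page_base(uint32_t addr)
{
    return ((addr >> PAGE_BITS) % JC_GROUPS) * JC_PAGE_SLOTS;
}

/* Page bits pick the group; in-page bits pick the slot inside it.
   Instructions are assumed 4-byte aligned, so the low 2 bits are dropped. */
static unsigned jc_hash(uint32_t pc)
{
    return jc_page_base(pc) + ((pc >> 2) % JC_PAGE_SLOTS);
}

static void jc_insert(JumpCache *jc, Block *b)
{
    assert(b != NULL);
    jc->slot[jc_hash(b->pc)] = b;
}

static Block *jc_lookup(const JumpCache *jc, uint32_t pc)
{
    Block *b = jc->slot[jc_hash(pc)];
    return (b && b->pc == pc) ? b : NULL;
}

/* Clear one group; like the patch, this relies on all-bits-zero
   being a null pointer. */
static void jc_flush_group(JumpCache *jc, uint32_t addr)
{
    memset(&jc->slot[jc_page_base(addr)], 0,
           JC_PAGE_SLOTS * sizeof(Block *));
}

/* Drop every cached block that may overlap the page containing addr. */
static void jc_flush_page(JumpCache *jc, uint32_t addr)
{
    /* A block starting near the end of the previous page can run
       into this one, so that page's group has to go as well. */
    jc_flush_group(jc, addr - PAGE_SIZE);
    jc_flush_group(jc, addr);
}
```

With these sizes a page flush clears 128 slots instead of scanning or
zeroing all 4096, and entries for unrelated pages survive unless they
happen to hash to the same group.  The cost is more slot conflicts for
code that stays inside a single page, so whether it actually beats the
plain memset would have to be measured.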

> 
> With minimal jump cache invalidation:
> 
> real    11m43.429s
> user    9m51.975s
> sys     0m1.375s
> 
>  64.19   1476.81  1476.81 20551904     0.00     0.00  tlb_flush_page
>   6.72   1631.36   154.55   184346     0.00     0.00  cpu_mips_exec
>   4.35   1731.46   100.10  3550500     0.00     0.00  dyngen_code
>   3.66   1815.77    84.31 90897893     0.00     0.00  decode_opc
>   2.89   1882.21    66.44 11170487     0.00     0.00  gen_intermediate_code_internal
>   1.72   1921.80    39.59 29919267     0.00     0.00  map_address
>   1.52   1956.66    34.86  7619987     0.00     0.00  tb_find_pc
>   0.96   1978.85    22.19 26361969     0.00     0.00  tlb_set_page_exec
>   0.96   2000.84    21.99                             __ldl_mmu
>   0.90   2021.59    20.75 27279747     0.00     0.00  gen_arith_imm
> 
> 
> With global jump cache kill:
> 
> real    6m19.811s
> user    4m23.650s
> sys     0m0.617s
> 
>  21.67    188.78   188.78   146571     0.00     0.00  cpu_mips_exec
>  11.37    287.88    99.10  3393051     0.00     0.00  dyngen_code
>   9.59    371.45    83.57 89839869     0.00     0.00  decode_opc
>   7.68    438.33    66.88 10989930     0.00     0.00  gen_intermediate_code_internal
>   4.24    475.26    36.93 30124659     0.00     0.00  map_address
>   3.80    508.33    33.07  7596879     0.00     0.00  tb_find_pc
>   2.74    532.22    23.89 27781692     0.00     0.00  tlb_set_page_exec
>   2.62    555.02    22.80 39891573     0.00     0.00  cpu_mips_handle_mmu_fault
>   2.55    577.25    22.23                             __ldl_mmu
>   2.30    597.26    20.01 26968709     0.00     0.00  gen_arith_imm
> 
> 
> Thiemo
> 
> 
> Index: qemu-work/exec.c
> ===================================================================
> --- qemu-work.orig/exec.c     2006-03-06 01:30:09.000000000 +0000
> +++ qemu-work/exec.c  2006-03-06 01:30:28.000000000 +0000
> @@ -1247,7 +1247,6 @@
>  void tlb_flush_page(CPUState *env, target_ulong addr)
>  {
>      int i;
> -    TranslationBlock *tb;
>  
>  #if defined(DEBUG_TLB)
>      printf("tlb_flush_page: " TARGET_FMT_lx "\n", addr);
> @@ -1261,14 +1260,10 @@
>      tlb_flush_entry(&env->tlb_table[0][i], addr);
>      tlb_flush_entry(&env->tlb_table[1][i], addr);
>  
> -    for(i = 0; i < TB_JMP_CACHE_SIZE; i++) {
> -        tb = env->tb_jmp_cache[i];
> -        if (tb && 
> -            ((tb->pc & TARGET_PAGE_MASK) == addr ||
> -             ((tb->pc + tb->size - 1) & TARGET_PAGE_MASK) == addr)) {
> -            env->tb_jmp_cache[i] = NULL;
> -        }
> -    }
> +    /* We throw away the jump cache altogether. This is cheaper than
> +       trying to be smart by invalidating only the entries in the
> +       affected address range. */
> +    memset (env->tb_jmp_cache, 0, TB_JMP_CACHE_SIZE * sizeof (void *));
>  
>  #if !defined(CONFIG_SOFTMMU)
>      if (addr < MMAP_AREA_END)
> 
> 

-- 
Daniel Jacobowitz
CodeSourcery



