From: Richard Henderson
Subject: [Qemu-devel] [PATCH v5 13/19] tb-hash: improve tb_jmp_cache hash function in user mode
Date: Thu, 27 Apr 2017 14:00:00 +0200
From: "Emilio G. Cota" <address@hidden>
Optimizations to cross-page chaining and indirect branches make
performance more sensitive to the hit rate of tb_jmp_cache.
In softmmu, the hash must reserve some bits for the page number so
that TLB invalidation can quickly clear a subset of the hash table;
this constraint lowers the achievable quality of the hashing function.
User-mode has no TLB and therefore no such requirement. Thus, with
this change user-mode uses a hashing function that is both faster
and of better quality than the previous one.
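
To make the constraint concrete, here is a minimal standalone sketch of the two index functions. The constant values (TARGET_PAGE_BITS, TB_JMP_CACHE_BITS), the 64-bit target_ulong typedef and the function names hash_page_constrained and hash_user are assumptions made for illustration. The body of the page-constrained variant is only partially visible in this patch's context lines, so treat it as a reconstruction rather than a copy of the QEMU source.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; the real ones come from QEMU's headers. */
#define TARGET_PAGE_BITS  12
#define TB_JMP_CACHE_BITS 12
#define TB_JMP_CACHE_SIZE (1u << TB_JMP_CACHE_BITS)

/* Page-constrained layout: half of the index bits are reserved for the page. */
#define TB_JMP_PAGE_BITS  (TB_JMP_CACHE_BITS / 2)
#define TB_JMP_PAGE_SIZE  (1u << TB_JMP_PAGE_BITS)
#define TB_JMP_ADDR_MASK  (TB_JMP_PAGE_SIZE - 1)
#define TB_JMP_PAGE_MASK  (TB_JMP_CACHE_SIZE - TB_JMP_PAGE_SIZE)

typedef uint64_t target_ulong; /* assumption: 64-bit guest addresses */

/* Reconstructed softmmu-style index: the top TB_JMP_PAGE_BITS of the result
   depend only on the page number, the bottom bits on the in-page offset. */
static inline unsigned int hash_page_constrained(target_ulong pc)
{
    target_ulong tmp = pc ^ (pc >> (TARGET_PAGE_BITS - TB_JMP_PAGE_BITS));
    unsigned int idx =
        ((tmp >> (TARGET_PAGE_BITS - TB_JMP_PAGE_BITS)) & TB_JMP_PAGE_MASK)
        | (tmp & TB_JMP_ADDR_MASK);

    assert(idx < TB_JMP_CACHE_SIZE);
    return idx;
}

/* User-mode index as added by this patch: all index bits mix the whole pc. */
static inline unsigned int hash_user(target_ulong pc)
{
    return (pc ^ (pc >> TB_JMP_CACHE_BITS)) & (TB_JMP_CACHE_SIZE - 1);
}
```

With these assumed constants, two addresses on the same page always agree in the top six index bits of hash_page_constrained, which leaves only 64 slots per page; hash_user is free to use all 4096 slots.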
Measurements:
Note: baseline (i.e. speedup == 1x) is QEMU v2.9.0.
- SPECint06 (test set), x86_64-linux-user. Host: Intel i7-6700K @ 4.00GHz

  [Bar chart: speedup over QEMU v2.9.0 (y-axis, 0.8x to 2.2x) for the
  jr, jr+multhash and jr+hash configurations. Benchmarks: astar, bzip2,
  gcc, gobmk, h264ref, hmmer, libquantum, mcf, omnetpp, perlbench,
  sjeng, xalancbmk, plus hmean. The ASCII rendering did not survive
  line wrapping; see the PNG.]
png: http://imgur.com/4UXTrEc
Here I also tried the hash function suggested by Paolo ("multhash"):
    return ((uint64_t) (pc * 2654435761) >> 32) & (TB_JMP_CACHE_SIZE - 1);
As you can see it is just as good as the other new function ("hash"),
which is what I ended up going with.
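
For readers who want to experiment with the two candidates, the following is a rough standalone sketch. TB_JMP_CACHE_BITS = 12, the uint64_t pc type, and the names multhash, xorhash and count_buckets are assumptions for illustration. count_buckets is a hypothetical helper that reports how many distinct slots a synthetic sequence of addresses occupies; it is not how the measurements above were taken, and synthetic strides say little about real guest code.

```c
#include <assert.h>
#include <stdint.h>

#define TB_JMP_CACHE_BITS 12 /* assumed value for illustration */
#define TB_JMP_CACHE_SIZE (1u << TB_JMP_CACHE_BITS)

/* "multhash": multiplicative hash as quoted above, on a 64-bit pc. */
static inline unsigned int multhash(uint64_t pc)
{
    return ((uint64_t)(pc * UINT64_C(2654435761)) >> 32)
           & (TB_JMP_CACHE_SIZE - 1);
}

/* "hash": the xor-fold that this patch uses for user mode. */
static inline unsigned int xorhash(uint64_t pc)
{
    return (pc ^ (pc >> TB_JMP_CACHE_BITS)) & (TB_JMP_CACHE_SIZE - 1);
}

/* Hypothetical helper: count the distinct slots hit by n addresses
   base, base + stride, base + 2 * stride, ... */
static inline unsigned int count_buckets(unsigned int (*hash)(uint64_t),
                                         uint64_t base, uint64_t stride,
                                         unsigned int n)
{
    unsigned char used[TB_JMP_CACHE_SIZE] = { 0 };
    unsigned int distinct = 0;

    for (unsigned int i = 0; i < n; i++) {
        unsigned int idx = hash(base + i * stride);

        assert(idx < TB_JMP_CACHE_SIZE);
        if (!used[idx]) {
            used[idx] = 1;
            distinct++;
        }
    }
    return distinct;
}
```

Calling count_buckets(xorhash, base, stride, n) and count_buckets(multhash, base, stride, n) with the same arguments gives a quick, if crude, way to compare slot occupancy for a given access pattern.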
- SPECint06 (train set), x86_64-linux-user. Host: Intel i7-6700K @ 4.00GHz

  [Bar chart: speedup over QEMU v2.9.0 (y-axis, 1x to 2.6x) for the
  jr and jr+hash configurations. Benchmarks: astar, bzip2, gcc, gobmk,
  h264ref, hmmer, libquantum, mcf, omnetpp, perlbench, sjeng,
  xalancbmk, plus hmean. The ASCII rendering did not survive line
  wrapping; see the PNG.]
png: http://imgur.com/ArCbHqo
- NBench, x86_64-linux-user. Host: Intel i7-6700K @ 4.00GHz

  [Bar chart with error bars: speedup over QEMU v2.9.0 (y-axis, 0.96x
  to 1.12x) for the jr and jr+hash configurations. Labels recoverable
  from the x-axis: ASSIGNMENT, BITFIELD, FOURIER, FP EMULATION,
  HUFFMAN, LU DECOMPOSITION, NEURAL NET, NUMERIC SORT, STRING SORT,
  hmean. The ASCII rendering did not survive line wrapping; see the
  PNG.]
png: http://imgur.com/ZXFX0hJ
- NBench, arm-linux-user. Host: Intel i7-4790K @ 4.00GHz

  [Bar chart with error bars: speedup over QEMU v2.9.0 (y-axis, 0.9x
  to 1.3x) for the jr and jr+hash configurations. Labels recoverable
  from the x-axis: ASSIGNMENT, BITFIELD, FOURIER, FP EMULATION,
  HUFFMAN, LU DECOMPOSITION, NEURAL NET, NUMERIC SORT, STRING SORT,
  hmean. The ASCII rendering did not survive line wrapping; see the
  PNG.]
png: http://imgur.com/FfD27ey
Reviewed-by: Richard Henderson <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Message-Id: <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>
---
include/exec/tb-hash.h | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/include/exec/tb-hash.h b/include/exec/tb-hash.h
index 2c27490..b1fe2d0 100644
--- a/include/exec/tb-hash.h
+++ b/include/exec/tb-hash.h
@@ -22,6 +22,8 @@
 
 #include "exec/tb-hash-xx.h"
 
+#ifdef CONFIG_SOFTMMU
+
 /* Only the bottom TB_JMP_PAGE_BITS of the jump cache hash bits vary for
    addresses on the same page. The top bits are the same. This allows
    TLB invalidation to quickly clear a subset of the hash table. */
@@ -45,6 +47,16 @@ static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc)
            | (tmp & TB_JMP_ADDR_MASK));
 }
 
+#else
+
+/* In user-mode we can get better hashing because we do not have a TLB */
+static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc)
+{
+    return (pc ^ (pc >> TB_JMP_CACHE_BITS)) & (TB_JMP_CACHE_SIZE - 1);
+}
+
+#endif /* CONFIG_SOFTMMU */
+
 static inline
 uint32_t tb_hash_func(tb_page_addr_t phys_pc, target_ulong pc, uint32_t flags)
 {
--
2.9.3
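
As background on why the hit rate of tb_jmp_cache depends on this function, here is a toy model of a direct-mapped lookup indexed by the user-mode hash from the patch. Everything prefixed toy_ is hypothetical, TB_JMP_CACHE_BITS = 12 and the 64-bit target_ulong are assumed, and the model matches on pc alone, which is a simplification: it only illustrates that two addresses mapping to the same slot evict each other.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TB_JMP_CACHE_BITS 12 /* assumed value for illustration */
#define TB_JMP_CACHE_SIZE (1u << TB_JMP_CACHE_BITS)

typedef uint64_t target_ulong; /* assumption: 64-bit guest addresses */

/* Hypothetical stand-in for a translation block: only the guest pc. */
struct toy_tb {
    target_ulong pc;
};

/* Hypothetical direct-mapped cache: one slot per hash value. */
struct toy_jmp_cache {
    struct toy_tb *slot[TB_JMP_CACHE_SIZE];
    unsigned long hits;
    unsigned long misses;
};

/* User-mode hash as added by this patch. */
static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc)
{
    return (pc ^ (pc >> TB_JMP_CACHE_BITS)) & (TB_JMP_CACHE_SIZE - 1);
}

/* Install tb in its slot, evicting whatever was there. */
static inline void toy_insert(struct toy_jmp_cache *c, struct toy_tb *tb)
{
    c->slot[tb_jmp_cache_hash_func(tb->pc)] = tb;
}

/* Return the cached block for pc, or NULL on a miss. */
static inline struct toy_tb *toy_lookup(struct toy_jmp_cache *c,
                                        target_ulong pc)
{
    struct toy_tb *tb = c->slot[tb_jmp_cache_hash_func(pc)];

    if (tb != NULL && tb->pc == pc) {
        c->hits++;
        return tb;
    }
    c->misses++;
    return NULL;
}
```

In this model every miss stands for a fall back to a slower lookup path, so a hash that spreads hot addresses over more slots directly reduces the number of such fallbacks.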