Re: [Qemu-devel] [PATCH v6 01/50] tcg: Merge opcode arguments into TCGOp
From: Emilio G. Cota
Subject: Re: [Qemu-devel] [PATCH v6 01/50] tcg: Merge opcode arguments into TCGOp
Date: Tue, 17 Oct 2017 16:04:51 -0400
User-agent: Mutt/1.5.24 (2015-08-30)
On Mon, Oct 16, 2017 at 10:25:20 -0700, Richard Henderson wrote:
> From: Richard Henderson <address@hidden>
>
> Rather than have a separate buffer of 10*max_ops entries,
> give each opcode 10 entries. The result is actually a bit
> smaller and should have slightly more cache locality.
>
> Signed-off-by: Richard Henderson <address@hidden>
Reviewed-by: Emilio G. Cota <address@hidden>
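For readers who have not looked at the patch, the layout change described in the quoted commit message can be sketched roughly as follows. This is only an illustration: the type names (`SplitOp`, `SplitCtx`, `MergedOp`, `MergedCtx`), the field names, and the `MAX_OPS` capacity are hypothetical and are not QEMU's actual definitions. Only the "10 entries per opcode" figure comes from the commit message. The sketch shows the locality difference only; it does not reproduce the size saving mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ARGS 10  /* the "10 entries" per opcode from the commit message */
#define MAX_OPS   8  /* hypothetical small capacity, for illustration only */

/* Before (sketch): each op records where its arguments start
 * in a separate buffer of MAX_ARGS * MAX_OPS entries. */
typedef struct {
    uint8_t  opc;
    uint16_t args_idx;  /* index of the first argument in arg_buf */
} SplitOp;

typedef struct {
    SplitOp   ops[MAX_OPS];
    uintptr_t arg_buf[MAX_ARGS * MAX_OPS];
} SplitCtx;

/* After (sketch): each op carries its own MAX_ARGS argument slots. */
typedef struct {
    uint8_t   opc;
    uintptr_t args[MAX_ARGS];
} MergedOp;

typedef struct {
    MergedOp ops[MAX_OPS];
} MergedCtx;

/* Reading an argument touches two distinct memory regions. */
static uintptr_t split_arg(const SplitCtx *c, size_t op, size_t n)
{
    assert(op < MAX_OPS && n < MAX_ARGS);
    return c->arg_buf[c->ops[op].args_idx + n];
}

/* Reading an argument stays within the op itself. */
static uintptr_t merged_arg(const MergedCtx *c, size_t op, size_t n)
{
    assert(op < MAX_OPS && n < MAX_ARGS);
    return c->ops[op].args[n];
}
```

In the merged form, the opcode and its arguments sit next to each other in memory, which is the cache-locality argument made in the commit message.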
This gives a small yet measurable perf advantage when booting linux:
 Performance counter stats for 'taskset -c 0 aarch64-softmmu/qemu-system-aarch64 \
   -M virt,gic_version=3 -cpu cortex-a57 -nographic -m 4096 \
   -netdev user,id=unet,hostfwd=tcp::2222-:22 \
   -device virtio-net-device,netdev=unet \
   -drive file=jessie-arm64-die-on-boot.qcow2,id=myblock,index=0,if=none \
   -device virtio-blk-device,drive=myblock \
   -kernel aarch64-current-linux-kernel-only.img \
   -append console=ttyAMA0 root=/dev/vda1 -smp 1' (10 runs):
Before:
       7182.556704      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.11% )
            21,710      context-switches          #    0.003 M/sec                    ( +-  0.12% )
                 1      cpu-migrations            #    0.000 K/sec                    ( +- 11.11% )
             7,929      page-faults               #    0.001 M/sec                    ( +-  1.75% )
    30,280,536,799      cycles                    #    4.216 GHz                      ( +-  0.11% )
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
    54,481,515,301      instructions              #    1.80  insns per cycle          ( +-  0.09% )
     9,655,822,880      branches                  # 1344.343 M/sec                    ( +-  0.10% )
       170,594,899      branch-misses             #    1.77% of all branches          ( +-  0.10% )

       7.190274755 seconds time elapsed                                          ( +-  0.11% )
After:
       7086.254881      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.13% )
            21,598      context-switches          #    0.003 M/sec                    ( +-  0.07% )
                 1      cpu-migrations            #    0.000 K/sec
             8,099      page-faults               #    0.001 M/sec                    ( +-  0.97% )
    29,856,727,544      cycles                    #    4.213 GHz                      ( +-  0.12% )
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
    53,585,205,542      instructions              #    1.79  insns per cycle          ( +-  0.10% )
     9,638,601,205      branches                  # 1360.183 M/sec                    ( +-  0.10% )
       169,785,181      branch-misses             #    1.76% of all branches          ( +-  0.08% )

       7.094560954 seconds time elapsed
That is, a 1.33% improvement in elapsed time.
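The percentage can be rechecked from the numbers above. The helper below is a small sketch of that arithmetic; `pct_reduction` is a name made up for this note, not part of perf or QEMU.

```c
#include <assert.h>

/* Relative reduction, in percent, when a metric goes from
 * `before` to `after`. Positive means `after` is lower. */
static double pct_reduction(double before, double after)
{
    assert(before > 0.0);
    return (before - after) / before * 100.0;
}
```

Fed the elapsed times (7.190274755 s before, 7.094560954 s after), this gives about 1.33%. The cycle counts (30,280,536,799 before, 29,856,727,544 after) give about 1.40%.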
Emilio