Re: [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG
From: Alex Bennée
Subject: Re: [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG
Date: Fri, 12 Aug 2016 09:02:07 +0100
User-agent: mu4e 0.9.17; emacs 25.1.4

Alex Bennée <address@hidden> writes:
> Alex Bennée <address@hidden> writes:
>
>> This is the fourth iteration of the RFC patch set which aims to
>> provide the basic framework for MTTCG. I hope this will provide a good
>> base for discussion at KVM Forum later this month.
>>
> <snip>
>>
>> In practice the memory barrier problems don't show up with an x86
> >> host. In fact I have created a tree which merges in Emilio's
>> cmpxchg atomics which happily boots ARMv7 Debian systems without any
>> additional changes. You can find that at:
>>
>>
>> https://github.com/stsquad/qemu/tree/mttcg/base-patches-v4-with-cmpxchg-atomics-v2
>>
> <snip>
>> Performance
>> ===========
>>
>> You can't do full work-load testing on this tree due to the lack of
>> atomic support (but I will run some numbers on
>> mttcg/base-patches-v4-with-cmpxchg-atomics-v2).
>
> So here is a more real world work load run:
>
> retry.py called with
> ['/home/alex/lsrc/qemu/qemu.git/arm-softmmu/qemu-system-arm', '-machine',
> 'type=virt', '-display', 'none', '-smp', '1', '-m', '4096', '-cpu',
> 'cortex-a15', '-serial', 'telnet:127.0.0.1:4444', '-monitor', 'stdio',
> '-netdev', 'user,id=unet,hostfwd=tcp::2222-:22', '-device',
> 'virtio-net-device,netdev=unet', '-drive',
> 'file=/home/alex/lsrc/qemu/images/jessie-arm32.qcow2,id=myblock,index=0,if=none',
> '-device', 'virtio-blk-device,drive=myblock', '-append', 'console=ttyAMA0
> systemd.unit=benchmark-build.service root=/dev/vda1', '-kernel',
> '/home/alex/lsrc/qemu/images/aarch32-current-linux-kernel-only.img', '-smp',
> '4', '-name', 'debug-threads=on', '-accel', 'tcg,thread=single']
> run 1: ret=0 (PASS), time=261.794911 (1/1)
> run 2: ret=0 (PASS), time=257.290045 (2/2)
> run 3: ret=0 (PASS), time=256.536991 (3/3)
> run 4: ret=0 (PASS), time=254.036260 (4/4)
> run 5: ret=0 (PASS), time=256.539165 (5/5)
> Results summary:
> 0: 5 times (100.00%), avg time 257.239 (8.00 varience/2.83 deviation)
> Ran command 5 times, 5 passes
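
For reference, the summary line can be reproduced from the five run times. This is only a sketch: how retry.py computes its figures is not shown in this thread, and the assumption here is that the reported "varience" is the sample variance (n - 1 denominator), which is what matches the numbers.

```python
import statistics

# Wall-clock times copied from the five thread=single runs above.
single_runs = [261.794911, 257.290045, 256.536991, 254.036260, 256.539165]


def summarise(times):
    """Return (mean, sample variance, sample standard deviation)."""
    return (
        statistics.mean(times),
        statistics.variance(times),  # assumption: n - 1 denominator
        statistics.stdev(times),
    )


avg, var, dev = summarise(single_runs)
print(f"avg time {avg:.3f} ({var:.2f} variance/{dev:.2f} deviation)")
```

The same helper can be fed the thread=multi times to check the second summary.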
>
> retry.py called with
> ['/home/alex/lsrc/qemu/qemu.git/arm-softmmu/qemu-system-arm', '-machine',
> 'type=virt', '-display', 'none', '-smp', '1', '-m', '4096', '-cpu',
> 'cortex-a15', '-serial', 'telnet:127.0.0.1:4444', '-monitor', 'stdio',
> '-netdev', 'user,id=unet,hostfwd=tcp::2222-:22', '-device',
> 'virtio-net-device,netdev=unet', '-drive',
> 'file=/home/alex/lsrc/qemu/images/jessie-arm32.qcow2,id=myblock,index=0,if=none',
> '-device', 'virtio-blk-device,drive=myblock', '-append', 'console=ttyAMA0
> systemd.unit=benchmark-build.service root=/dev/vda1', '-kernel',
> '/home/alex/lsrc/qemu/images/aarch32-current-linux-kernel-only.img', '-smp',
> '4', '-name', 'debug-threads=on', '-accel', 'tcg,thread=multi']
> run 1: ret=0 (PASS), time=86.597459 (1/1)
> run 2: ret=0 (PASS), time=82.843904 (2/2)
> run 3: ret=0 (PASS), time=84.095910 (3/3)
> run 4: ret=0 (PASS), time=83.844595 (4/4)
> run 5: ret=0 (PASS), time=83.594768 (5/5)
> Results summary:
> 0: 5 times (100.00%), avg time 84.195 (2.02 varience/1.42 deviation)
> Ran command 5 times, 5 passes
>
> Running multi-threaded shows roughly a 30% overhead over the ideal 4x
> speed-up, but still a decent improvement in wall time.
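
The overhead figure can be checked with a few lines of arithmetic. This sketch assumes that "ideal" means the single-threaded average divided evenly across the vCPUs, and that the later `-smp 4` on the command line is the one in effect; the variable names are illustrative only.

```python
# Average wall times taken from the two result summaries above.
single_avg = 257.239  # -accel tcg,thread=single
multi_avg = 84.195    # -accel tcg,thread=multi
vcpus = 4             # assumption: the final -smp value applies

ideal = single_avg / vcpus          # perfect scaling across 4 vCPUs
overhead = multi_avg / ideal - 1.0  # fraction above the ideal time
speedup = single_avg / multi_avg    # observed wall-time improvement

print(f"ideal {ideal:.3f}, overhead {overhead:.1%}, speedup {speedup:.2f}x")
```

Under those assumptions the overhead comes out close to the 30% quoted, with the multi-threaded run about three times faster in wall time.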
>
> The test itself boots the system and runs the
> benchmark-build.service unit:
>
> # A benchmark target
> #
> # This shuts down once the boot has completed
>
> [Unit]
> Description=Default
> Requires=basic.target
> After=basic.target
> AllowIsolate=yes
>
> [Service]
> Type=oneshot
> ExecStart=/root/mysrc/testcases.git/build-dir.sh /root/src/stress-ng.git/
> ExecStartPost=/sbin/poweroff
>
> [Install]
> WantedBy=multi-user.target
>
> And the build-dir script is a simple:
>
> #!/bin/sh
> #
> # Build the source tree given as $1 with one make job per guest CPU.
> NR_CPUS=$(grep -c ^processor /proc/cpuinfo)
> set -e
> cd "$1"
> make clean
> make -j"${NR_CPUS}"
> cd -
>
> Measuring this over increasing -smp
Measuring this over increasing -smp:

  -smp   time      -smp 1 time / smp   time as bar     x faster
  --------------------------------------------------------------
  1      238.184   238.184             WWWWWWWWWWWW    1.000
  2      133.402   119.092             WWWWWWh         1.785
  3       99.531    79.395             WWWWW           2.393
  4       82.760    59.546             WWWW.           2.878
  5       82.513    47.637             WWWW.           2.887
  6       78.922    39.697             WWWH            3.018
  7       87.181    34.026             WWWW;           2.732
  8       87.098    29.773             WWWW;           2.735

So a more complete analysis shows the benefits start to tail off as we
push past 4 vCPUs. However, my machine has 4 cores + 4 hyperthreads, so
that could be just as much a feature of the host system. Indeed the
results start getting noisy at 7/8 vCPUs.
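
The derived columns can be recomputed from the measured times alone. The sketch below rebuilds the ideal time and the "x faster" figure, and adds a per-vCPU efficiency (speed-up divided by vCPU count), which is not in the original table and is included only to make the tail-off easier to see.

```python
# Wall times per -smp value, copied from the measurements above.
times = {1: 238.184, 2: 133.402, 3: 99.531, 4: 82.760,
         5: 82.513, 6: 78.922, 7: 87.181, 8: 87.098}

base = times[1]  # the -smp 1 run is the reference
rows = []
for smp, t in times.items():
    ideal = base / smp          # time under perfect scaling
    speedup = base / t          # the "x faster" column
    efficiency = speedup / smp  # added metric: speed-up per vCPU
    rows.append((smp, t, ideal, speedup, efficiency))
    print(f"{smp:>4} {t:>8.3f} {ideal:>8.3f} {speedup:>6.3f} {efficiency:>6.1%}")

# The -smp value with the largest measured speed-up.
best = max(rows, key=lambda r: r[3])[0]
print(f"best speed-up at -smp {best}")
```

The efficiency column drops steadily as vCPUs are added, which is the same tail-off seen in the bar column.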
Interestingly a perf run against -smp 6 shows gic_update topping the
graph (3.14% of total execution time). That function does have a big
TODO for optimisation on it ;-)
--
Alex Bennée