
Re: [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG


From: Alex Bennée
Subject: Re: [Qemu-devel] [RFC v4 00/28] Base enabling patches for MTTCG
Date: Fri, 12 Aug 2016 09:02:07 +0100
User-agent: mu4e 0.9.17; emacs 25.1.4

Alex Bennée <address@hidden> writes:

> Alex Bennée <address@hidden> writes:
>
>> This is the fourth iteration of the RFC patch set which aims to
>> provide the basic framework for MTTCG. I hope this will provide a good
>> base for discussion at KVM Forum later this month.
>>
> <snip>
>>
>> In practice the memory barrier problems don't show up with an x86
>> host. In fact I have created a tree which merges in the Emilio's
>> cmpxchg atomics which happily boots ARMv7 Debian systems without any
>> additional changes. You can find that at:
>>
>>   
>> https://github.com/stsquad/qemu/tree/mttcg/base-patches-v4-with-cmpxchg-atomics-v2
>>
> <snip>
>> Performance
>> ===========
>>
>> You can't do full work-load testing on this tree due to the lack of
>> atomic support (but I will run some numbers on
>> mttcg/base-patches-v4-with-cmpxchg-atomics-v2).
>
> So here is a more real world work load run:
>
>   retry.py called with 
> ['/home/alex/lsrc/qemu/qemu.git/arm-softmmu/qemu-system-arm', '-machine', 
> 'type=virt', '-display', 'none', '-smp', '1', '-m', '4096', '-cpu', 
> 'cortex-a15', '-serial', 'telnet:127.0.0.1:4444', '-monitor', 'stdio', 
> '-netdev', 'user,id=unet,hostfwd=tcp::2222-:22', '-device', 
> 'virtio-net-device,netdev=unet', '-drive', 
> 'file=/home/alex/lsrc/qemu/images/jessie-arm32.qcow2,id=myblock,index=0,if=none',
>  '-device', 'virtio-blk-device,drive=myblock', '-append', 'console=ttyAMA0 
> systemd.unit=benchmark-build.service root=/dev/vda1', '-kernel', 
> '/home/alex/lsrc/qemu/images/aarch32-current-linux-kernel-only.img', '-smp', 
> '4', '-name', 'debug-threads=on', '-accel', 'tcg,thread=single']
>   run 1: ret=0 (PASS), time=261.794911 (1/1)
>   run 2: ret=0 (PASS), time=257.290045 (2/2)
>   run 3: ret=0 (PASS), time=256.536991 (3/3)
>   run 4: ret=0 (PASS), time=254.036260 (4/4)
>   run 5: ret=0 (PASS), time=256.539165 (5/5)
>   Results summary:
>   0: 5 times (100.00%), avg time 257.239 (8.00 variance/2.83 deviation)
>   Ran command 5 times, 5 passes
>
>   retry.py called with 
> ['/home/alex/lsrc/qemu/qemu.git/arm-softmmu/qemu-system-arm', '-machine', 
> 'type=virt', '-display', 'none', '-smp', '1', '-m', '4096', '-cpu', 
> 'cortex-a15', '-serial', 'telnet:127.0.0.1:4444', '-monitor', 'stdio', 
> '-netdev', 'user,id=unet,hostfwd=tcp::2222-:22', '-device', 
> 'virtio-net-device,netdev=unet', '-drive', 
> 'file=/home/alex/lsrc/qemu/images/jessie-arm32.qcow2,id=myblock,index=0,if=none',
>  '-device', 'virtio-blk-device,drive=myblock', '-append', 'console=ttyAMA0 
> systemd.unit=benchmark-build.service root=/dev/vda1', '-kernel', 
> '/home/alex/lsrc/qemu/images/aarch32-current-linux-kernel-only.img', '-smp', 
> '4', '-name', 'debug-threads=on', '-accel', 'tcg,thread=multi']
>   run 1: ret=0 (PASS), time=86.597459 (1/1)
>   run 2: ret=0 (PASS), time=82.843904 (2/2)
>   run 3: ret=0 (PASS), time=84.095910 (3/3)
>   run 4: ret=0 (PASS), time=83.844595 (4/4)
>   run 5: ret=0 (PASS), time=83.594768 (5/5)
>   Results summary:
>   0: 5 times (100.00%), avg time 84.195 (2.02 variance/1.42 deviation)
>   Ran command 5 times, 5 passes
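As a sanity check, the summary line retry.py prints for the single-threaded
run can be reproduced from the five raw times with the Python statistics
module (a sketch; that retry.py uses the sample (n-1) variance is an
assumption inferred from the numbers, not from its source):

```python
# Reproduce the retry.py summary statistics for the thread=single run.
import statistics

times = [261.794911, 257.290045, 256.536991, 254.036260, 256.539165]

avg = statistics.mean(times)      # average wall-clock time
var = statistics.variance(times)  # sample variance (n - 1 divisor)
dev = statistics.stdev(times)     # sample standard deviation

print(f"avg time {avg:.3f} ({var:.2f} variance/{dev:.2f} deviation)")
# -> avg time 257.239 (8.00 variance/2.83 deviation)
```

The (n-1) divisor is what makes the numbers match: the population variance
of these times would come out at 6.40, not the reported 8.00.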
>
> This shows a ~30% overhead over ideal 4-way scaling when running
> multi-threaded, but we still see a decent improvement in wall-clock time.
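The ~30% figure follows directly from the two averages above: with 4 vCPUs,
perfect scaling from the 257.239s single-threaded run would give 64.3s,
against the measured 84.195s.

```python
# Sanity-check the "~30% overhead over ideal" figure from the runs above.
single = 257.239  # avg time, -accel tcg,thread=single
multi = 84.195    # avg time, -accel tcg,thread=multi, -smp 4

ideal = single / 4           # perfect 4-way scaling
overhead = multi / ideal - 1 # fractional slowdown vs. the ideal
print(f"ideal {ideal:.1f}s, measured {multi:.1f}s, "
      f"overhead {overhead:.1%}")
```

This prints an overhead of about 31%, i.e. roughly the 30% quoted.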
>
> So the test itself is booting the system, running the
> benchmark-build.service:
>
>   # A benchmark target
>   #
>   # This shuts down once the boot has completed
>
>   [Unit]
>   Description=Default
>   Requires=basic.target
>   After=basic.target
>   AllowIsolate=yes
>
>   [Service]
>   Type=oneshot
>   ExecStart=/root/mysrc/testcases.git/build-dir.sh
>   /root/src/stress-ng.git/
>   ExecStartPost=/sbin/poweroff
>
>   [Install]
>   WantedBy=multi-user.target
>
> And the build-dir script is simply:
>
>     #!/bin/sh
>     #
>     set -e
>     NR_CPUS=$(grep -c ^processor /proc/cpuinfo)
>     cd "$1"
>     make clean
>     make -j"${NR_CPUS}"
>     cd -
>
> Measuring this over increasing -smp


Measuring this over increasing -smp:

 -smp     time  -smp 1 / smp  time as bar   x faster
-----------------------------------------------------
    1  238.184       238.184  WWWWWWWWWWWW     1.000
    2  133.402       119.092  WWWWWWh          1.785
    3   99.531        79.395  WWWWW            2.393
    4   82.760        59.546  WWWW.            2.878
    5   82.513        47.637  WWWW.            2.887
    6   78.922        39.697  WWWH             3.018
    7   87.181        34.026  WWWW;            2.732
    8   87.098        29.773  WWWW;            2.735

So a more complete analysis shows the benefits start to tail off as we
push past 4 vCPUs. However, on my machine, which has 4 cores plus 4
hyperthreads, that could be just as much an artefact of the host system.
Indeed the results start getting noisy at 7/8 vCPUs.
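One rough way to read the table above is through Amdahl's law,
S = 1 / ((1 - p) + p/n): inverting it gives the parallel fraction p
implied by each measured speedup (a back-of-the-envelope sketch, not a
proper fit; the chosen data points are just three rows from the table):

```python
# Infer the Amdahl's-law parallel fraction p from measured speedups.
def parallel_fraction(n, speedup):
    # Solve S = 1 / ((1 - p) + p/n) for p.
    return (1 - 1 / speedup) / (1 - 1 / n)

for n, s in [(2, 1.785), (4, 2.878), (6, 3.018)]:
    print(f"-smp {n}: speedup {s:.3f} -> implied parallel "
          f"fraction {parallel_fraction(n, s):.2f}")
```

The implied p comes out around 0.88 at -smp 2 and 0.87 at -smp 4, then
drops to about 0.80 at -smp 6, consistent with the scaling limit being
the host's 4 physical cores rather than the guest workload itself.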

Interestingly, a perf run against -smp 6 shows gic_update topping the
graph at 3.14% of total execution time. That function does have a big
TODO for optimisation on it ;-)

--
Alex Bennée


