From: Tim Bell
Subject: Re: [Qemu-discuss] Puzzling performance comparison with KVM and Hyper-V
Date: Wed, 5 Aug 2015 18:53:23 +0000

Thanks for all the help to understand the potential optimisations.
To summarise, various tuning approaches have been described in
http://openstack-in-production.blogspot.fr/2015/08/kvm-and-hyper-v-comparison-for-high.html and child blogs. We’ve got pretty close to the Hyper-V performance using NUMA and pinning. Most of the settings are available in the OpenStack Kilo release which we’ll be installing in 2H 2015.

It may be worth reflecting on some of the KVM defaults. As always, multiple workloads are needed to find sensible defaults. Whether these are set in OpenStack or KVM is a further question.

Thanks for all your help and suggestions already provided towards making high throughput computing in High Energy Physics more efficient.

Tim

From: Tim Bell

We are running a compute intensive application on a variety of virtual machines at CERN (a subset of Spec 2006). We have found two puzzling results during this benchmarking and can’t find the root cause after significant effort.

1. Large virtual machines on KVM (32 cores) show a much worse performance than smaller ones
2. Hyper-V overhead is significantly less compared to KVM

We have tuned the KSM configuration with EPT off and CPU pinning but the overheads remain significant.

4 VMs 8 cores: 2.5% overhead compared to bare metal
2 VMs 16 cores: 8.4% overhead compared to bare metal
1 VM 32 cores: 12.9% overhead compared to bare metal

Running the same test using Hyper-V produced

4 VMs 8 cores: 0.8% overhead compared to bare metal
1 VM 32 cores: 3.3% overhead compared to bare metal

Can anyone suggest how to tune KVM to get equivalent performance to Hyper-V?

Configuration

Hardware is Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz, SMT enabled, 2GB/core
CentOS 7 KVM hypervisor with CentOS 6 guest
Windows 2012 Hyper-V hypervisor with CentOS 6 guest
Benchmark is HEPSpec, the C++ subset of Spec 2006

The benchmarks are run in parallel according to the number of cores. Thus, the 1x32 test runs 32 copies of the benchmark in a single VM on the hypervisor. The 4x8 test runs 4 VMs on the same hypervisor, with each VM running 8 copies of the benchmark simultaneously.
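
For readers who want to compare the two hypervisors side by side, the overhead figures quoted above can be tabulated with a short Python sketch. The percentages are taken from the message. The function names and the formula for relative throughput are illustrative assumptions, since the message does not say exactly how overhead was derived from the HEPSpec scores.

```python
# Overhead relative to bare metal, in percent, as quoted in the message.
# Keys are "<number of VMs>x<cores per VM>".
overhead = {
    "KVM": {"4x8": 2.5, "2x16": 8.4, "1x32": 12.9},
    "Hyper-V": {"4x8": 0.8, "1x32": 3.3},
}


def gap(config):
    """Percentage-point difference between KVM and Hyper-V overhead."""
    return round(overhead["KVM"][config] - overhead["Hyper-V"][config], 1)


def relative_throughput(percent_overhead):
    """Throughput as a fraction of bare metal.

    ASSUMPTION: overhead is defined as 1 - (VM score / bare-metal score).
    The message does not state the definition that was used.
    """
    return round(1 - percent_overhead / 100, 3)


if __name__ == "__main__":
    # Only configurations measured on both hypervisors can be compared.
    for config in ("4x8", "1x32"):
        print(config, "KVM minus Hyper-V overhead:", gap(config), "points")
```

Only the 4x8 and 1x32 configurations appear for both hypervisors, so the 2x16 KVM result has no Hyper-V counterpart in this comparison.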