qemu-devel


[Qemu-devel] [RFC] Another Para-Virtualization page recycler. Empty Gues


From: XaviLi
Subject: [Qemu-devel] [RFC] Another Para-Virtualization page recycler. Empty Guest OS free pages every few seconds
Date: Wed, 20 Sep 2017 11:51:32 +0800

PPR (Per Page Recycler) is a para-virtualization driver currently available for 
KVM hosts and Linux/Windows guests. With PPR, every page freed back to the guest 
OS can be recycled by the hypervisor within seconds, so VMs can dynamically 
allocate pages from and free pages to the hypervisor according to application 
demand. In many cases this results in a memory density boost at little CPU cost. 
PPR is a component of an open source solution named Dynamic VM:
https://github.com/baibantech/dynamic_vm.git

The patch can be found here: 
https://github.com/baibantech/dynamic_vm/tree/master/dynamic_vm_0.5

The guest side is very simple. PPR has a hook at the entry of the guest OS's 
page-free interface that marks every freed page. Marked pages can be recycled 
and replaced by a read-only zero page within one scan period. If a page is 
allocated again before recycling happens, the mark is simply canceled by 
PPR's hook in the page-allocation interface.
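To make the mark/cancel/recycle sequence concrete, here is a toy, 
single-threaded Python model of the page states described above. It is only a 
sketch: the names (PageState, ToyGuest, host_scan) are made up for this 
illustration and are not taken from the PPR patch, and it makes no attempt to 
model the real driver's lockless guest/host interaction.

```python
from enum import Enum


class PageState(Enum):
    IN_USE = "in_use"      # owned by the guest, backed by real memory
    MARKED = "marked"      # freed by the guest, waiting for the next scan
    RECYCLED = "recycled"  # backing memory reclaimed by the host


class ToyGuest:
    """Hypothetical model of one guest's pages; not the real driver."""

    def __init__(self, num_pages):
        self.state = [PageState.IN_USE] * num_pages

    def on_page_free(self, pfn):
        # Models the hook at the entry of the page-free interface.
        self.state[pfn] = PageState.MARKED

    def on_page_alloc(self, pfn):
        # Models the page-allocation hook: a pending mark is canceled.
        # The post does not describe reuse of an already recycled page;
        # this model simply treats it as in use again (assumption).
        self.state[pfn] = PageState.IN_USE

    def host_scan(self):
        # Models one scan period: every page still marked is recycled.
        recycled = [pfn for pfn, s in enumerate(self.state)
                    if s is PageState.MARKED]
        for pfn in recycled:
            self.state[pfn] = PageState.RECYCLED
        return recycled


if __name__ == "__main__":
    guest = ToyGuest(4)
    guest.on_page_free(1)
    guest.on_page_free(2)
    guest.on_page_alloc(2)    # reallocated before the scan: mark canceled
    print(guest.host_scan())  # only page 1 is still marked
```

In this model a page freed and reallocated between two scans never reaches 
the host at all, which is the case the cancel path is meant to cover.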

The guest driver works per page rather than in batches as the VirtIO balloon 
does. Recycling is based on a lockless operation, and the guest and host do 
not send each other any interrupts for it.

We ran a test on our machine. The host uses a little over 6500 CPU cycles on 
average to recycle each page, of which 3900 cycles are spent in the Linux 
kernel's page-table operations. The guest hook costs about 300 cycles to mark 
or unmark a page, which is cheap enough in many cases. The daemon costs 5% of 
one CPU when a scan finds nothing to recycle. The other threads only run when 
there is work to do.

Saving memory usually involves a CPU-memory tradeoff. Here things are 
different. In memory-intensive deployments we often see page deduplication 
applied, and recycling a page is much cheaper than deduplicating it. In that 
case PPR can save both CPU and memory.

We have a test that demonstrates the density boost. In it, PPR works with a 
multithreaded page deduplication engine named PageONE rather than KSM. PageONE 
will be discussed in a separate thread.

Test environment
DELL R730
Intel® Xeon® E5-2650 v4 (2.20 GHz, 12 cores, 24 threads)
256GB RAM
Host OS: Ubuntu server 14.04 Host kernel: 4.4.1
Qemu: 2.9.0
Guest OS: Ubuntu server 16.04 Guest kernel: 4.4.76

Each test VM was configured with 4GB of memory. Each VM ran an application that 
allocates and frees memory every 30 minutes. The peak allocation in each 
period is 3GB and the average is 150MB. We ran many such VMs at the same time 
on the same machine to see the total host-side memory cost.


Raw KVM
Result: 67 VMs used 230GB memory. 

KVM + KSM 
Configuration: sleep_millisecs = 0, pages_to_scan = 1000000
Result: 73 VMs used 230GB memory. The KSM daemon cost 100% of one CPU.

KVM + PPR + PageONE:
Configuration: 
work threads = 12, reclaim_period = 10 secs, merge_period = 600 secs 
(This means freed pages were recycled every 10 secs, and pages were merged if 
not modified for 2 merge_periods.)
Result: 516 VMs ran together. Total used memory fluctuated between 139GB and 
218GB, averaging 182GB. 
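
As a reading aid, the three results above can be reduced to host memory per VM. 
The short Python sketch below does only that arithmetic on the figures quoted 
in this post. The dictionary layout and function names are illustrative. Note 
that the PPR + PageONE row uses the reported average (182GB), while the other 
two rows use the single 230GB total that was reported.

```python
# (number of VMs, host memory used in GB), as reported in this post
results = {
    "Raw KVM": (67, 230),
    "KVM + KSM": (73, 230),
    "KVM + PPR + PageONE": (516, 182),  # 182GB is the reported average
}


def gb_per_vm(name):
    """Host memory consumed per VM, in GB."""
    vms, total_gb = results[name]
    return total_gb / vms


def footprint_ratio(name, baseline="Raw KVM"):
    """How many times smaller the per-VM footprint is than the baseline."""
    return gb_per_vm(baseline) / gb_per_vm(name)


if __name__ == "__main__":
    for name in results:
        print(f"{name}: {gb_per_vm(name):.2f} GB/VM, "
              f"{footprint_ratio(name):.2f}x smaller than raw KVM")
```

With these inputs the per-VM footprint comes out at roughly 3.43GB for raw 
KVM, 3.15GB with KSM and 0.35GB with PPR + PageONE.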

PPR and PageONE share one daemon thread and 12 work threads. We measured their 
CPU cost together because the two features share threads and offload work for 
each other. 

The average CPU cost of the work threads is 6.1%. The daemon thread's cost is 
22.1%.

Here are other combinations we tried in the same test case:

reclaim_period  merge_period  average      work threads'  daemon's
(sec)           (sec)         memory (GB)  average CPU    CPU
5               200           164          6.8%           34%
5               600           176          6.3%           35%
10              200           171          6.6%           22%
10              600           182          6.1%           22%
NOTE: This is a high-pressure test. PPR was recycling 3.2TB of memory per hour. 
In normal cases PPR costs far less CPU than that.
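
For a rough sense of scale, the recycling rate in the note can be combined 
with the per-page host cost quoted earlier in this post (about 6500 cycles). 
The Python sketch below is a back-of-the-envelope estimate only. It assumes 
4 KiB pages, binary terabytes, and the CPU's nominal 2.20 GHz clock. None of 
these assumptions is stated in the post, and the result was not measured.

```python
CYCLES_PER_PAGE = 6500      # host cost per recycled page, from this post
RECYCLED_TB_PER_HOUR = 3.2  # from the note on the high-pressure test
PAGE_SIZE = 4096            # assumption: 4 KiB pages
BYTES_PER_TB = 2 ** 40      # assumption: binary terabytes
CLOCK_HZ = 2.2e9            # assumption: nominal 2.20 GHz, turbo ignored


def pages_per_second(tb_per_hour):
    """Pages recycled per second at the given hourly rate."""
    return tb_per_hour * BYTES_PER_TB / PAGE_SIZE / 3600


def cores_needed(tb_per_hour):
    """Fraction of one core's cycles spent on recycling at that rate."""
    return pages_per_second(tb_per_hour) * CYCLES_PER_PAGE / CLOCK_HZ


if __name__ == "__main__":
    print(f"{pages_per_second(RECYCLED_TB_PER_HOUR):,.0f} pages/s")
    print(f"{cores_needed(RECYCLED_TB_PER_HOUR):.2f} cores' worth of cycles")
```

Under these assumptions, 3.2TB per hour is about 239,000 pages per second, or 
about 0.7 of one core's cycles spent on host-side recycling.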
