From: Hailiang Zhang
Subject: Re: [Qemu-devel] [PATCH COLO-Frame (Base) v21 00/17] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
Date: Wed, 26 Oct 2016 23:52:48 +0800
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1
Hi Amit,

On 2016/10/26 16:26, Amit Shah wrote:
> On (Wed) 26 Oct 2016 [14:43:30], Hailiang Zhang wrote:
>> Hi Amit,
>>
>> On 2016/10/26 14:09, Amit Shah wrote:
>>> Hello,
>>>
>>> On (Tue) 18 Oct 2016 [20:09:56], zhanghailiang wrote:
>>>> This is the 21st version of the COLO frame series, rebased onto the
>>>> latest master.
>>>
>>> I've reviewed the patchset and have some minor comments, but overall
>>> it looks good. The changes are contained, and common code / existing
>>> code paths are not affected much. We can still target to merge this
>>> for 2.8.
>>
>> I really appreciate your help ;) I will fix all the issues later and
>> send v22. I hope we can still catch the deadline for 2.8.
>>
>>> Do you have any tests on how much the VM slows down / downtime
>>> incurred during checkpoints?
>>
>> Yes, we tested that a long time ago; it all depends. The downtime is
>> determined by the time spent transferring the dirty pages plus the
>> time spent flushing RAM from the ram buffer. But we do have methods
>> to reduce the downtime. One is to reduce the amount of data (mainly
>> dirty pages) per checkpoint by transferring dirty pages
>> asynchronously while the PVM and SVM are running (not at checkpoint
>> time). Besides, we can reuse the existing migration capabilities,
>> such as compression, etc. Another method is to reduce the time spent
>> flushing RAM by using the userfaultfd API to turn copying RAM into
>> marking a bitmap. We can also flush the ram buffer with multiple
>> threads, as Dave advised.
>
> Yes, I understand that as with any migration numbers, this too depends
> on what the guest is doing. However, can you just pick some standard
> workload - kernel compile or something like that - and post a few
> observations?
Li Zhijian has sent some test results that are based on the kernel colo
proxy. After switching to the userspace colo proxy, there may be some
degradation, though for the old scenario some optimizations were not
implemented. For the new userspace colo proxy scenario, we didn't test
it overall, because it is still WIP; we will start that work after this
frame is merged.
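By the way, to make the "flush the ram buffer with multiple threads"
idea above a bit more concrete, here is a rough sketch of the approach.
This is not the actual patch; the structure and function names below
are invented purely for illustration:

#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096

/* One worker's share of the buffered dirty pages (names invented). */
typedef struct FlushJob {
    uint8_t *cache;          /* ram buffer holding checkpointed pages */
    uint8_t *guest_ram;      /* destination guest memory on SVM side  */
    const uint64_t *offsets; /* byte offsets of pages marked dirty    */
    size_t first, last;      /* slice [first, last) for this worker   */
} FlushJob;

static void *flush_worker(void *opaque)
{
    FlushJob *job = opaque;
    size_t i;

    for (i = job->first; i < job->last; i++) {
        uint64_t off = job->offsets[i];
        memcpy(job->guest_ram + off, job->cache + off, PAGE_SIZE);
    }
    return NULL;
}

/* Split the dirty pages evenly across nr_threads workers and wait. */
static void flush_ram_cache_mt(uint8_t *cache, uint8_t *guest_ram,
                               const uint64_t *offsets, size_t nr_pages,
                               int nr_threads)
{
    pthread_t tids[nr_threads];
    FlushJob jobs[nr_threads];
    size_t per = (nr_pages + nr_threads - 1) / nr_threads;
    int t;

    for (t = 0; t < nr_threads; t++) {
        size_t first = (size_t)t * per;
        size_t last = first + per < nr_pages ? first + per : nr_pages;

        jobs[t] = (FlushJob) {
            .cache = cache, .guest_ram = guest_ram, .offsets = offsets,
            .first = first < nr_pages ? first : nr_pages,
            .last = last,
        };
        pthread_create(&tids[t], NULL, flush_worker, &jobs[t]);
    }
    for (t = 0; t < nr_threads; t++) {
        pthread_join(tids[t], NULL);
    }
}

Something like flush_ram_cache_mt(cache, ram, offsets, nr_dirty, 4)
would then run during checkpoint commit; with N threads the copy time
shrinks roughly by a factor of N until memory bandwidth saturates,
which is exactly the part of the downtime we want to cut.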
>>> Also, can you tell how you arrived at the default checkpoint
>>> interval?
>>
>> Er, for this value, we referred to Remus on the Xen platform. ;)
>> But after we implement COLO with the colo proxy, this interval will
>> be changed to a bigger one (10s), and we will make it configurable
>> too. Besides, we will add another configurable value to control the
>> minimum interval between checkpoints.
>
> OK - any typical value that is a good mix between COLO keeping the
> network too busy / guest paused vs guest making progress? Again this
> is something that's workload-dependent, but I guess you have typical
> numbers from a network-bound workload?
Yes, you can refer to Zhijian's email for the details. I think it is
necessary to add some test/performance results to COLO's wiki; we will
do that later.
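Just to illustrate how the two knobs are meant to interact once the
colo proxy can trigger checkpoints on its own (the periodic delay plus
the planned minimum interval), here is a rough sketch. None of this is
the actual patch, and all the helper names are invented:

#include <stdbool.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

/* Invented stand-ins for the real clock/checkpoint/proxy hooks. */
static uint64_t clock_now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
}
static void colo_do_checkpoint(void) { /* pause, send state, resume */ }
static bool colo_output_diverged(void) { return false; /* proxy verdict */ }

/*
 * Pace checkpoints: normally every checkpoint_delay_ms (e.g. 10000),
 * earlier if the colo proxy reports that PVM and SVM output diverged,
 * but never sooner than min_interval_ms after the previous checkpoint,
 * so a noisy workload cannot force back-to-back checkpoints.
 */
static void colo_checkpoint_loop(uint64_t checkpoint_delay_ms,
                                 uint64_t min_interval_ms)
{
    uint64_t last = clock_now_ms();

    for (;;) {
        uint64_t elapsed = clock_now_ms() - last;

        if (elapsed >= checkpoint_delay_ms ||
            (colo_output_diverged() && elapsed >= min_interval_ms)) {
            colo_do_checkpoint();
            last = clock_now_ms();
        }
        usleep(1000); /* the real code would block on a timer instead */
    }
}

The periodic delay is the value that would become configurable (with
the bigger 10s default once the colo proxy lands), while the minimum
interval keeps divergence-triggered checkpoints from firing back to
back under a network-heavy workload.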
Thanks,
hailiang

> Thanks,
> Amit