From: Dor Laor
Subject: Re: [Qemu-devel] [RESEND][PATCH 0/3] Fix guest time drift under heavy load.
Date: Sun, 09 Nov 2008 00:14:35 +0200
User-agent: Thunderbird 2.0.0.16 (X11/20080723)
Gleb Natapov wrote:
> On Thu, Nov 06, 2008 at 09:37:56AM -0600, Anthony Liguori wrote:
>> Gleb Natapov wrote:
>>> On Thu, Nov 06, 2008 at 08:40:09AM -0600, Anthony Liguori wrote:
>>>> Gleb: are you perhaps using a qcow2 file in conjunction with -snapshot?
>>> I am using qcow2, but without -snapshot.
>> Okay, you would still see this if your qcow2 is relatively small compared to the possible size it could be. I totally believe that you could miss ticks from qcow2 metadata writing even with a 100Hz clock, especially since we're using O_SYNC. A relatively large write that has to extend the qcow2 file multiple times could conceivably block the guest for more than 10ms. However, this is a bug in qcow2 IMHO. Metadata updates should be done asynchronously, and if they were, I bet this problem wouldn't occur. A test against raw should confirm this.
> I ran the copy test once again with a qcow2 image, but this time I copied from qcow2 to a network fs and the drift still exists. Much smaller though: 8 seconds per hour, AFAIR.
>>>>> If part of qemu gets swapped out then all bets are off, and you can easily stall for significant fractions of a second. No amount of host high-resolution timer support will help you there.
>>>> Running a steady workload, you aren't going to be partially swapped.
>>> We want to oversubscribe the host as much as possible, and the workload will vary during the lifetime of the VMs.
>> I understand that we want guest time to behave even when we're overcommitting the host CPU. However, let's make sure we understand exactly what's going on, so that we know precisely what we're fixing. I believe the file copy benchmark is going to turn out to no longer produce drift with a raw image. If that's the case, you'll need to find another benchmark to quantify drift.
> Yes indeed. With a raw image the copy benchmark no longer runs long enough to produce a time drift big enough to be visible.
> So I ran this disk test utility http://69.90.47.6/mybootdisks.com/mybootdisks_com/nu2/bst514.zip for ~12 hours and the time drift was 12 secs (if I weren't so lazy and had written a bat file to copy c:\windows in a loop, I am sure the result would be the same). This is on a completely idle host.
>> I think the best ones are going to be intense host workload (and let's see how much is needed before we start drifting badly) and high guest frequencies with hosts that lack high-resolution timers. I think with a high-resolution guest and no host overcommit, it should be very difficult to produce drift regardless of what the guest is doing.
> Later I'll try to generate load on the host and see how this affects the guest's time drift.
>
> --
>          Gleb.

No doubt that additional CPU load, or even more easily disk I/O load on the host/storage, will increase the time drift. Can't wait to get the load results from Gleb.

Note that the problem becomes more visible when the guest uses 1000Hz timers (when Windows plays any multimedia, or for Linux with a 1000Hz clock + PIT/RTC). There are many hosts out there with kernel < 2.6.24 --> no userspace hr timer.

Bottom line: no matter how we twist it, there will be scenarios where we drift. Maybe in order to minimize the effect we can use the in-kernel PIT for KVM. For the RTC we can use Andrzej's solution using RTC_REG_C. If we use a counter to count the number of un/set events, it can solve the drift for the RTC without any irq_set API changes.

Another option is to provide the qemu_timer user with a simple API to check whether the guest vcpu was scheduled by the host since the timer was issued. If the vcpu was not scheduled, the timer user can assume a drift. (Less accurate than the irq solution, though.) The advantage here is that it helps the long-forgotten -win2k-hack to work. I almost forgot: this hack is not stable, and when you run 10+ VMs, some fail. Using this option the hack will reliably force the host/guest to do dma_request---vcpu_exec---dma_done. Now it only separates the two dma events with a timer.

Thanks for the fruitful discussion,
Dor
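To make the RTC counter idea concrete, here is a minimal sketch in plain C (all names are hypothetical, this is not the actual QEMU code): the device counts periodic ticks that fire while the previous interrupt is still pending, and pays one back each time the guest acks by reading RTC_REG_C.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-RTC state: irq_pending mirrors the interrupt line,
 * coalesced counts ticks the guest has not seen yet. */
typedef struct {
    bool irq_pending;    /* interrupt raised, not yet acked by guest */
    unsigned coalesced;  /* ticks we still owe the guest */
} RTCState;

/* Called on every periodic timer expiry. */
static void rtc_periodic_tick(RTCState *s)
{
    if (s->irq_pending) {
        /* Guest has not acked the previous tick: remember the miss
         * instead of silently dropping it. */
        s->coalesced++;
    } else {
        s->irq_pending = true;  /* deliver the interrupt */
    }
}

/* Called when the guest reads RTC_REG_C, which acks the interrupt. */
static void rtc_ack(RTCState *s)
{
    s->irq_pending = false;
    if (s->coalesced > 0) {
        /* Reinject one owed tick so the guest's tick count
         * catches up with wall-clock time. */
        s->coalesced--;
        s->irq_pending = true;
    }
}
```

With a scheme like this, a vcpu that stalls for three tick periods still ends up seeing three interrupts, just delayed, so a guest that keeps time by counting ticks does not drift, and no irq_set API change is needed.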
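The second idea, letting a timer user ask whether the guest vcpu ran since the timer was armed, could be sketched roughly like this (again hypothetical names, not the real qemu_timer API): snapshot a run counter when arming, and compare on expiry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical counter bumped every time the vcpu gets to execute. */
static uint64_t vcpu_run_count;

static void vcpu_did_run(void)
{
    vcpu_run_count++;
}

typedef struct {
    uint64_t armed_run_count;  /* snapshot taken when the timer was armed */
} DriftTimer;

static void timer_arm(DriftTimer *t)
{
    t->armed_run_count = vcpu_run_count;
}

/* On expiry: if the vcpu never ran between arm and expiry, the guest
 * cannot have serviced the previous tick, so assume drift. */
static bool timer_suspects_drift(const DriftTimer *t)
{
    return vcpu_run_count == t->armed_run_count;
}
```

This is coarser than the irq-ack approach (it only says "the guest never ran", not "the guest ran but missed N ticks"), but the same check is what would let the -win2k-hack reliably wait for vcpu_exec between dma_request and dma_done instead of just separating them with a timer.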