From: Dor Laor
Subject: Re: [Qemu-devel] [RESEND][PATCH 0/3] Fix guest time drift under heavy load.
Date: Sun, 09 Nov 2008 00:14:35 +0200
User-agent: Thunderbird 2.0.0.16 (X11/20080723)

Gleb Natapov wrote:
On Thu, Nov 06, 2008 at 09:37:56AM -0600, Anthony Liguori wrote:
Gleb Natapov wrote:
On Thu, Nov 06, 2008 at 08:40:09AM -0600, Anthony Liguori wrote:

Gleb: are you perhaps using a qcow2 file in conjunction with -snapshot?

I am using qcow2, but without -snapshot.

Okay, you would still see this if your qcow2 image is still relatively small
compared to the maximum size it could grow to.

I totally believe that you could miss ticks from qcow2 metadata writes
even with a 100 Hz clock, especially since we're using O_SYNC. A relatively
large write that has to extend the qcow2 file multiple times could
conceivably block the guest for more than 10ms. However, this is a bug
in qcow2 IMHO. Metadata updates should be done asynchronously, and if
they were, I bet this problem wouldn't occur. A test against raw should
confirm this.
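
For illustration, here is a minimal standalone program (only a sketch, not
QEMU code; the file name and write count are arbitrary) that times a burst
of O_SYNC writes against the 10ms tick interval:

/* Time a burst of synchronous metadata-style writes and compare the
 * total against one 10 ms (100 Hz) guest tick.
 * Build: cc -std=c99 sync_write.c -lrt */
#define _POSIX_C_SOURCE 199309L
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    char sector[512];
    struct timespec t0, t1;
    /* O_SYNC makes every write(2) wait until the data reaches the
     * disk, which is what makes the metadata updates above costly. */
    int fd = open("testfile.img", O_CREAT | O_WRONLY | O_SYNC, 0600);
    if (fd < 0) { perror("open"); return 1; }

    memset(sector, 0, sizeof(sector));
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < 16; i++) {  /* stand-in for repeated table updates */
        if (write(fd, sector, sizeof(sector)) != (ssize_t)sizeof(sector)) {
            perror("write"); return 1;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    long ms = (t1.tv_sec - t0.tv_sec) * 1000 +
              (t1.tv_nsec - t0.tv_nsec) / 1000000;
    printf("16 O_SYNC writes: %ld ms (one guest tick is 10 ms)\n", ms);
    close(fd);
    return 0;
}

On a rotating disk each O_SYNC write costs at least one seek, so a handful
of them can easily exceed one tick.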

I ran the copy test once again with a qcow2 image, but this time I copied
from qcow2 to a network fs, and the drift still exists, though much
smaller: 8 seconds per hour, AFAIR.

If part of qemu gets swapped out then all bets are off, and you can 
easily stall for significant fractions of a second. No amount of 
host high resolution time support will help you there.
Running a steady workload, you aren't going to be partially swapped.

We want to oversubscribe the host as much as possible, and the workload
will vary over the lifetime of the VMs.

I understand that we want guest time to behave even when we're
overcommitting the host CPU.

However, let's make sure we understand exactly what's going on such that  
we know precisely what we're fixing.  I believe the file copy benchmark  
is going to turn out to no longer produce drift with a raw image.  If  
that's the case, you'll need to find another benchmark to quantify drift.

Yes indeed. With a raw image the copy benchmark no longer runs long enough
to produce a time drift big enough to be visible. So I ran this disk test
utility http://69.90.47.6/mybootdisks.com/mybootdisks_com/nu2/bst514.zip
for ~12 hours and the time drift was 12 secs (if I weren't so lazy and had
written a bat file to copy c:\windows in a loop, I am sure the result would
be the same). This is on a completely idle host.


I think the best ones are going to be an intense host workload (and let's
see how much is needed before we start drifting badly) and high guest
timer frequencies on hosts that lack high resolution timers. I think with
a high resolution guest and no host overcommit, it should be very
difficult to produce drift regardless of what the guest is doing.

Later I'll try to generate load on the host and see how this affects the
guest's time drift.

--
			Gleb.


No doubt that additional cpu load, or even more easily disk io load on the
host/storage, will increase the time drift. Can't wait to get the load
results from Gleb.
Note that the problem becomes more visible when the guest uses 1000 Hz
timers (when Windows plays any multimedia, or for Linux with a 1000 Hz
clock + pit/rtc). There are many hosts out there with kernel < 2.6.24 -->
no userspace high-resolution timers.
Bottom line: no matter how we twist it, there will be scenarios where we drift.

Maybe in order to minimize the effect we can use the in-kernel PIT for kvm.
For the RTC we can use Andrzej's solution based on RTC_REG_C: if we use a
counter to count the number of set/unset events, it can solve the drift
for the RTC without any irq_set api changes.
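
Something like this (a rough sketch of the counter idea; every name below
is invented here, not actual QEMU code):

#include <stdint.h>

#define RTC_REG_C 0x0c

/* Stubs standing in for the existing irq helpers in this sketch. */
static void irq_raise(void) { /* assert the RTC line */ }
static void irq_lower(void) { /* deassert the RTC line */ }

typedef struct RTCState {
    uint8_t cmos_data[128];
    int irq_pending;   /* tick delivered but REG_C not read yet */
    int coalesced;     /* ticks that fired while irq_pending was set */
} RTCState;

static void rtc_periodic_tick(RTCState *s)
{
    if (s->irq_pending) {
        s->coalesced++;                 /* don't lose the tick, count it */
        return;
    }
    s->irq_pending = 1;
    s->cmos_data[RTC_REG_C] |= 0xc0;    /* PF | IRQF */
    irq_raise();
}

static uint8_t rtc_read_reg_c(RTCState *s)
{
    uint8_t ret = s->cmos_data[RTC_REG_C];

    s->cmos_data[RTC_REG_C] = 0;
    s->irq_pending = 0;
    irq_lower();

    if (s->coalesced > 0) {             /* re-inject one missed tick */
        s->coalesced--;
        rtc_periodic_tick(s);
    }
    return ret;
}

This way every missed tick is eventually delivered, one per REG_C read, so
the guest's interrupt count catches up instead of drifting.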

Another option is to provide the qemu_timer user a simple api to check
whether the guest vcpu was scheduled by the host since the timer was issued.
If the vcpu was not, the timer user can assume a drift. (Less accurate than
the irq solution, though.)
The advantage here is that it helps the long-forgotten -win2k-hack to work.
I almost forgot: this hack is not stable, and when you run 10+ VMs, some
fail. Using this option the hack will reliably force the host/guest to do
dma_request---vcpu_exec---dma_done. Right now it only separates the two
dma events with a timer.
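
The api could be as simple as a run counter that the vcpu loop bumps and
the timer snapshots when armed (again only a sketch; every name below is
invented):

#include <stdbool.h>
#include <stdint.h>

static uint64_t vcpu_run_count;     /* bumped each time the vcpu runs */

/* Called from the vcpu execution loop whenever the guest gets cpu time. */
static void vcpu_did_run(void)
{
    vcpu_run_count++;
}

typedef struct TimerSketch {
    uint64_t armed_run_count;       /* snapshot taken when armed */
} TimerSketch;

static void timer_arm(TimerSketch *t)
{
    t->armed_run_count = vcpu_run_count;
}

/* The timer user asks: did the guest run since I was armed?
 * If not, assume the pending tick was lost and compensate. */
static bool vcpu_ran_since_armed(const TimerSketch *t)
{
    return vcpu_run_count != t->armed_run_count;
}

The -win2k-hack could use the same check to hold back the second dma event
until the vcpu has actually executed in between, instead of relying on a
timer alone.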

Thanks for the fruitful discussion,
Dor
