qemu-devel

Re: [Qemu-devel] Rethinking missed tick catchup


From: Michael Roth
Subject: Re: [Qemu-devel] Rethinking missed tick catchup
Date: Wed, 12 Sep 2012 15:06:21 -0500
User-agent: Mutt/1.5.21 (2010-09-15)

On Wed, Sep 12, 2012 at 07:30:08PM +0200, Stefan Weil wrote:
> Am 12.09.2012 18:45, schrieb Gleb Natapov:
> >On Wed, Sep 12, 2012 at 06:27:14PM +0200, Stefan Weil wrote:
> >>Am 12.09.2012 15:54, schrieb Anthony Liguori:
> >>>Hi,
> >>>
> >>>We've been running into a lot of problems lately with Windows guests and
> >>>I think they all ultimately could be addressed by revisiting the missed
> >>>tick catchup algorithms that we use.  Mike and I spent a while talking
> >>>about it yesterday and I wanted to take the discussion to the list to
> >>>get some additional input.
> >>>
> >>>Here are the problems we're seeing:
> >>>
> >>>1) Rapid reinjection can lead to time moving faster for short bursts of
> >>>    time.  We've seen a number of RTC watchdog BSoDs and it's possible
> >>>    that at least one cause is reinjection speed.
> >>>
> >>>2) When hibernating a host system, the guest is essentially paused
> >>>    for a long period of time.  This results in a very large tick catchup
> >>>    while also resulting in a large skew in guest time.
> >>>
> >>>    I've gotten reports of the tick catchup consuming a lot of CPU time
> >>>    from rapid delivery of interrupts (although I haven't reproduced this
> >>>    yet).
> >>>
> >>>3) Windows appears to have a service that periodically syncs the guest
> >>>    time with the hardware clock.  I've been told the resync period is an
> >>>    hour.  For large clock skews, this can compete with reinjection
> >>>    resulting in a positive skew in time (the guest can be ahead of the
> >>>    host).
> >>Nearly every modern OS (including Windows) uses NTP
> >>or some other protocol to get the time over the network.
> >>
> >The drifts we are talking about will take ages for NTP to fix.
> >
> >>If a guest OS detects a small difference of time, it will usually
> >>accelerate or decelerate the OS clock until the time is
> >>synchronised again.
> >>
> >>Large jumps in network time will make the OS time jump, too.
> >>With a little bad luck, QEMU's reinjection will add the
> >>positive skew, no matter whether the guest is Linux or Windows.
> >>
> >As far as I know NTP will never make OS clock jump. The purpose of NTP
> >is to fix time gradually, so apps will not notice. npdate is used to
> >force clock synchronization, but it should be run manually.
> 
> s/npdate/ntpdate. Yes, some Linux distros run it at system start,
> and it's also usual to call it every hour (poor man's NTP, uses
> less resources).
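To put "will take ages" in numbers: a minimal back-of-envelope sketch, assuming ntpd's commonly cited default maximum slew rate of 500 ppm (0.5 ms of correction per second of real time); actual daemons and configurations vary.

```python
# How long NTP-style slewing takes to absorb a given clock offset,
# assuming a maximum slew rate of 500 ppm (an assumption; check your
# daemon's configuration).
MAX_SLEW_PPM = 500

def days_to_correct(offset_seconds: float) -> float:
    """Days of continuous slewing needed to remove offset_seconds of skew."""
    seconds_needed = offset_seconds / (MAX_SLEW_PPM * 1e-6)
    return seconds_needed / 86400

# A mere 1-hour skew takes roughly 83 days to slew away.
print(days_to_correct(3600))
```

So for the multi-hour drifts discussed here, slewing is effectively never going to converge.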

Windows at least seems to generally default to a max correction of ±15
hours using this approach. The relevant registry values are listed here:

http://support.microsoft.com/kb/816042#method4 (under More Information)

On my Win7 instance I have:

MaxPosPhaseCorrection: 54000 (15 hours)
MaxNegPhaseCorrection: 54000 (15 hours)

So there are definitely situations where guests won't correct themselves
even with NTP or ntpdate-like services running.

Also:

MaxAllowedPhaseOffset: 1 (1 second)

So Windows won't attempt to "catch-up" via increased tickrate if the
delta is greater than 1 second, and will instead try to reset the clock
directly. Which is basically the policy we're looking to implement,
except from the host-side.
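The decision logic described above can be sketched roughly as follows. This is an illustration of the policy, not Windows' actual implementation; the constant names mirror the registry values, with thresholds in seconds.

```python
# Sketch of the W32Time phase-correction policy (assumed behavior, per
# the registry values above): refuse corrections beyond the max phase
# correction, step the clock for large deltas, slew for small ones.
MAX_POS_PHASE_CORRECTION = 54000   # 15 hours
MAX_NEG_PHASE_CORRECTION = 54000   # 15 hours
MAX_ALLOWED_PHASE_OFFSET = 1       # slew/step boundary, 1 second

def correction_action(delta: float) -> str:
    """delta = reference time minus local clock, in seconds."""
    if delta > MAX_POS_PHASE_CORRECTION or -delta > MAX_NEG_PHASE_CORRECTION:
        return "ignore"   # too far off: no correction is attempted
    if abs(delta) > MAX_ALLOWED_PHASE_OFFSET:
        return "step"     # set the clock directly
    return "slew"         # catch up by adjusting the tick rate
```

Under this reading, a guest paused for more than 15 hours simply keeps the wrong time until something else intervenes.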


> 
> >
> >>>I've been thinking about an algorithm like this to address these
> >>>problems:
> >>>
> >>>A) Limit the number of interrupts that we reinject to the equivalent of
> >>>    a small period of wallclock time.  Something like 60 seconds.
> >>>
> >>>B) In the event of (A), trigger a notification in QEMU.  This is easy
> >>>    for the RTC but harder for the in-kernel PIT.  Maybe it's a good
> >>>    time to revisit usage of the in-kernel PIT?
> >>>
> >>>C) On accumulated tick overflow, rely on using a qemu-ga command to
> >>>    force a resync of the guest's time to the hardware wallclock time.
> >>>
> >>>D) Whenever the guest reads the wallclock time from the RTC, reset all
> >>>    accumulated ticks.
> >>D) makes no sense, see my comment above.
> >>
> >>Injection of additional timer interrupts should not be needed
> >>after a hibernation. The guest must handle that situation
> >>by reading either the hw clock (which must be updated
> >>by QEMU when it resumes from hibernate) or by using
> >>another time reference (like NTP, for example).
> >>
> >He is talking about host hibernation, not guest.
> >
> 
> I also meant host hibernation.
> 
> Maybe the host should tell the guest that it is going to
> hibernate (ACPI event), then the guest can use its
> normal hibernate entry and recovery code, too.
> 

I think doing that would be useful either way, but aren't there other
scenarios where big time jumps can occur? What about live migration?
Presumably we'd complete within the 15-hour limit above, but for other
operating systems or particular configurations thereof we might still
fall outside the threshold they're willing to correct for. At least with
an approach like this we can clearly define the requirements for proper
time-keeping.
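The host-side policy being proposed in this thread (steps A-C: cap reinjection at roughly 60 seconds of wallclock ticks, and beyond that force a guest resync) might look like this sketch. `notify_resync_via_qemu_ga` is a hypothetical hook standing in for a qemu-ga time-set command; the RTC rate is an assumption.

```python
# Sketch of the proposed missed-tick policy: reinject small backlogs,
# but on overflow drop the backlog and ask the guest agent to resync
# the guest clock to the hardware wallclock instead.
RTC_HZ = 64                            # assumed RTC periodic rate
MAX_CATCHUP_SECONDS = 60               # step A: cap on reinjected time
MAX_COALESCED = RTC_HZ * MAX_CATCHUP_SECONDS

def handle_missed_ticks(coalesced: int, notify_resync_via_qemu_ga) -> int:
    """Return the number of ticks to reinject; resync instead on overflow."""
    if coalesced > MAX_COALESCED:
        notify_resync_via_qemu_ga()    # steps B+C: notify QEMU, resync guest
        return 0                       # discard the accumulated backlog
    return coalesced                   # small skew: reinject as usual
```

This keeps reinjection bounded regardless of how long the guest was paused, which is exactly the property the hibernation and migration cases need.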
