Re: [Qemu-devel] Rethinking missed tick catchup
From: Stefan Weil
Subject: Re: [Qemu-devel] Rethinking missed tick catchup
Date: Wed, 12 Sep 2012 18:27:14 +0200
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.16) Gecko/20120724 Iceowl/1.0b1 Icedove/3.0.11
On 12.09.2012 15:54, Anthony Liguori wrote:
Hi,
We've been running into a lot of problems lately with Windows guests and
I think they all ultimately could be addressed by revisiting the missed
tick catchup algorithms that we use. Mike and I spent a while talking
about it yesterday and I wanted to take the discussion to the list to
get some additional input.
Here are the problems we're seeing:
1) Rapid reinjection can lead to time moving faster for short bursts of
time. We've seen a number of RTC watchdog BSoDs and it's possible
that at least one cause is reinjection speed.
2) When hibernating a host system, the guest is essentially paused
for a long period of time. This results in a very large tick catchup
while also resulting in a large skew in guest time.
I've gotten reports of the tick catchup consuming a lot of CPU time
from rapid delivery of interrupts (although I haven't reproduced this
yet).
3) Windows appears to have a service that periodically syncs the guest
time with the hardware clock. I've been told the resync period is an
hour. For large clock skews, this can compete with reinjection
resulting in a positive skew in time (the guest can be ahead of the
host).
Nearly every modern OS (including Windows) uses NTP
or some other protocol to get the time over the network.
If a guest OS detects a small difference of time, it will usually
accelerate or decelerate the OS clock until the time is
synchronised again.
Large jumps in network time will make the OS time jump, too.
With a little bad luck, QEMU's reinjection will add the
positive skew, no matter whether the guest is Linux or Windows.
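To make the slew-versus-step distinction concrete, here is a minimal sketch of how a time-sync client might decide between the two. The function name `plan_correction` is hypothetical; the 128 ms step threshold and 500 ppm slew limit mirror the usual ntpd defaults, but other clients (including the Windows time service) use different values, so treat the numbers as assumptions.

```python
def plan_correction(offset_s, step_threshold_s=0.128, max_slew_ppm=500.0):
    """Decide how a sync client would correct a clock offset.

    offset_s is (reference time - guest time) in seconds.
    Returns (action, offset_s, duration_s), where duration_s is how long
    a slew at the maximum rate would take (0.0 for a step).
    The thresholds are illustrative defaults, not a fixed standard.
    """
    if offset_s == 0:
        return ("none", 0.0, 0.0)
    if abs(offset_s) > step_threshold_s:
        # Large offset: the clock jumps to the reference time at once.
        return ("step", offset_s, 0.0)
    # Small offset: speed up or slow down the clock gradually.
    slew_rate = max_slew_ppm * 1e-6  # seconds of correction per second
    return ("slew", offset_s, abs(offset_s) / slew_rate)


# A guest that was paused for an hour is stepped, not slewed.
action, offset, _ = plan_correction(3600.0)
```

If the guest is stepped forward by the full offset and the hypervisor then also reinjects the ticks missed during the pause, the guest ends up ahead of the host by roughly the reinjected amount, which is the positive skew described above.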
I've been thinking about an algorithm like this to address these
problems:
A) Limit the number of interrupts that we reinject to the equivalent of
a small period of wallclock time. Something like 60 seconds.
B) In the event of (A), trigger a notification in QEMU. This is easy
for the RTC but harder for the in-kernel PIT. Maybe it's a good time to
revisit usage of the in-kernel PIT?
C) On accumulated tick overflow, rely on using a qemu-ga command to
force a resync of the guest's time to the hardware wallclock time.
D) Whenever the guest reads the wallclock time from the RTC, reset all
accumulated ticks.
D) makes no sense, see my comment above.
Injection of additional timer interrupts should not be needed
after a hibernation. The guest must handle that situation
by reading either the hw clock (which must be updated
by QEMU when it resumes from hibernate) or by using
another time reference (like NTP, for example).
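For reference, steps A) to D) of the proposal quoted above could be sketched roughly as follows. This is not QEMU code: the class `TickCatchup`, its method names, and the `on_overflow` callback are hypothetical, and capping the backlog (rather than discarding it) on overflow is only one possible reading of A). The callback stands in for whatever notification or guest-agent resync B) and C) would trigger.

```python
class TickCatchup:
    """Toy model of a capped missed-tick backlog (hypothetical, not QEMU code)."""

    def __init__(self, tick_hz, max_catchup_s=60.0, on_overflow=None):
        # (A) Never hold more ticks than max_catchup_s of wallclock time.
        self.max_pending = int(tick_hz * max_catchup_s)
        self.pending = 0
        self.on_overflow = on_overflow

    def miss(self, ticks):
        """Record ticks that could not be delivered on time."""
        total = self.pending + ticks
        if total > self.max_pending:
            dropped = total - self.max_pending
            self.pending = self.max_pending
            # (B)/(C) Tell the caller that ticks were dropped so it can
            # request a resync of guest time by some other means.
            if self.on_overflow is not None:
                self.on_overflow(dropped)
        else:
            self.pending = total

    def reinject(self, n=1):
        """Deliver up to n backlog ticks; return how many were delivered."""
        delivered = min(n, self.pending)
        self.pending -= delivered
        return delivered

    def wallclock_read(self):
        """(D) The guest read the RTC wallclock time: drop the backlog."""
        self.pending = 0


dropped_log = []
catchup = TickCatchup(tick_hz=64, max_catchup_s=60.0,
                      on_overflow=dropped_log.append)
catchup.miss(100)       # short stall: backlog is kept for reinjection
catchup.reinject(10)    # deliver part of the backlog
catchup.miss(10000)     # long pause: backlog is capped, overflow reported
```

The objection raised here applies to `wallclock_read` and to reinjection after a long pause: if the guest corrects its own clock from the RTC or from NTP, any ticks still delivered from the backlog push it ahead of the host.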
In order to do (C), we'll need to plumb qemu-ga through QMP. Mike and I
discussed a low-impact way of doing this (having a separate dispatch
path for guest agent commands) and I'm confident we could do this for
1.3.
This would mean that management tools would need to consume qemu-ga
through QMP. Not sure if this is a problem for anyone.
I'm not sure whether it's worth trying to support this with the
in-kernel PIT or not either.
Are there other issues with reinjection that people are aware of? Does
anything seem obviously wrong with the above?
Regards,
Anthony Liguori