From: Paul Boven
Subject: [Qemu-devel] [Bug 1297218] Re: guest hangs after live migration due to tsc jump
Date: Fri, 01 Aug 2014 18:20:15 -0000
As another test (still running qemu-git-2.1.0-rc2-git-20140721), I
disabled NTP on the two servers (and rebooted them), but left it running
on the guest.
When doing the migration, server A (where the guest was running) had an
NTP offset of -3.037619 s, and server B was at -3.337718 s. The guest
was nicely synchronized before the migration, but afterwards had a clock
offset of 0.349590 s, which roughly corresponds to the difference between
the two host offsets. The small NTP offset on the guest after the migration
implies that it did briefly freeze, but too briefly to notice. I'll leave it
running for longer to be able to confirm this with sufficient accuracy.
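To put numbers on "roughly corresponds", the sketch below just redoes the subtraction with the figures quoted above. It assumes, as the message suggests, that the guest simply picks up the offset of whichever host it runs on; the constant and function names are illustrative only.

```python
# Offsets reported in the message above, in seconds.
HOST_A_OFFSET = -3.037619      # server A (migration source)
HOST_B_OFFSET = -3.337718      # server B (migration destination)
GUEST_OFFSET_AFTER = 0.349590  # guest, measured after the migration


def host_offset_difference(src: float, dst: float) -> float:
    """Magnitude of the clock-offset difference between two hosts."""
    return abs(src - dst)


def unexplained_residual(guest_offset: float, host_diff: float) -> float:
    """Part of the guest offset not accounted for by the host difference."""
    return abs(guest_offset) - host_diff


diff = host_offset_difference(HOST_A_OFFSET, HOST_B_OFFSET)
residual = unexplained_residual(GUEST_OFFSET_AFTER, diff)
print(f"host offset difference: {diff:.6f} s")      # → 0.300099 s
print(f"residual guest offset:  {residual:.6f} s")  # → 0.049491 s
```

With these figures the hosts differ by about 0.300 s, leaving roughly 0.05 s of the guest's 0.3496 s offset that the host difference alone does not explain.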
--
https://bugs.launchpad.net/bugs/1297218
Title: guest hangs after live migration due to tsc jump
Status in QEMU: New
Status in “glusterfs” package in Ubuntu: Invalid
Status in “libvirt” package in Ubuntu: Triaged
Bug description:
We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing
a Gluster filesystem. Guests can be live migrated between them.
However, live migration often leads to the guest being stuck at 100% CPU
for a while. In that case, the dmesg output for such a guest will show
(once it recovers): Clocksource tsc unstable (delta = 662463064082
ns). In this particular example, a guest was migrated and only after
11 minutes (662 seconds) did it become responsive again.
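The delta in that message is in nanoseconds, so 662463064082 ns is about 662.5 s, i.e. just over 11 minutes. A minimal sketch for pulling that figure out of a guest's dmesg text, assuming the message format quoted above (other kernel versions may word it differently); the regex and helper name are hypothetical.

```python
import re

# Matches the clocksource watchdog warning quoted in the bug report, e.g.
# "Clocksource tsc unstable (delta = 662463064082 ns)".
UNSTABLE_RE = re.compile(
    r"Clocksource (?P<name>\S+) unstable \(delta = (?P<delta>-?\d+) ns\)"
)


def parse_unstable_delta(line: str):
    """Return (clocksource name, delta in seconds), or None if no match."""
    m = UNSTABLE_RE.search(line)
    if m is None:
        return None
    return m.group("name"), int(m.group("delta")) / 1e9


name, seconds = parse_unstable_delta(
    "Clocksource tsc unstable (delta = 662463064082 ns)"
)
print(name, f"{seconds:.1f} s", f"{seconds / 60:.1f} min")  # → tsc 662.5 s 11.0 min
```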
It seems that newly booted guests do not suffer from this problem; these can
be migrated back and forth at will. After a day or so, the problem becomes
apparent. It also seems that migrating from server A to server B causes many
more problems than going from B back to A. If necessary, I can do more
measurements to qualify these observations.
The VM servers run Ubuntu 13.04 with these packages:
Kernel: 3.8.0-35-generic x86_64
Libvirt: 1.0.2
Qemu: 1.4.0
Gluster-fs: 3.4.2 (libvirt accesses the images via the filesystem, not using
libgfapi yet, as the Ubuntu libvirt is not linked against libgfapi).
The interconnect between both machines (both for migration and gluster) is
10GbE.
Both servers are synced to NTP and well within 1 ms of one another.
Guests are either Ubuntu 13.04 or 13.10.
On the guests, the current_clocksource is kvm-clock.
The XML definition of the guests only contains: <clock offset='utc'/>
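For comparing setups, both of the clock-related facts above can be checked programmatically. A minimal sketch, assuming a Linux guest exposing the usual sysfs file for the active clocksource and a domain XML string such as `virsh dumpxml` prints; the helper names are hypothetical, and the example XML is cut down to the `<clock>` element only.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Usual sysfs location of the active clocksource on a Linux guest.
CLOCKSOURCE_PATH = "/sys/devices/system/clocksource/clocksource0/current_clocksource"


def current_clocksource(path: str = CLOCKSOURCE_PATH) -> str:
    """Return the active clocksource name, e.g. 'kvm-clock'."""
    return Path(path).read_text().strip()


def clock_settings(domain_xml: str):
    """Return the <clock> offset and the names of any explicit <timer> children."""
    root = ET.fromstring(domain_xml)
    clock = root.find("clock")
    if clock is None:
        return None, []
    timers = [t.get("name") for t in clock.findall("timer")]
    return clock.get("offset"), timers


offset, timers = clock_settings("<domain type='kvm'><clock offset='utc'/></domain>")
print(offset, timers)  # → utc []
```

With only `<clock offset='utc'/>`, the timer list comes back empty: no timer is configured explicitly in the domain definition.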
Now, as far as I've read in the documentation of kvm-clock, it specifically
supports live migration, so I'm a bit surprised at these problems. There isn't
all that much information to be found on this issue, although I have found
postings by others who seem to have run into the same issues, but without a
solution.
---
ApportVersion: 2.14.1-0ubuntu3
Architecture: amd64
DistroRelease: Ubuntu 14.04
Package: libvirt (not installed)
ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=1b0c3c6d-a9b8-4e84-b076-117ae267d178 ro console=ttyS1,115200n8 BOOTIF=01-00-25-90-75-b5-c8
ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9
Tags: trusty apparmor apparmor apparmor apparmor apparmor
Uname: Linux 3.13.0-24-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:
_MarkForUpload: True
modified.conffile..etc.default.libvirt.bin: [modified]
modified.conffile..etc.libvirt.libvirtd.conf: [modified]
modified.conffile..etc.libvirt.qemu.conf: [modified]
modified.conffile..etc.libvirt.qemu.networks.default.xml: [deleted]
mtime.conffile..etc.default.libvirt.bin: 2014-05-12T19:07:40.020662
mtime.conffile..etc.libvirt.libvirtd.conf: 2014-05-13T14:40:25.894837
mtime.conffile..etc.libvirt.qemu.conf: 2014-05-12T18:58:27.885506