qemu-devel

Re: [Qemu-devel] CPUID feature bits not saved with migration


From: Andre Przywara
Subject: Re: [Qemu-devel] CPUID feature bits not saved with migration
Date: Wed, 22 Jul 2009 15:24:51 +0200
User-agent: Thunderbird 2.0.0.14 (X11/20080508)

(Sorry for the late reply, I had some mail troubles)

Jamie Lokier wrote:
Andre Przywara wrote:
Jamie Lokier wrote:
Anthony Liguori wrote:
It's unclear what to do about -cpu host. If we did migrate cpuid values, then -cpu would effectively be ignored after an incoming migration.
The new host might not support all the cpuid features of the old host,
whether by -cpu host or explicit cpuid.  What happens then?
If you plan to migrate, you should think of this in advance.

In my experience so far, for small sites, you don't plan migration for
2-3 years after starting the VM, because it's only then you realise
your host hardware is quite old, you buy a replacement to consolidate,
and you are still running the VM that you didn't know would still be
mission critical years later.
That is one use-case for live migration. Another would be a migration
pool with lots of machines, each running some VMs. If one host is loaded,
you can migrate to a less loaded one. Think of a hosting or cloud-like
environment.


At least, that's been my experience so far.  I've "cold migrated" a
few VMs, and in some cases from non-virtual machines to VMs.  None of
these could be planned when the guest was first installed, especially
the ones where it wasn't realised the guest would outlive the host
hardware.
Fortunately it seems that newer CPUs only _add_ CPUID bits, so this
should not be a problem.


I have to say, unfortunately hot migration has never been an option
because the version of KVM running on the new host is invariably
incompatible with the KVM running on the old host.
So far I have only seen problems like this if the target host KVM
version is older than the source one. Some of these issues could be
overcome by putting a translator application between source and target,
but I am not sure whether the effort is worth the results.
What kind of issues do you see? Are you migrating from newer KVMs to
older ones?

I have a rough version of a tool to compute the least common
denominator CPUID bits given either processor (code)names or host
names. In the latter case it will log into the box and query the
host's CPUID. The tool then generates a QEMU command line (like -cpu
qemu64,-mwait,-popcnt) with which the guest should be started.  This
should ensure that the guest always sees the same subset of the CPUs'
capabilities.
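The core of such a computation can be sketched in a few lines of shell.
This is a toy illustration, not Andre's actual tool: the two masks and the
bit-to-name table are assumptions for the example, and the flag names just
follow the -mwait/-popcnt example above.

```shell
# AND two hosts' CPUID.01H:ECX feature masks and emit a "-feat" switch
# for every checked feature missing from the intersection.
host_a=0x00800009   # assumed mask of host A: sse3, mwait, popcnt
host_b=0x00000001   # assumed mask of host B: sse3 only
common=$(( host_a & host_b ))

flags=""
[ $(( common & (1 << 0)  )) -eq 0 ] && flags="$flags,-sse3"    # ECX bit 0
[ $(( common & (1 << 3)  )) -eq 0 ] && flags="$flags,-mwait"   # ECX bit 3
[ $(( common & (1 << 23) )) -eq 0 ] && flags="$flags,-popcnt"  # ECX bit 23

echo "qemu -cpu qemu64$flags"
```

With the masks above this prints `qemu -cpu qemu64,-mwait,-popcnt`; a real
tool would of course cover all CPUID leaves and pull the masks from the
hosts rather than hardcode them.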

I wonder how that would be useful.  Don't you usually migrate only
when you've acquired new hardware, whose specs you don't know at the
time you'd want to compute the initial CPUID?
See above, this tool helps to enlarge the migration pool by computing
the best possible bit set of CPU features.

For changing cpuid when migrating, as you might like to do with -cpu
host for performance, is reboot-during-migrate useful?  It would make
sure all disk state is committed to the image files asynchronously
while the machine continues to run (just like normal migration), and
at the last moment transfers control and the machine sees a reboot,
permitting devices changes including cpuid change.
Is that really useful? After all, the sexy part of live migration is the "live" component...

The other sexy part is being easy and asynchronous: not stopping the
guest for a long time during the migration.

Easy: Now, at the moment you have to give all the right guest
configuration on the destination command line, so I take your point.

But if guest configuration is ever included in the saved state for
migration, migration will be really easy.  I hope it's just as easy to
do "cold migration".
Agreed. We should have a savevm section transferring the guest config.

Async: Do we save RAM state across reboots normally?  I know of OSes
which expect at least some part of RAM to survive reboots, so killing
the VM and restarting on another host would change the behaviour,
compared with rebooting locally; that's not transparent migration,
it's a subtle, unexpected behaviour change.  Unfortunately doing the
right thing involves savevm, which pauses the guest for a long time.
The pause comes from saving and loading RAM, something which migration
handles well.
Have you seen any real life problems with this? What are these OSes?

There's also the small matter of migration having a totally different
interface compared with savevm right now, with savevm requiring a
dummy qcow2 disk while migration transfers control across a network
with no temporary file.
You are right, that is pretty unfortunate. I worked around this limitation by using the exec: prefix with migrate to let a shell script dump the migration stream to disk; with the same trick you can reload the state again. That worked pretty well for me in the past.
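A sketch of that workaround, as I understand it (the monitor command and
the -incoming option are real QEMU syntax; the file name and gzip filter
are just one choice, and the round trip at the end stands in for an
actual migration stream):

```shell
# On the source VM's monitor you would run (not executed here):
#   (qemu) migrate "exec:gzip -c > /tmp/vmstate.gz"
# and later restore the state by starting the target with:
#   qemu ... -incoming "exec:gzip -dc /tmp/vmstate.gz"
# The mechanism is simply a pipe into an arbitrary command, which is
# what this round trip demonstrates:
printf 'fake-migration-stream' | gzip -c > /tmp/vmstate.gz
restored=$(gzip -dc /tmp/vmstate.gz)
echo "$restored"
```

Any filter command works in place of gzip, which is what makes the exec:
prefix flexible enough to double as a savevm replacement.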

Guess which one is nicer for the user wanting "move my VM to host FOO
(which doesn't support SSE4) with minimal downtime".

CPU hotplug could be used for cpuid change in theory, but I doubt if
any guests or guest apps would handle it well.
Hotplugging could work for secondary processors, but hotplugging the BSP is kind of tricky. And this does not solve the userspace issues, where libraries detect CPU capabilities during startup and use optimized code paths. AFAIK there is no mechanism for informing those libraries about a CPUID change.

I agree, and it's pointless to spend much time discussing hotplug for this.
Most guests wouldn't handle CPUs with mixed CPUIDs anyway.

In theory, sometimes it'd be ok to push that problem to the user: they
can stop and start specific apps under user control without bringing
down a whole machine, and most apps don't use cpuid-dependent
features, especially on servers.
But libraries do, even on servers.

Btw, why can't hotplugging the main processor work?  For (real)
high-reliability systems, all processors are hotpluggable, afaik.
Yes, but if you look into the Linux code, you will find many assumptions about the BSP not being hotpluggable. For instance, you cannot offline CPU0: take a look at /sys/devices/system/cpu/cpu0, there is no 'online' file, unlike all the other cpu<n> directories. I haven't looked in detail, but I assume there is no real showstopper; it is just the current code design that makes offlining CPU0 hard. BTW, do you know of any x86 machines which really allow physical CPU hotplugging?

Regards,
Andre.

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448 3567 12
----to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Thomas M. McCoy; Giuliano Meroni
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632




