Re: [Qemu-devel] QEMU/KVM migration backwards compatibility broken?


From: Liran Alon
Subject: Re: [Qemu-devel] QEMU/KVM migration backwards compatibility broken?
Date: Thu, 6 Jun 2019 13:09:56 +0300


> On 6 Jun 2019, at 12:23, Dr. David Alan Gilbert <address@hidden> wrote:
> 
> * Liran Alon (address@hidden) wrote:
>> 
>> 
>>> On 6 Jun 2019, at 11:42, Dr. David Alan Gilbert <address@hidden> wrote:
>>> 
>>> * Liran Alon (address@hidden) wrote:
>>>> Hi,
>>>> 
>>>> Looking at QEMU source code, I am puzzled regarding how migration 
>>>> backwards compatibility is preserved regarding X86CPU.
>>>> 
>>>> As I understand it, fields that are based on KVM capabilities and guest 
>>>> runtime usage are defined in VMState subsections in order to not send them 
>>>> if not necessary.
>>>> This is done such that in case they are not needed and we migrate to an 
>>>> old QEMU which don’t support loading this state, migration will still 
>>>> succeed
>>>> (As .needed() method will return false and therefore this state won’t be 
>>>> sent as part of migration stream).
>>>> Furthermore, in case .needed() returns true and old QEMU don’t support 
>>>> loading this state, migration fails. As it should because we are aware 
>>>> that guest state
>>>> is not going to be restored properly on destination.
>>>> 
>>>> I’m puzzled about what will happen in the following scenario:
>>>> 1) Source is running new QEMU with new KVM that supports save of some 
>>>> VMState subsection.
>>>> 2) Destination is running new QEMU that supports load this state but with 
>>>> old kernel that doesn’t know how to load this state.
>>>> 
>>>> I would have expected in this case that if source .needed() returns true, 
>>>> then migration will fail because of lack of support in destination kernel.
>>>> However, it seems from current QEMU code that this will actually succeed 
>>>> in many cases.
>>>> 
>>>> For example, if msr_smi_count is sent as part of migration stream (See 
>>>> vmstate_msr_smi_count) and destination have has_msr_smi_count==false,
>>>> then destination will succeed loading migration stream but kvm_put_msrs() 
>>>> will actually ignore env->msr_smi_count and will successfully load guest 
>>>> state.
>>>> Therefore, migration will succeed even though it should have failed…
>>>> 
>>>> It seems to me that QEMU should have for every such VMState subsection, a 
>>>> .post_load() method that verifies that relevant capability is supported by 
>>>> kernel
>>>> and otherwise fail migration.
>>>> 
>>>> What do you think? Should I really create a patch to modify all these 
>>>> CPUX86 VMState subsections to behave like this?
>>> 
>>> I don't know the x86 specific side that much; but from my migration side
>>> the answer should mostly be through machine types - indeed for smi-count
>>> there's a property 'x-migrate-smi-count' which is off for machine types
>>> pre 2.11 (see hw/i386/pc.c pc_compat_2_11) - so if you've got an old
>>> kernel you should stick to the old machine types.
>>> 
>>> There's nothing guarding running the new machine type on old-kernels;
>>> and arguably we should have a check at startup that complains if
>>> your kernel is missing something the machine type uses.
>>> However, that would mean that people running with -M pc   would fail
>>> on old kernels.
>>> 
>>> A post-load is also a valid check; but one question is whether,
>>> for a particular register, the pain is worth it - it depends on the
>>> symptom that the missing state causes.  If it's minor then you might
>>> conclude it's not worth a failed migration;  if it's a hung or
>>> corrupt guest then yes it is.   Certainly a warning printed is worth
>>> it.
>>> 
>>> Dave
>> 
>> I think we should have flags that allow user to specify which VMState 
>> subsections user explicitly allow to avoid restore even though they are 
>> required to fully restore guest state.
>> But it seems to me that the behaviour should be to always fail migration in 
>> case we load a VMState subsections that we are unable to restore unless user 
>> explicitly specified this is ok
>> for this specific subsection.
>> Therefore, it seems that for every VMState subsection that it’s restore is 
>> based on kernel capability we should:
>> 1) Have a user-controllable flag (which is also tied to machine-type?) to 
>> explicitly allow avoid restoring this state if cannot. Default should be 
>> “false”.
>> 2) Have a .post_load() method that verifies we have required kernel 
>> capability to restore this state, unless flag (1) was specified as “true”.
> 
> This seems a lot of flags; users aren't going to know what to do with
> all of them; I don't see what will set/control them.

True, but I think users will want to say, for only a handful of VMState 
subsections, that it is OK not to restore them even though they are deemed 
needed by the source QEMU.
We can create flags only for those VMState subsections.
The user should set these flags explicitly on the QEMU command line. As a “-cpu” 
property? I don’t think these flags should be tied to machine-type.
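
To make the receive-side check concrete, here is a minimal sketch written as a 
self-contained C model, not real QEMU code. The struct, the allow_skip_restore 
flag and the function name are all hypothetical; the only thing taken from 
this thread is the idea of a .post_load() hook that fails the load when the 
kernel cannot restore the state, unless the user explicitly allowed skipping it.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of one capability-dependent VMState subsection. */
struct subsection_model {
    bool kernel_can_restore;  /* e.g. what has_msr_smi_count says on the destination */
    bool allow_skip_restore;  /* hypothetical user flag, default false */
};

/*
 * Models a .post_load() hook. It is assumed to run only when the
 * subsection was present in the incoming stream. Returns 0 on success
 * and a negative errno value to fail the migration.
 */
int subsection_post_load(const struct subsection_model *s)
{
    if (s->kernel_can_restore) {
        return 0;             /* state will actually be restored */
    }
    if (s->allow_skip_restore) {
        return 0;             /* user explicitly accepted losing this state */
    }
    return -EINVAL;           /* refuse instead of silently dropping state */
}
```

With the default (allow_skip_restore == false), a destination kernel that 
lacks the capability makes the load fail. Setting the flag turns that failure 
into an accepted loss for this one subsection only.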

> 
>> Note that above mentioned flags is different than flags such as 
>> “x-migrate-smi-count”.
>> The purpose of “x-migrate-smi-count” flag is to avoid sending the VMState 
>> subsection to begin with in case we know we migrate to older QEMU which 
>> don’t even have the relevant VMState subsection. But it is not relevant for 
>> the case both source and destination runs QEMU which understands the VMState 
>> subsection but run on kernels with different capabilities.
>> 
>> Also note regarding your first paragraph, that specifying flags based on 
>> kernel you are running on doesn’t help for the case discussed here.
>> As source QEMU is running on new kernel. Unless you meant that source QEMU 
>> should use relevant machine-type based on the destination kernel.
>> i.e. You should launch QEMU with old machine-type as long as you have hosts 
>> in your migration pool that runs with old kernel.
> 
> That's what I meant; stick to the old machine-type unless you know it's
> safe to use a newer one.
> 
>> I don’t think it’s the right approach though. As there is no way to change 
>> flags such as “x-migrate-smi-count” dynamically after all hosts in migration 
>> pool have been upgraded.
>> 
>> What do you think?
> 
> I don't have an easy answer.  The users already have to make sure they
> use a machine type that's old enough for all the QEMUs installed in
> their cluster; making sure it's also old enough for their oldest
> kernel isn't too big a difference - *except* that it's much harder to
> tell which kernel corresponds to which feature/machine type etc - so
> how does a user know what the newest supported machine type is?
> Failing at startup when selecting a machine type that the current
> kernel can't support would help that.
> 
> Dave

First, machine-type expresses the set of vHW behaviour and properties that is 
exposed to the guest.
Therefore, machine-type shouldn’t change during a given guest’s lifetime 
(including Live-Migrations).
Otherwise, the guest will experience different vHW behaviour and properties 
before/after Live-Migration.
So I think machine-type is not relevant for this discussion. We should focus on 
flags which specify migration behaviour (such as “x-migrate-smi-count”, which 
can be controlled by machine-type but not only by it).

Second, this strategy results in inefficient migration management. Consider the 
following scenario:
1) A guest running on new_qemu+old_kernel migrates to a host with 
new_qemu+new_kernel.
Because the source runs old_kernel, the destination QEMU is launched with 
(x-migrate-smi-count == false).
2) Assume at this point that half of the hosts in the fleet run old_kernel and 
half run new_kernel.
3) Further assume that the guest workload indeed uses msr_smi_count and 
therefore the relevant VMState subsection should be sent to properly preserve 
guest state.
4) For some reason, we decide to migrate the guest from (1) again.
Even if the guest is migrated to a host with new_kernel, QEMU still avoids 
sending the msr_smi_count VMState subsection because it was launched with 
(x-migrate-smi-count == false).
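
The scenario above can be sketched as a small self-contained C model. It is an 
illustration only, not QEMU code: the structs and function names are 
hypothetical, and the only assumption carried over from the discussion is that 
the flag is chosen at launch and cannot be changed afterwards.

```c
#include <stdbool.h>

/* Hypothetical model of a host in the migration pool. */
struct host {
    bool new_kernel;
};

/* Hypothetical model of a running QEMU instance. */
struct qemu_instance {
    bool x_migrate_smi_count;   /* chosen at launch, never updated later */
    bool guest_uses_smi_count;  /* guest state that would need migrating */
};

/*
 * Step (1): the destination QEMU is launched with the flag derived
 * from the kernel the guest is coming from.
 */
struct qemu_instance launch_for_incoming(const struct host *src_host,
                                         bool guest_uses_smi_count)
{
    struct qemu_instance q;
    q.x_migrate_smi_count = src_host->new_kernel;
    q.guest_uses_smi_count = guest_uses_smi_count;
    return q;
}

/* Step (4): on the next migration the launch-time flag still gates sending. */
bool sends_smi_count(const struct qemu_instance *q)
{
    return q->x_migrate_smi_count && q->guest_uses_smi_count;
}
```

A guest that arrived from an old_kernel host gets x_migrate_smi_count == false, 
so sends_smi_count() stays false on every later migration, whatever kernel the 
next destination runs.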

Therefore, I think it makes more sense for the source QEMU to always send all 
VMState subsections that are deemed needed (i.e. .needed() returns true) 
and let the receive side decide whether migration should fail when a subsection 
was sent but could not be restored.
The only case in which I think the sender should limit the VMState subsections 
it sends is when the destination is running an older QEMU 
which is not even aware of this VMState subsection (which is, to my 
understanding, the rationale behind using “x-migrate-smi-count” and tying it to 
machine-type).
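
Put together, the policy I am arguing for looks roughly like the following 
self-contained C model. Again this is only a sketch with hypothetical names. 
The first failure case is the behaviour that exists today (an old QEMU that 
does not know the subsection), while the second one is the proposed 
receive-side check.

```c
#include <stdbool.h>

enum migration_result { MIGRATION_OK, MIGRATION_FAILED };

/* Hypothetical description of the receiving side. */
struct destination {
    bool qemu_knows_subsection;  /* destination QEMU has this VMState subsection */
    bool kernel_can_restore;     /* destination kernel has the needed capability */
};

/* The sender sends whenever needed is true; the receiver decides the outcome. */
enum migration_result migrate_subsection(bool needed,
                                         const struct destination *dst)
{
    if (!needed) {
        return MIGRATION_OK;      /* nothing sent, nothing to restore */
    }
    if (!dst->qemu_knows_subsection) {
        return MIGRATION_FAILED;  /* existing behaviour: unknown subsection */
    }
    if (!dst->kernel_can_restore) {
        return MIGRATION_FAILED;  /* proposed: reject instead of ignoring state */
    }
    return MIGRATION_OK;
}
```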

Third, let’s assume all hosts in the fleet were upgraded to new_kernel. How do I 
modify all the QEMUs already launched on these hosts to now have 
“x-migrate-smi-count” set to true?
I would like future migrations to send this VMState subsection, but currently 
there is no QMP command to update these flags.

Fourth, I think it’s not trivial for the management plane to know which 
flags it should set on the destination QEMU based on the kernels currently 
running in the fleet.
It’s not the same as machine-type, which, as already discussed above, doesn’t 
change during the entire lifetime of the guest.

I’m also not sure it is a good idea that we currently control flags such as 
“x-migrate-smi-count” from machine-type.
It means that if a guest was initially launched using some old QEMU, it will 
*forever* not migrate some VMState subsection during all its Live-Migrations, 
even if all hosts and all QEMUs in the fleet are capable of migrating this state 
properly.
Maybe it would be preferable for this flag to be specified as part of the 
“migrate” command itself, for the case where the management plane knows it 
wishes to migrate even though the dest QEMU is older and doesn’t understand 
this specific VMState subsection.

I’m left pretty confused about QEMU’s migration compatibility strategy...

-Liran

> 
>> -Liran
>> 
>>> 
>>>> Thanks,
>>>> -Liran
>>> --
>>> Dr. David Alan Gilbert / address@hidden / Manchester, UK
>> 
> --
> Dr. David Alan Gilbert / address@hidden / Manchester, UK



