[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
RE: [PATCH v1] target/i386: kvm: Block migration when enfore_cpuid is se
From: |
Wang, Wei W |
Subject: |
RE: [PATCH v1] target/i386: kvm: Block migration when enfore_cpuid is set to false |
Date: |
Thu, 11 Jul 2024 11:40:51 +0000 |
On Friday, July 5, 2024 9:34 PM, Peter Xu wrote:
> On Fri, Jul 05, 2024 at 10:22:23AM +0000, Wang, Wei W wrote:
> > On Thursday, July 4, 2024 11:59 PM, Peter Xu wrote:
> > > On Thu, Jul 04, 2024 at 03:10:27PM +0000, Wang, Wei W wrote:
> > > > > > diff --git a/target/i386/cpu.c b/target/i386/cpu.c index
> > > > > > 4c2e6f3a71..7db4fe4ead 100644
> > > > > > --- a/target/i386/cpu.c
> > > > > > +++ b/target/i386/cpu.c
> > > > > > @@ -8258,7 +8258,7 @@ static Property x86_cpu_properties[] = {
> > > > > > DEFINE_PROP_UINT32("hv-version-id-snumber", X86CPU,
> > > > > > hyperv_ver_id_sn, 0),
> > > > > >
> > > > > > DEFINE_PROP_BOOL("check", X86CPU, check_cpuid, true),
> > > > > > - DEFINE_PROP_BOOL("enforce", X86CPU, enforce_cpuid, false),
> > > > > > + DEFINE_PROP_BOOL("enforce", X86CPU, enforce_cpuid, true),
> > > > >
> > > > > I assume in many cases people can still properly migrate when
> > > > > the hosts are similar or identical, so maybe we at least want
> > > > > the old machine types keep working (by introducing a machine compat
> property)?
> > > >
> > > > You meant keeping "enforce_cpuid=false" for old machine types
> > > > (e.g. before
> > > 9.1)?
> > > > This will make them non-migratable with this patch, but they were
> > > > migratable (by
> > > > default) as "migratable" wasn't enforced by "enforce_cpuid".
> > > > Should we keep them being migratable by default (e.g.
> enforce_cpuid=true) as well?
> > >
> > > Ah, this is trickier than I thought..
> > >
> > > The issue is if we make them silently switch to enforce_cpuid=true
> > > on old machines, there's chance they start to fail boot, am I right?
> >
> > Right for newly launched guests, regardless of whether they are new or
> > old machine types, they will fail to boot when the host cannot afford
> > the features for the configured vCPU models. This is expected, and
> > actually part of the intentions of this patch.
> >
> > When there is a need to boot a guest with reduced features, users need
> > to explicitly add "enforce_cpuid=false", which marks the new booted
> > guest as non-migratable, or a _better_ way, to identify the
> > unsupported features from the host first, and then get it booted with
> > "-cpu CpuModel,-A,-B", this can make it migratable with those known
> > reduced features, and the destination guest is required to use the
> > same QEMU commands (as usual) to reduce the same set of features as the
> source and get a enforced check by "enforce_cpuid".
> >
> > For live update of QEMU for existing running guests (as you mentioned
> > below), the impact is only on the running guests that have had
> > features reduced from vCPU models (at the time of their original
> > launch). For this case, the recommended way to update them to the new
> > QEMU is also to explicitly identify the reduced features and update them
> with "-cpu CpuModel,-A,-B".
> >
> > The rationale behind this is that the features reduced from the guest
> > needs to be explicitly determined and controllable. In terms of live
> > migration, the destination is ensured to have the same set of reduced
> > features as the source side.
> >
> > >
> > > if (cpu->enforce_cpuid && x86_cpu_have_filtered_features(cpu)) {
> > > error_setg(&local_err,
> > > accel_uses_host_cpuid() ?
> > > "Host doesn't support requested features" :
> > > "TCG doesn't support requested features");
> > > goto out;
> > > }
> > >
> > > I suppose we still need to keep all the old worlds running all fine
> > > without breaking them when people do an QEMU upgrade. It needs to
> > > work both on booting fine, and on allowing to migrate.
> > >
> > > So maybe we actually need two things?
> > >
> > > - One patch introduce forbit_migration_if_cpuid_mismatch property,
> when
> > > set, block migration if not enforced, otherwise it should still allow
> > > migration even if enforce_cpud=fales. It should default to on, but
> > > off
> > > on old machines.
> > >
> > > - One patch change default value of enforce_cpuid to on, but turn it off
> > > on old machines.
> > >
> > > Does that look right?
> >
> > I think this can work. Not sure what you would think about the above
> explanations.
> > If agree, then probably we don’t need to add the extra complexity.
> >
> > Also, the above two things seem to impede the upgrade for guests with
> > older machine types to incorporate this enforcement. I think the
> > primary goal of live updating to a newer QEMU version is to benefit from the
> enhancements offered by the new QEMU.
> > So it seems more beneficial to bring old guests under such
> > enforcements, given that this doesn't break functionalities that the
> > guest is running. The only requirement for this is to upgrade using
> > more explicit QEMU commands (i.e., -cpu CpuModel,-A,-B) when needed.
>
> What you said makes sense. It's just that the concern still exists, and I'm
> not
> sure whether that'll be too much to ask for a customer.
>
> Also, see this commit:
>
> commit 15e41345906d29a319cc9cdf566347bf79134d24
> Author: Eduardo Habkost <ehabkost@redhat.com>
> Date: Wed Aug 26 13:25:44 2015 -0300
>
> target-i386: Enable "check" mode by default
>
> Current default behavior of QEMU is to silently disable features that
> are not supported by the host when a CPU model is requested in the
> command-line. This means that in addition to risking breaking guest ABI
> by default, we are silent about it.
>
> I would like to enable "enforce" by default, but this can easily break
> existing production systems because of the way libvirt makes assumptions
> about CPU models today (this will change in the future, once QEMU
> provide a proper interface for checking if a CPU model is runnable).
>
> But there's no reason we should be silent about it. So, change
> target-i386 to enable "check" mode by default so at least we have some
> warning printed to stderr (and hopefully logged somewhere) when QEMU
> disables a feature that is not supported by the host system.
>
> Reviewed-by: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
>
> I don't think I know what's the "libvirt assumptions" mentioned, and how it
> changed as of today. I had a vague memory Libvirt constantly set off on some
> of the relevant flags last time Jiri explained some cpuid issues to me; maybe
> it's
> "check" not "enforce"? It would be great if Jiri or Dan can comment here.
I had a check in libvirt and didn't find "enforce" is set to false anywhere
(except
only in some tests, which I think should be fine).
Seems we haven't got responses from more folks here :)
Is it appropriate to open a discussion thread in devel@lists.libvirt.org to
confirm?
Re: [PATCH v1] target/i386: kvm: Block migration when enfore_cpuid is set to false, Daniel P . Berrangé, 2024/07/11
- RE: [PATCH v1] target/i386: kvm: Block migration when enfore_cpuid is set to false, Wang, Wei W, 2024/07/11
- RE: [PATCH v1] target/i386: kvm: Block migration when enfore_cpuid is set to false, Wang, Wei W, 2024/07/11
- Re: [PATCH v1] target/i386: kvm: Block migration when enfore_cpuid is set to false, Daniel P . Berrangé, 2024/07/11
- RE: [PATCH v1] target/i386: kvm: Block migration when enfore_cpuid is set to false, Wang, Wei W, 2024/07/11
- Re: [PATCH v1] target/i386: kvm: Block migration when enfore_cpuid is set to false, Daniel P . Berrangé, 2024/07/11