qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v1] target/i386: kvm: Block migration when enfore_cpuid is se


From: Peter Xu
Subject: Re: [PATCH v1] target/i386: kvm: Block migration when enfore_cpuid is set to false
Date: Fri, 5 Jul 2024 09:34:09 -0400

On Fri, Jul 05, 2024 at 10:22:23AM +0000, Wang, Wei W wrote:
> On Thursday, July 4, 2024 11:59 PM, Peter Xu wrote:
> > On Thu, Jul 04, 2024 at 03:10:27PM +0000, Wang, Wei W wrote:
> > > > > diff --git a/target/i386/cpu.c b/target/i386/cpu.c index
> > > > > 4c2e6f3a71..7db4fe4ead 100644
> > > > > --- a/target/i386/cpu.c
> > > > > +++ b/target/i386/cpu.c
> > > > > @@ -8258,7 +8258,7 @@ static Property x86_cpu_properties[] = {
> > > > >      DEFINE_PROP_UINT32("hv-version-id-snumber", X86CPU,
> > > > > hyperv_ver_id_sn, 0),
> > > > >
> > > > >      DEFINE_PROP_BOOL("check", X86CPU, check_cpuid, true),
> > > > > -    DEFINE_PROP_BOOL("enforce", X86CPU, enforce_cpuid, false),
> > > > > +    DEFINE_PROP_BOOL("enforce", X86CPU, enforce_cpuid, true),
> > > >
> > > > I assume in many cases people can still properly migrate when the
> > > > hosts are similar or identical, so maybe we at least want the old
> > > > machine types keep working (by introducing a machine compat property)?
> > >
> > > You meant keeping "enforce_cpuid=false" for old machine types (e.g. before
> > 9.1)?
> > > This will make them non-migratable with this patch, but they were
> > > migratable (by
> > > default) as "migratable" wasn't enforced by "enforce_cpuid". Should we
> > > keep them being migratable by default (e.g. enforce_cpuid=true) as well?
> > 
> > Ah, this is trickier than I thought..
> > 
> > The issue is if we make them silently switch to enforce_cpuid=true on old
> > machines, there's chance they start to fail boot, am I right?
> 
> Right for newly launched guests, regardless of whether they are new or old
> machine types, they will fail to boot when the host cannot afford the features
> for the configured vCPU models. This is expected, and actually part of the
> intentions of this patch.
> 
> When there is a need to boot a guest with reduced features, users need to
> explicitly add "enforce_cpuid=false", which marks the new booted guest as
> non-migratable, or a _better_ way, to identify the unsupported features from 
> the host first, and then get it booted with "-cpu CpuModel,-A,-B", this can 
> make
> it migratable with those known reduced features, and the destination guest is
> required to use the same QEMU commands (as usual) to reduce the same set
> of features as the source and get a enforced check by "enforce_cpuid".
> 
> For live update of QEMU for existing running guests (as you mentioned
> below), the impact is only on the running guests that have had features 
> reduced
> from vCPU models (at the time of their original launch). For this case, the
> recommended way to update them to the new QEMU is also to explicitly identify
> the reduced features and update them with "-cpu CpuModel,-A,-B".
> 
> The rationale behind this is that the features reduced from the guest needs to
> be explicitly determined and controllable. In terms of live migration, the
> destination is ensured to have the same set of reduced features as the source
> side.
> 
> > 
> >     if (cpu->enforce_cpuid && x86_cpu_have_filtered_features(cpu)) {
> >         error_setg(&local_err,
> >                    accel_uses_host_cpuid() ?
> >                        "Host doesn't support requested features" :
> >                        "TCG doesn't support requested features");
> >         goto out;
> >     }
> > 
> > I suppose we still need to keep all the old worlds running all fine without
> > breaking them when people do an QEMU upgrade.  It needs to work both on
> > booting fine, and on allowing to migrate.
> > 
> > So maybe we actually need two things?
> > 
> >   - One patch introduce forbit_migration_if_cpuid_mismatch property, when
> >     set, block migration if not enforced, otherwise it should still allow
> >     migration even if enforce_cpud=fales.  It should default to on, but off
> >     on old machines.
> > 
> >   - One patch change default value of enforce_cpuid to on, but turn it off
> >     on old machines.
> > 
> > Does that look right?
> 
> I think this can work. Not sure what you would think about the above 
> explanations.
> If agree, then probably we don’t need to add the extra complexity.
> 
> Also, the above two things seem to impede the upgrade for guests with older 
> machine
> types to incorporate this enforcement. I think the primary goal of live 
> updating to a
> newer QEMU version is to benefit from the enhancements offered by the new 
> QEMU.
> So it seems more beneficial to bring old guests under such enforcements, given
> that this doesn't break functionalities that the guest is running. The only
> requirement for this is to upgrade using more explicit QEMU commands
> (i.e., -cpu CpuModel,-A,-B) when needed.

What you said makes sense.  It's just that the concern still exists, and
I'm not sure whether that'll be too much to ask for a customer.

Also, see this commit:

commit 15e41345906d29a319cc9cdf566347bf79134d24
Author: Eduardo Habkost <ehabkost@redhat.com>
Date:   Wed Aug 26 13:25:44 2015 -0300

    target-i386: Enable "check" mode by default
    
    Current default behavior of QEMU is to silently disable features that
    are not supported by the host when a CPU model is requested in the
    command-line. This means that in addition to risking breaking guest ABI
    by default, we are silent about it.
    
    I would like to enable "enforce" by default, but this can easily break
    existing production systems because of the way libvirt makes assumptions
    about CPU models today (this will change in the future, once QEMU
    provide a proper interface for checking if a CPU model is runnable).
    
    But there's no reason we should be silent about it. So, change
    target-i386 to enable "check" mode by default so at least we have some
    warning printed to stderr (and hopefully logged somewhere) when QEMU
    disables a feature that is not supported by the host system.

    Reviewed-by: Igor Mammedov <imammedo@redhat.com>
    Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>

I don't think I know what's the "libvirt assumptions" mentioned, and how it
changed as of today.  I had a vague memory Libvirt constantly set off on
some of the relevant flags last time Jiri explained some cpuid issues to
me; maybe it's "check" not "enforce"? It would be great if Jiri or Dan can
comment here.

Copy Igor too from the commit.

Thanks,

-- 
Peter Xu




reply via email to

[Prev in Thread] Current Thread [Next in Thread]