qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: (Proposal) New TDX Global Metadata To Report FIXED0 and FIXED1 CPUID


From: Sean Christopherson
Subject: Re: (Proposal) New TDX Global Metadata To Report FIXED0 and FIXED1 CPUID Bits
Date: Fri, 20 Dec 2024 08:59:47 -0800

On Fri, Dec 20, 2024, Xiaoyao Li wrote:
> On 12/19/2024 10:33 AM, Sean Christopherson wrote:
> > > > For all other CPUID bits, what the TDX Module thinks and/or presents to 
> > > > the guest
> > > > is completely irrelevant, at least as far as KVM cares, and to some 
> > > > extent as far
> > > > as QEMU cares.  This includes the TDX Module's FEATURE_PARAVIRT_CTRL, 
> > > > which frankly
> > > > is asinine and should be ignored.  IMO, the TDX Module spec is entirely 
> > > > off the
> > > > mark in its assessment of paravirtualization.  Injecting a #VE instead 
> > > > of a #GP
> > > > isn't "paravirtualization".
> > > > Take TSC_DEADLINE as an example.  "Disabling" the feature from the 
> > > > guest's side
> > > > simply means that WRMSR #GPs instead of #VEs.*Nothing* has changed from 
> > > > KVM's
> > > > perspective.  If the guest makes a TDVMCALL to write IA32_TSC_DEADLINE, 
> > > > KVM has
> > > > no idea if the guest has opted in/out of #VE vs #GP.  And IMO, a sane 
> > > > guest will
> > > > never take a #VE or #GP if it wants to use TSC_DEADLINE; the kernel 
> > > > should instead
> > > > make a direct TDVMCALL and save itself a pointless exception.
> > > > 
> > > >    Enabling Guest TDs are not allowed to access the IA32_TSC_DEADLINE 
> > > > MSR directly.
> > > >    Virtualization of IA32_TSC_DEADLINE depends on the virtual value of
> > > >    CPUID(1).ECX[24] bit (TSC Deadline). The host VMM may configure (as 
> > > > an input to
> > > >    TDH.MNG.INIT) virtual CPUID(1).ECX[24] to be a constant 0 or allow 
> > > > it to be 1
> > > >    if the CPU’s native value is 1.
> > > > 
> > > >    If the TDX module supports #VE reduction, as enumerated by 
> > > > TDX_FEATURES0.VE_REDUCTION
> > > >    (bit 30), and the guest TD has set TD_CTLS.REDUCE_VE to 1, it may 
> > > > control the
> > > >    value of virtual CPUID(1).ECX[24] by writing 
> > > > TDCS.FEATURE_PARAVIRT_CTRL.TSC_DEADLINE.
> > > > 
> > > >    • If the virtual value of CPUID(1).ECX[24] is 0, IA32_TSC_DEADLINE 
> > > > is virtualized
> > > >      as non-existent. WRMSR or RDMSR attempts result in a #GP(0).
> > > > 
> > > >    • If the virtual value of CPUID(1).ECX[24] is 1, WRMSR or RDMSR 
> > > > attempts result
> > > >      in a #VE(CONFIG_PARAVIRT). This enables the TD’s #VE handler.
> > > > 
> > > > Ditto for TME, MKTME.
> > > > 
> > > > FEATURE_PARAVIRT_CTRL.MCA is even weirder, but I still don't see any 
> > > > reason for
> > > > KVM or QEMU to care if it's fixed or configurable.  There's some crazy 
> > > > logic for
> > > > whether or not CR4.MCE can be cleared, but the host can't see guest 
> > > > CR4, and so
> > > > once again, the TDX Module's view of MCA is irrelevant when it comes to 
> > > > handling
> > > > TDVMCALL for the machine check MSRs.
> > > > 
> > > > So I think this again purely comes to back to KVM correctness and 
> > > > safety.  More
> > > > specifically, the TDX Module needs to report features that are 
> > > > unconditionally
> > > > enabled or disabled and can't be emulated by KVM.  For everything else, 
> > > > I don't
> > > > see any reason to care what the TDX module does.
> > > > 
> > > > I'm pretty sure that gives us a way forward.  If there only a handful 
> > > > of features
> > > > that are unconditionally exposed to the guest, then KVM forces those 
> > > > features in
> > > > cpu_caps[*].
> > > I see. Hmm. We could use this new interface to double check the fixed 
> > > bits. It
> > > seems like a relatively cheap sanity check.
> > > 
> > > There already is an interface to get CPUID bits (fixed and dynamic). But 
> > > it only
> > > works after a TD is configured. So if we want to do extra verification or
> > > adjustments, we could use it before entering the TD. Basically, if we 
> > > delay this
> > > logic we don't need to wait for the fixed bit interface.
> > Oh, yeah, that'd work.  Grab the guest CPUID and then verify that bits KVM 
> > needs
> > to be 0 (or 1) are set according, and WARN+kill if there's a mismatch.
> > 
> > Honestly, I'd probably prefer that over using the fixed bit interface, as 
> > my gut
> > says it's less likely for the TDX Module to misreport what CPUID it has 
> > created
> > for the guest, than it is for the TDX module to generate a "fixed bits" 
> > list that
> > doesn't match the code.  E.g. KVM has (had?) more than a few CPUID features 
> > that
> > KVM emulates without actually reporting support to userspace.
> 
> The original motivation of the proposed fxied0 and fixed1 data is to
> complement the CPUID configurability report, which is important for
> userspace. E.g., Currently, what QEMU is doing is hardcoding the fixed0 and
> fixed1 information of a specific TDX release to adjust the
> KVM_GET_SUPPORTED_CPUID result and gets a final "supported" CPUID set for
> TDX. Hardcoding is not a good idea and it's better that KVM can get the data
> from TDX module, then pass to userspace (of course KVM can tweak the data
> based on its own requirement). So, do you think it's useful to have TDX
> module report the fixed0/fixed1 data for this purpose?

I'm definitely supportive of KVM passing on accurate information, so long as 
KVM's
ABI isn't too crazy.

That said, I'm starting to agree with Rick's assessment that trying to enumerate
fixed CPUID feature bits is becoming a fool's errand as the TDX architecture 
gets
more and more complex.

But _that_ said, if userspace ever needs to pivot on the TDX Module *version*,
then IMO that's a non-starter.  E.g. QEMU shouldn't have to hardcode 
fixed0/fixed1
bits based on TDX 1.5.whatever vs. TDX 2.0.whatever.

One alternative idea to trying to enumerate every fixed bit would be to mimic 
what
the VMX architecture does for fixed CR0 bits.  Use TDX Module spec 1.5.06 (or 
whatever
version makes the most sense) as the base, and then adjust fixed/configurable
information based on *features*.  E.g. when unrestricted guest is enabled, 
CR0.PG
and CR0.PE switch from fixed0 to configurable.  At a glance, I think the whole 
#VE
reduction madness could follow a similar path.

And then if the TDX tries to add fixed bits in the future that aren't explicitly
and clearly tied to some feature enablement, we collectively reject that TDX 
spec.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]