[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: (Proposal) New TDX Global Metadata To Report FIXED0 and FIXED1 CPUID
From: |
Sean Christopherson |
Subject: |
Re: (Proposal) New TDX Global Metadata To Report FIXED0 and FIXED1 CPUID Bits |
Date: |
Tue, 17 Dec 2024 16:08:50 -0800 |
On Tue, Dec 17, 2024, Rick P Edgecombe wrote:
> On Mon, 2024-12-16 at 17:53 -0800, Sean Christopherson wrote:
> > Every new feature that lands in hardware needs to either be "benign" or
> > have the
> > appropriate virtualization controls. KVM already has to deal with cases
> > where
> > features can effectively be used without KVM's knowledge. E.g. there are
> > plenty
> > of instruction-level virtualization holes, and SEV-ES doubled down by
> > essentially
> > forcing KVM to let the guest write XCR0 and XSS directly.
>
> We discussed this in the PUCK call.
Argh, I had a response to Xiaoyao all typed up and didn't hit "send" earlier
today.
> It turns out there were two different ideas on how this fixed bit API would be
> used. One is to help userspace understand which configurations are possible.
> For
> this one, I'm not sure how helpful this proposal will be in the long run. I'll
> respond on the other branch of the thread.
>
> The other usage people were thinking of, which I didn't realize before, was to
> prevent the TDX module from setting fixed bits that might require VMMs support
> (i.e. save/restoring something that could affect the host). The rest of the
> mail
> is about this issue.
>
> Due to the steps involved in resolving this confusion, and that we didn't
> really
> reach a conclusion, the discussion is hard to summarize. So instead I'll try
> to
> re-kick it off with an idea which has bits and pieces of what people said...
>
> I think we can't have the TDX module setting new fixed bits that require any
> VMM
> enabling. When we finally have settled upstream TDX support, the TDX module
> needs to understand what things KVM relies on so it doesn't break them with
> updates. But new fixed CPUID bits that require VMM enabling to prevent host
> issues seems like the kind of thing in general that just shouldn't happen.
>
> As for new configurable bits that require VMM enabling. Adrian was suggesting
> that the TDX module currently only has two guest CPUID bits that are
> problematic
> for KVM today (and the next vcpu enter/exit series has a patch to forbid
> them).
> But a re-check of this assertion is warranted.
>
> It seems like an anti-pattern to have KVM maintaining any code to defend
> against
> TDX module changes that could instead be handled with a promise.
I disagree, sanity checking hardware and firmware is a good thing. E.g. see
KVM's
VMCS checks, the sanity checks for features SEV depends on, etc.
That said, I'm not terribly concerned about more features that are
unconditionally
exposed to the guest, because that will cause problems for other reasons, i.e.
Intel should already be heavily incentivized to not do silly things.
> However, KVM having code to defend against userspace prodding the TDX module
> to do something bad to the host seems valid. So fixed bit issues should be
> handled with a promise, but issues related to new configurable bits seems
> open.
>
> Some options discussed on the call:
>
> 1. If we got a promise to require any new CPUID bits that clobber host state
> to
> require an opt-in (attributes bit, etc) then we could get by with a promise
> for
> that too. The current situation was basically to assume TDX module wouldn't
> open
> up the issue with new CPUID bits (only attributes/xfam).
> 2. If we required any new configurable CPUID bits to save/restore host state
> automatically then we could also get by, but then KVM's code that does host
> save/restore would either be redundant or need a TDX branch.
> 3. If we prevent setting any CPUID bits not supported by KVM, we would need to
> track these bits in KVM. The data backing GET_SUPPORTED_CPUID is not
> sufficient
> for this purpose since it is actually more like "default values" then a mask
> of
> supported bits. A patch to try to do this filtering was dropped after upstream
> discussion.[0]
The only CPUID bits that truly matter are those that are associated with
hardware
features the TDX module allows the guest to use directly. And for those, KVM
*must* know if they are fixed0 (inverted polarity only?), fixed1, or
configurable.
As Adrian asserted, there probably aren't many of them.
For all other CPUID bits, what the TDX Module thinks and/or presents to the
guest
is completely irrelevant, at least as far as KVM cares, and to some extent as
far
as QEMU cares. This includes the TDX Module's FEATURE_PARAVIRT_CTRL, which
frankly
is asinine and should be ignored. IMO, the TDX Module spec is entirely off the
mark in its assessment of paravirtualization. Injecting a #VE instead of a #GP
isn't "paravirtualization".
Take TSC_DEADLINE as an example. "Disabling" the feature from the guest's side
simply means that WRMSR #GPs instead of #VEs. *Nothing* has changed from KVM's
perspective. If the guest makes a TDVMCALL to write IA32_TSC_DEADLINE, KVM has
no idea if the guest has opted in/out of #VE vs #GP. And IMO, a sane guest will
never take a #VE or #GP if it wants to use TSC_DEADLINE; the kernel should
instead
make a direct TDVMCALL and save itself a pointless exception.
Enabling Guest TDs are not allowed to access the IA32_TSC_DEADLINE MSR
directly.
Virtualization of IA32_TSC_DEADLINE depends on the virtual value of
CPUID(1).ECX[24] bit (TSC Deadline). The host VMM may configure (as an input
to
TDH.MNG.INIT) virtual CPUID(1).ECX[24] to be a constant 0 or allow it to be 1
if the CPU’s native value is 1.
If the TDX module supports #VE reduction, as enumerated by
TDX_FEATURES0.VE_REDUCTION
(bit 30), and the guest TD has set TD_CTLS.REDUCE_VE to 1, it may control the
value of virtual CPUID(1).ECX[24] by writing
TDCS.FEATURE_PARAVIRT_CTRL.TSC_DEADLINE.
• If the virtual value of CPUID(1).ECX[24] is 0, IA32_TSC_DEADLINE is
virtualized
as non-existent. WRMSR or RDMSR attempts result in a #GP(0).
• If the virtual value of CPUID(1).ECX[24] is 1, WRMSR or RDMSR attempts
result
in a #VE(CONFIG_PARAVIRT). This enables the TD’s #VE handler.
Ditto for TME, MKTME.
FEATURE_PARAVIRT_CTRL.MCA is even weirder, but I still don't see any reason for
KVM or QEMU to care if it's fixed or configurable. There's some crazy logic for
whether or not CR4.MCE can be cleared, but the host can't see guest CR4, and so
once again, the TDX Module's view of MCA is irrelevant when it comes to handling
TDVMCALL for the machine check MSRs.
So I think this again purely comes to back to KVM correctness and safety. More
specifically, the TDX Module needs to report features that are unconditionally
enabled or disabled and can't be emulated by KVM. For everything else, I don't
see any reason to care what the TDX module does.
I'm pretty sure that gives us a way forward. If there only a handful of
features
that are unconditionally exposed to the guest, then KVM forces those features in
cpu_caps[*]. I.e. treat them kinda like XSAVES on AMD, where KVM assumes the
guest can use XSAVES if it's supported in hardware and XSAVE is exposed to the
guest, because AMD didn't provide an interception knob for XSAVES.
If the list is "too" long (sujbective) for KVM to hardcode, then we revisit and
get the TDX module to provide a list.
This probably doesn't solve Xiaoyao's UX problem in QEMU, but I think it gives
us a sane approach for KVM.
[*] https://lore.kernel.org/all/20241128013424.4096668-1-seanjc@google.com
> Other idea
> ----------
> Previously we tried to maintain an allow list of KVM supported configurable
> bits
> [0]. It was do-able, but not ideal. It would be smaller for KVM to protect
> itself with a deny list of bits, or rather a list of bits that needs to be in
> KVM_GET_SUPPORTED_CPUID, or they should not be allowed to be configured. But
> KVM
> can't keep a list of bits that it doesn't know about.
>
> But the TDX module does know which bits that it supports result in host state
> getting clobbered. So we could ask TDX module to expose a list of bits that
> have
> an effect on host state. We could check those against KVM_GET_SUPPORTED_CPUID.
> That check could be expected to fit better than when we tried to massage
> KVM_GET_SUPPORTED_CPUID to be a mask that includes all possible configurable
> bits (multi-bit fields, etc).
>
> In the meantime we could keep a list of all of today's host affecting bits.
> TDX
> module would need to gate any new bits that effect host state behind a new
> sys-
> wide opt-in that comes with the "clobber bits" metadata. Before entering a TD,
> KVM would check the clobber bits in KVM's copy of CPUID against the TD's copy
> to
> make sure everyone knows what they have to do.
>
> (and also this opt-in stuff would need to be run by the TDX module team of
> course)
>
> It leaves open the possibility that there is some other bits KVM cares about
> that don't have to do with clobbering host state. Not sure about it.
>
> [0]
> https://lore.kernel.org/kvm/20240812224820.34826-26-rick.p.edgecombe@intel.com/
- (Proposal) New TDX Global Metadata To Report FIXED0 and FIXED1 CPUID Bits, Xiaoyao Li, 2024/12/05
- Re: (Proposal) New TDX Global Metadata To Report FIXED0 and FIXED1 CPUID Bits, Edgecombe, Rick P, 2024/12/06
- Re: (Proposal) New TDX Global Metadata To Report FIXED0 and FIXED1 CPUID Bits, Xiaoyao Li, 2024/12/09
- Re: (Proposal) New TDX Global Metadata To Report FIXED0 and FIXED1 CPUID Bits, Edgecombe, Rick P, 2024/12/10
- Re: (Proposal) New TDX Global Metadata To Report FIXED0 and FIXED1 CPUID Bits, Sean Christopherson, 2024/12/16
- Re: (Proposal) New TDX Global Metadata To Report FIXED0 and FIXED1 CPUID Bits, Xiaoyao Li, 2024/12/16
- Re: (Proposal) New TDX Global Metadata To Report FIXED0 and FIXED1 CPUID Bits, Edgecombe, Rick P, 2024/12/17
- Re: (Proposal) New TDX Global Metadata To Report FIXED0 and FIXED1 CPUID Bits,
Sean Christopherson <=
- Re: (Proposal) New TDX Global Metadata To Report FIXED0 and FIXED1 CPUID Bits, Edgecombe, Rick P, 2024/12/18
- Re: (Proposal) New TDX Global Metadata To Report FIXED0 and FIXED1 CPUID Bits, Sean Christopherson, 2024/12/18
- Re: (Proposal) New TDX Global Metadata To Report FIXED0 and FIXED1 CPUID Bits, Edgecombe, Rick P, 2024/12/19
- Re: (Proposal) New TDX Global Metadata To Report FIXED0 and FIXED1 CPUID Bits, Xiaoyao Li, 2024/12/19
- Re: (Proposal) New TDX Global Metadata To Report FIXED0 and FIXED1 CPUID Bits, Sean Christopherson, 2024/12/20