qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v1 3/3] i386/sev: Add KVM_EXIT_SNP_REQ_CERTS support for cert


From: Michael Roth
Subject: Re: [PATCH v1 3/3] i386/sev: Add KVM_EXIT_SNP_REQ_CERTS support for certificate-fetching
Date: Thu, 19 Dec 2024 07:16:01 -0600

On Thu, Dec 19, 2024 at 08:13:44AM +0000, Daniel P. Berrangé wrote:
> On Wed, Dec 18, 2024 at 04:29:51PM -0600, Michael Roth wrote:
> > On Wed, Dec 18, 2024 at 05:50:52PM +0000, Daniel P. Berrangé wrote:
> > > On Wed, Dec 18, 2024 at 09:49:39AM -0600, Michael Roth wrote:
> > > > The GHCB specification[1] defines a VMGEXIT-based Guest Request
> > > > hypercall to allow an SNP guest to issue encrypted requests directly to
> > > > SNP firmware to do things like query the attestation report for the
> > > > guest. These are generally handled purely in the kernel.
> > > > 
> > > > In some some cases, it's useful for the host to be able to additionally
> > > > supply the certificate chain for the signing key that SNP firmware uses
> > > > to sign these attestation reports. To allow for this, the GHCB
> > > > specification defines an Extended Guest Request where this certificate
> > > > data can be provided in a special format described in the GHCB spec.
> > > > This certificate data may be global or guest-specific depending on how
> > > > the guest was configured. Rather than providing interfaces to manage
> > > > these within the kernel, KVM provides a new KVM_EXIT_SNP_REQ_CERTS exit
> > > > to request the certificate contents from userspace. Implement support
> > > > for that here.
> > > > 
> > > > To synchronize delivery of the certificates to the guest in a way where
> > > > they will not be rendered invalid by updates to SNP firmware or
> > > > attestation singing/endorsement keys by management tools outside the
> > > > purview of QEMU, it is expected by users of KVM_EXIT_SNP_REQ_CERTS to
> > > > obtain a shared/read lock on the certificate file prior to delivering
> > > > them back to KVM. Only after this will the attestation report be
> > > > retrieved from firmware and bundled with the certificate data, so QEMU
> > > > must continue to hold the file lock until KVM confirms that the
> > > > attestation report has been retrieved/bundled. This confirmation is done
> > > > by way of the kvm_immediate_exit callback infrastructure that was
> > > > introduced in a previous patch.
> > > > 
> > > > [1] "Guest Hypervisor Communication Block (GHCB) Standardization",
> > > >     https://www.amd.com/en/developer/sev.html
> > > > 
> > > > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > > > ---
> > > >  qapi/qom.json                 |  23 +++-
> > > >  target/i386/kvm/kvm.c         |  10 ++
> > > >  target/i386/sev-sysemu-stub.c |   5 +
> > > >  target/i386/sev.c             | 249 ++++++++++++++++++++++++++++++++++
> > > >  target/i386/sev.h             |   2 +
> > > >  5 files changed, 288 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/qapi/qom.json b/qapi/qom.json
> > > > index 28ce24cd8d..6eaf0e7721 100644
> > > > --- a/qapi/qom.json
> > > > +++ b/qapi/qom.json
> > > > @@ -1034,6 +1034,25 @@
> > > >  #     firmware.  Set this to true to disable the use of VCEK.
> > > >  #     (default: false) (since: 9.1)
> > > >  #
> > > > +# @certs-path: Path to certificate data that can be passed to guests 
> > > > via
> > > > +#              SNP Extended Guest Requests. File should be in the 
> > > > format
> > > > +#              described in the GHCB specification. (default: none)
> > > > +#              (since: 10.0)
> > > 
> > > Can we document the required format here explicitly, rather than expecting
> > > users to go searching for specs which are often practically impossible
> > > to find, and even harder to read & interpret ?
> > 
> > It'll be difficult to summarize in a way that will be self-reliant,
> > since knowing the certificate format is not sufficient to make sure
> > it coincides with the endorsement key being used by firmware. So I can't
> > promise to completely reduce reliance on external specs, but at least
> > give a better of the format and where those external specs will come
> > into play in filling out the data.
> > 
> > If it needs to be at least somewhat self-sufficient then that might
> > warrant a separate document in docs/system/i386/amd-memory-encryption.rst
> > or somewhere thereabouts that summarizes the whole attestation flow and
> > how certificates tie into that.
> > 
> > Any preferences?
> > 
> > > 
> > > > diff --git a/target/i386/sev.c b/target/i386/sev.c
> > > > index 1a4eb1ada6..2c41bdbccf 100644
> > > > --- a/target/i386/sev.c
> > > > +++ b/target/i386/sev.c
> > > > @@ -157,6 +157,9 @@ struct SevSnpGuestState {
> > > >      char *id_auth_base64;
> > > >      uint8_t *id_auth;
> > > >      char *host_data;
> > > > +    char *certs_path;
> > > > +    int certs_fd;
> > > > +    uint32_t certs_timeout;
> > > >  
> > > >      struct kvm_sev_snp_launch_start kvm_start_conf;
> > > >      struct kvm_sev_snp_launch_finish kvm_finish_conf;
> > > > @@ -1355,6 +1358,215 @@ sev_snp_launch_finish(SevCommonState 
> > > > *sev_common)
> > > >      }
> > > >  }
> > > >  
> > > > +static int open_certs_locked(SevSnpGuestState *sev_snp_guest)
> > > > +{
> > > > +    int fd, ret;
> > > > +
> > > > +    if (sev_snp_guest->certs_fd != -1) {
> > > > +        return 0;
> > > > +    }
> > > > +
> > > > +    fd = qemu_open(sev_snp_guest->certs_path, O_RDONLY, NULL);
> > > > +    if (fd == -1) {
> > > > +        error_report("Unable to open certificate blob at path %s, ret 
> > > > %d",
> > > > +                     sev_snp_guest->certs_path, fd);
> > > > +        return fd;
> > > > +    }
> > > > +
> > > > +    ret = qemu_lock_fd(fd, 0, 0, false);
> > > > +    if (ret == -EAGAIN || ret == -EACCES) {
> > > > +        ret = -EAGAIN;
> > > > +        goto out_close;
> > > > +    } else if (ret) {
> > > > +        goto out_close;
> > > > +    }
> > > 
> > > This locking scheme is likely unsafe. Consider this sequence
> > > 
> > >   * QEMU runs qemu_open(path)
> > >   * External mgmt app runs unlink(path)
> > >   * External mgmt app runs open(path)
> > >   * External mgmt app runs lock(fd)
> > >   * QEMU runs qemu_lock_fd(fd)
> > > 
> > > QEMU has successfully acquired a lock on an FD that corresponds to a
> > > deleted file, not the current existing file.
> > > 
> > > Avoiding this problem requires either that the external mgmt app agrees
> > > to *NEVER* unlink() the files under any circumstance, or for QEMU to
> > > run its open + lock logic in a loop, checking 'stat' and 'fstat' before
> > > opening and after locking, in order to detect a replaced file from its
> > > changed inode.
> > > 
> > > I'm not inclined to rely on mgmt apps never unlink()ing as that's to
> > > easy to mess up IMHO.
> > 
> > Yah I went into more detail in my response to Markus, but long story
> > short is that we are assuming mgmt is cooperative in this case, and
> > so as you mentioned, it would never unlink files while SNP guests are
> > running, but instead take an exclusive lock on them and update them in
> > place with the understanding that doing anything otherwise would open
> > a race window where guests might get stale certificates.
> 
> If there's an expectation & requirement that no SNP guests are running,
> then IMHO this whole thing is just over-engineered. Just remove all this
> locking code entirely, and document that none of this must be changed
> while QEMU is running - which is a common requirement for a great many
> things on the host.

VCEK endorsement keys can change as a result of SNP firmware updates,
which can be done while SNP guests are running and are often done in such
a way to patch bugs/security holes. VLEK endorsement keys can similarly be
updated on a live host. Both these sorts of interactions cannot be made
compatible with bundling certificates with attestation reports without some
orchestration in place to keep them atomic relative to the endorsement
key being used by firmware to sign attestation reports. Every CSP
implementing this will need to solve it in some way, and I'm sure some
will handle all this completely differently. But it will make
interoperative management/tooling a mess, and having a reference
implementation based around something common will make it easier to
steer CSPs to that common solution and give management tools authors
*some* reference approach to target rather than expecting to retrofit
some custom solution on top.

With these patches, you can update firmware and endorsement keys while
SNP guests are running, but it requires write locks on any active
certificates as defined here and in the kernel, and doing certificate
file updates in place while that write lock is still held. I don't really
think that's over-engineered. I think it's surprisingly simple given the
potential complexity of the above-mentioned requirements.

But yes if management tries to unlink certificates while SNP guests are
running, all bets are off. But at that point they are not cooperating
with kernel/QEMU, and we don't need to care about that. And if they
really do need to blow away certificates for a complete re-install
or data-wipe or whatever, at that point they'd just need to ensure
they stop all their SNP guests first.

-Mike

> 
> With regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
> 



reply via email to

[Prev in Thread] Current Thread [Next in Thread]