From: Ani Sinha
Subject: Re: [PATCH] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface support
Date: Tue, 10 Dec 2024 08:58:20 +0530
On Mon, Dec 2, 2024 at 1:17 PM Ani Sinha <anisinha@redhat.com> wrote:
>
>
>
> > On 29 Nov 2024, at 3:42 PM, Philippe Mathieu-Daudé <philmd@linaro.org> wrote:
> >
> > On 29/11/24 10:16, Ani Sinha wrote:
> >> VM firmware update is a mechanism where virtual machines can use their
> >> preferred and trusted firmware image in their execution environment
> >> without having to depend on an untrusted party to provide the firmware
> >> bundle. This is particularly useful for confidential virtual machines
> >> that are deployed in the cloud, where the tenant and the cloud provider
> >> are two different entities. In this scenario, virtual machines can bring
> >> their own trusted firmware image bundled as a part of their filesystem
> >> (using UKIs for example[1]) and then use this hypervisor interface to
> >> update to their trusted firmware image. This also allows the guests to
> >> have consistent measurements of the firmware image.
> >> This change introduces basic support for the fw-cfg based hypervisor
> >> interface and the corresponding device. The change also includes the
> >> specification document for this interface. The interface is made generic
> >> enough so that guests are free to use their own ABI to pass required
> >> information between initial and trusted execution contexts (where they
> >> are running their own trusted firmware image) without the hypervisor
> >> getting involved in between. In subsequent patches, we will introduce
> >> other minimal changes on the hypervisor that are required to make the
> >> mechanism work.
> >> [1] See systemd pull requests https://github.com/systemd/systemd/pull/35091
> >> and https://github.com/systemd/systemd/pull/35281 for some discussions on
> >> how we can bundle a firmware image within a UKI.
> >> CC: Alex Graf <graf@amazon.com>
> >> CC: Paolo Bonzini <pbonzini@redhat.com>
> >> CC: Gerd Hoffman <kraxel@redhat.com>
> >> CC: Igor Mammedov <imammedo@redhat.com>
> >> CC: Vitaly Kuznetsov <vkuznets@redhat.com>
> >> Signed-off-by: Ani Sinha <anisinha@redhat.com>
I know we are in code freeze but I would appreciate any more feedback
on this patch so that when the freeze lifts, we may merge it.
> >> ---
> >> MAINTAINERS | 9 +++
> >> docs/specs/index.rst | 1 +
> >> docs/specs/vmfwupdate.rst | 109 +++++++++++++++++++++++++
> >> hw/misc/meson.build | 2 +
> >> hw/misc/vmfwupdate.c | 152 +++++++++++++++++++++++++++++++++++
> >> include/hw/misc/vmfwupdate.h | 103 ++++++++++++++++++++++++
> >> 6 files changed, 376 insertions(+)
> >> create mode 100644 docs/specs/vmfwupdate.rst
> >> create mode 100644 hw/misc/vmfwupdate.c
> >> create mode 100644 include/hw/misc/vmfwupdate.h
> >> diff --git a/MAINTAINERS b/MAINTAINERS
> >> index 095420f8b0..cd4135fb5b 100644
> >> --- a/MAINTAINERS
> >> +++ b/MAINTAINERS
> >> @@ -2531,6 +2531,15 @@ F: include/hw/acpi/vmgenid.h
> >> F: docs/specs/vmgenid.rst
> >> F: tests/qtest/vmgenid-test.c
> >> +VM Firmware Update
> >> +M: Ani Sinha <anisinha@redhat.com>
> >> +M: Alex Graf <graf@amazon.com>
> >> +M: Paolo Bonzini <pbonzini@redhat.com>
> >> +S: Maintained
> >> +F: hw/misc/vmfwupdate.c
> >> +F: include/hw/misc/vmfwupdate.h
> >> +F: docs/specs/vmfwupdate.rst
> >> +
> >> LED
> >> M: Philippe Mathieu-Daudé <philmd@linaro.org>
> >> S: Maintained
> >> diff --git a/docs/specs/index.rst b/docs/specs/index.rst
> >> index ff5a1f03da..cbda7e0398 100644
> >> --- a/docs/specs/index.rst
> >> +++ b/docs/specs/index.rst
> >> @@ -34,6 +34,7 @@ guest hardware that is specific to QEMU.
> >> virt-ctlr
> >> vmcoreinfo
> >> vmgenid
> >> + vmfwupdate
> >> rapl-msr
> >> rocker
> >> riscv-iommu
> >> diff --git a/docs/specs/vmfwupdate.rst b/docs/specs/vmfwupdate.rst
> >> new file mode 100644
> >> index 0000000000..3a36ca14c7
> >> --- /dev/null
> >> +++ b/docs/specs/vmfwupdate.rst
> >> @@ -0,0 +1,109 @@
> >> +VMFWUPDATE INTERFACE SPECIFICATION
> >> +##################################
> >> +
> >> +Introduction
> >> +************
> >> +
> >> +``Vmfwupdate`` is an extension to ``fw-cfg`` that allows guests to replace early boot
> >> +code in their virtual machine. Through a combination of vmfwupdate and
> >> +hypervisor stack knowledge, guests can deterministically replace their launch
> >> +payload. This is useful for environments like SEV-SNP where the
> >> +launch payload becomes the launch digest. Guests can use vmfwupdate to provide
> >> +a measured, full guest payload (BIOS image, kernel, initramfs, kernel
> >> +command line) to the virtual machine which enables them to easily reason about
> >> +integrity of the resulting system.
> >> +For more information, please see the `KVM Forum 2024 presentation <KVMFORUM_>`__
> >> +about this work from the authors [1]_.
> >> +
> >> +
> >> +.. _KVMFORUM: https://www.youtube.com/watch?v=VCMBxU6tAto
> >> +
> >> +Base Requirements
> >> +*****************
> >> +
> >> +#. **fw-cfg**:
> >> + The target system must provide a ``fw-cfg`` interface. For x86 based
> >> + environments, this ``fw-cfg`` interface must be accessible through PIO ports
> >> + 0x510 and 0x511. The ``fw-cfg`` interface does not need to be announced as part
> >> + of system device tables such as DSDT. The ``fw-cfg`` interface must support the
> >> + DMA interface. Write operations may be supported through the DMA interface only.
> >> +
> >> +#. **BIOS region**:
> >> + The hypervisor must provide a BIOS region which may be
> >> + statically sized. Through vmfwupdate, the guest is able to atomically replace
> >> + its contents. The BIOS region must be mapped as read-write memory. In a
> >> + SEV-SNP environment, the BIOS region must be mapped as private memory at
> >> + launch time.
> >> +
> >> +Fw-cfg Files
> >> +************
> >> +
> >> +Guests drive vmfwupdate through special ``fw-cfg`` files that control its flow
> >> +followed by a standard system reset operation. When vmfwupdate is available,
> >> +it provides the following ``fw-cfg`` files:
> >> +
> >> +* ``vmfwupdate/cap`` (``u64``) - Read-only Little Endian encoded bitmap of additional
> >> + capabilities the interface supports. List of available capabilities:
> >> +
> >> + ``VMFWUPDATE_CAP_BIOS_RESIZE 0x0000000000000001``
> >> +
> >> +* ``vmfwupdate/bios-size`` (``u32``) - Little Endian encoded size of the BIOS region.
> >> + Read-only by default. Optionally Read-write if ``vmfwupdate/cap`` contains
> >> + ``VMFWUPDATE_CAP_BIOS_RESIZE``. On write, the BIOS region may resize. Guests are
> >> + required to read the value after writing and compare it with the requested size
> >> + to determine whether the resize was successful. Note, x86 BIOS regions always
> >> + start at 4GiB - bios-size.
> >> +
> >> +* ``vmfwupdate/opaque`` (``1024 bytes``) - A 1KiB buffer that survives the BIOS replacement
> >> + flow. Can be used by the guest to propagate guest physical addresses of payloads
> >> + to its BIOS stage. It’s recommended to make the new BIOS clear this file on boot
> >> + if it exists. Contents of this file are under the control of the hypervisor. In an
> >> + environment that considers the hypervisor outside of its trust boundary, guests
> >> + are advised to validate its contents before consumption.
> >> +
> >> +* ``vmfwupdate/disable`` (``u8``) - Indicates whether the interface is disabled.
> >> + Returns 0 for enabled, 1 for disabled. Writing any value disables it. Writing is
> >> + only allowed if the value is 0. When the interface is disabled, the replace file
> >> + is ignored on reset. This value resets to 0 on system reset.
> >> +
> >> +* ``vmfwupdate/bios-addr`` (``u64``) - A 64-bit Little Endian encoded guest physical address
> >> + at the beginning of the replacement BIOS region. The provided payload must reside
> >> + in shared memory. This value resets to 0 on system reset.
> >> +
> >> +
> >> +Triggering the Firmware Update
> >> +******************************
> >> +
> >> +To initiate the firmware update process, the guest issues a standard system reset
> >> +operation through any of the means implemented by the machine model.
> >> +
> >> +On reset, the hypervisor evaluates whether ``vmfwupdate/disable`` is ``1``. If it is, it ignores
> >> +any other vmfwupdate values and performs a standard system reset.
> >> +
> >> +If ``vmfwupdate/disable`` is ``0``, the hypervisor checks if ``vmfwupdate/bios-addr`` is ``0``. If it is, it
> >> +performs a standard system reset.
> >> +
> >> +If ``vmfwupdate/bios-addr`` is non-zero, the hypervisor replaces the contents of the system’s
> >> +BIOS region with the guest physically contiguous ``vmfwupdate/bios-size`` sized payload at the
> >> +guest physical address ``vmfwupdate/bios-addr``.
> >> +
> >> +As part of the reset operation, all existing guest shared memory as well as the
> >> +``vmfwupdate/opaque`` file are preserved. CPU and device state are reset to the default
> >> +hypervisor specific reset states. In SEV-SNP environments, the reset causes recreation
> >> +of the VM context which triggers a fresh measurement of the replaced BIOS region and
> >> +reset CPU state. The guest always resumes operation in the highest privileged mode
> >> +available to it (VMPL0 in SEV-SNP).
> >> +
> >> +Closing Remarks
> >> +***************
> >> +The handover protocol (format of the ``vmfwupdate/opaque`` file etc.) will be implemented by
> >> +the firmware loader and firmware image, both provided by the guest. The hypervisor does
> >> +not need to know these details, so they are not included in this specification.
> >> +
> >> +
> >> +
> >> +Footnotes:
> >> +^^^^^^^^^^
> >> +.. [1] Original author of the specification: *Alex Graf <graf@amazon.com>*,
> >> + converted to reStructuredText (rst format) and slightly edited
> >> + by *Ani Sinha <anisinha@redhat.com>*.
> >> diff --git a/hw/misc/meson.build b/hw/misc/meson.build
> >> index d02d96e403..4c5bdb0de2 100644
> >> --- a/hw/misc/meson.build
> >> +++ b/hw/misc/meson.build
> >> @@ -148,6 +148,8 @@ specific_ss.add(when: 'CONFIG_MAC_VIA', if_true: files('mac_via.c'))
> >> specific_ss.add(when: 'CONFIG_MIPS_CPS', if_true: files('mips_cmgcr.c', 'mips_cpc.c'))
> >> specific_ss.add(when: 'CONFIG_MIPS_ITU', if_true: files('mips_itu.c'))
> >> +specific_ss.add(when: 'CONFIG_FW_CFG_DMA', if_true: files('vmfwupdate.c'))
> >> +
> >> system_ss.add(when: 'CONFIG_SBSA_REF', if_true: files('sbsa_ec.c'))
> >> # HPPA devices
> >> diff --git a/hw/misc/vmfwupdate.c b/hw/misc/vmfwupdate.c
> >> new file mode 100644
> >> index 0000000000..39fac68cbe
> >> --- /dev/null
> >> +++ b/hw/misc/vmfwupdate.c
> >> @@ -0,0 +1,152 @@
> >> +/*
> >> + * Guest driven VM boot component update device
> >> + * For details and specification, please look at docs/specs/vmfwupdate.rst.
> >> + *
> >> + * Copyright (C) 2024 Red Hat, Inc.
> >> + *
> >> + * Authors: Ani Sinha <anisinha@redhat.com>
> >> + *
> >> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> >> + * See the COPYING file in the top-level directory.
> >> + *
> >> + */
> >> +
> >> +#include "qemu/osdep.h"
> >> +#include "qapi/error.h"
> >> +#include "qemu/module.h"
> >> +#include "sysemu/reset.h"
> >> +#include "hw/nvram/fw_cfg.h"
> >> +#include "hw/i386/pc.h"
> >> +#include "hw/qdev-properties.h"
> >> +#include "hw/misc/vmfwupdate.h"
> >> +#include "qemu/error-report.h"
> >> +
> >> +static void fw_update_reset(void *dev)
> >> +{
> >> + /* a NOOP at present */
> >> + return;
> >> +}
> >> +
> >> +
> >> +static uint64_t get_max_fw_size(void)
> >> +{
> >> + Object *m_obj = qdev_get_machine();
> >> + PCMachineState *pcms = PC_MACHINE(m_obj);
> >> +
> >> + if (pcms) {
> >> + return pcms->max_fw_size;
> >> + } else {
> >> + return 0;
> >
> > Isn't it a configuration error?
>
> It isn’t if we do not expose the VMFWUPDATE_CAP_BIOS_RESIZE capability to
> other machines. I will fix this in v2.
> Also, I am not sure what a consistent way to get this value for non-pc
> machines would be.
>
> >
> >> + }
> >> +}
> >> +
> >> +static void fw_blob_write(void *dev, off_t offset, size_t len)
> >> +{
> >> + VMFwUpdateState *s = VMFWUPDATE(dev);
> >> +
> >> + /*
> >> + * in order to change the bios size, appropriate capability
> >> + * must be enabled
> >> + */
> >> + if (s->fw_blob.bios_size &&
> >> + !(s->capability & VMFWUPDATE_CAP_BIOS_RESIZE)) {
> >> + warn_report("vmfwupdate: VMFWUPDATE_CAP_BIOS_RESIZE not enabled");
> >> + return;
> >> + }
> >> +
> >> + s->plat_bios_size = s->fw_blob.bios_size;
> >> +
> >> + return;
> >> +}
> >> +
> >> +static void vmfwupdate_realize(DeviceState *dev, Error **errp)
> >> +{
> >> + VMFwUpdateState *s = VMFWUPDATE(dev);
> >> + FWCfgState *fw_cfg = fw_cfg_find();
> >> +
> >> + /* multiple devices are not supported */
> >> + if (!vmfwupdate_find()) {
> >> + error_setg(errp, "at most one %s device is permitted",
> >> + TYPE_VMFWUPDATE);
> >> + return;
> >> + }
> >> +
> >> + /* fw_cfg with DMA support is necessary to support this device */
> >> + if (!fw_cfg || !fw_cfg_dma_enabled(fw_cfg)) {
> >> + error_setg(errp, "%s device requires fw_cfg with DMA support",
> >> + TYPE_VMFWUPDATE);
> >> + return;
> >> + }
> >> +
> >> + memset(&s->fw_blob, 0, sizeof(s->fw_blob));
> >> + memset(&s->opaque_blobs, 0, sizeof(s->opaque_blobs));
> >> +
> >> + fw_cfg_add_file_callback(fw_cfg, FILE_VMFWUPDATE_OBLOB,
> >> + NULL, NULL, s,
> >> + &s->opaque_blobs,
> >> + sizeof(s->opaque_blobs),
> >> + false);
> >> +
> >> + fw_cfg_add_file_callback(fw_cfg, FILE_VMFWUPDATE_FWBLOB,
> >> + NULL, fw_blob_write, s,
> >> + &s->fw_blob,
> >> + sizeof(s->fw_blob),
> >> + false);
> >> +
> >> + /*
> >> + * Add global capability fw_cfg file. This will be used by the guest to
> >> + * check capability of the hypervisor.
> >> + */
> >> + s->capability = cpu_to_le64(CAP_VMFWUPD_MASK | VMFWUPDATE_CAP_EDKROM);
> >> + fw_cfg_add_file(fw_cfg, FILE_VMFWUPDATE_CAP,
> >> + &s->capability, sizeof(s->capability));
> >> +
> >> + s->plat_bios_size = get_max_fw_size();
> >> + /* size of bios region for the platform - read only by the guest */
> >> + fw_cfg_add_file(fw_cfg, FILE_VMFWUPDATE_BIOS_SIZE,
> >> + &s->plat_bios_size, sizeof(s->plat_bios_size));
> >> + /*
> >> + * add fw cfg control file to disable the hypervisor interface.
> >> + */
> >> + fw_cfg_add_file_callback(fw_cfg, FILE_VMFWUPDATE_CONTROL,
> >> + NULL, NULL, s,
> >> + &s->disable,
> >> + sizeof(s->disable),
> >> + false);
> >> + /*
> >> + * This device needs to register a global reset handler because it is
> >> + * not plugged to a bus (which, as its QOM parent, would reset it).
> >> + */
> >> + qemu_register_reset(fw_update_reset, dev);
> >> +}
> >> +
> >> +static Property vmfwupdate_properties[] = {
> >> + DEFINE_PROP_UINT8("disable", VMFwUpdateState, disable, 0),
> >> + DEFINE_PROP_END_OF_LIST(),
> >> +};
> >> +
> >> +static void vmfwupdate_device_class_init(ObjectClass *klass, void *data)
> >> +{
> >> + DeviceClass *dc = DEVICE_CLASS(klass);
> >> +
> >> + /* we are not interested in migration - so no need to populate dc->vmsd */
> >> + dc->desc = "VM firmware blob update device";
> >> + dc->realize = vmfwupdate_realize;
> >> + dc->hotpluggable = false;
> >> + device_class_set_props(dc, vmfwupdate_properties);
> >> + set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> >> +}
> >> +
> >> +static const TypeInfo vmfwupdate_device_info = {
> >> + .name = TYPE_VMFWUPDATE,
> >> + .parent = TYPE_DEVICE,
> >> + .instance_size = sizeof(VMFwUpdateState),
> >> + .class_init = vmfwupdate_device_class_init,
> >> +};
> >> +
> >> +static void vmfwupdate_register_types(void)
> >> +{
> >> + type_register_static(&vmfwupdate_device_info);
> >> +}
> >> +
> >> +type_init(vmfwupdate_register_types);
> >> diff --git a/include/hw/misc/vmfwupdate.h b/include/hw/misc/vmfwupdate.h
> >> new file mode 100644
> >> index 0000000000..e9229d807b
> >> --- /dev/null
> >> +++ b/include/hw/misc/vmfwupdate.h
> >> @@ -0,0 +1,103 @@
> >> +/*
> >> + * Guest driven VM boot component update device
> >> + * For details and specification, please look at docs/specs/vmfwupdate.rst.
> >> + *
> >> + * Copyright (C) 2024 Red Hat, Inc.
> >> + *
> >> + * Authors: Ani Sinha <anisinha@redhat.com>
> >> + *
> >> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> >> + * See the COPYING file in the top-level directory.
> >> + *
> >> + */
> >> +#ifndef VMFWUPDATE_H
> >> +#define VMFWUPDATE_H
> >> +
> >> +#include "hw/qdev-core.h"
> >> +#include "qom/object.h"
> >> +#include "qemu/units.h"
> >> +
> >> +#define TYPE_VMFWUPDATE "vmfwupdate"
> >> +
> >> +#define VMFWUPDCAPMSK 0xffff /* least significant 16 capability bits */
> >> +
> >> +#define VMFWUPDATE_CAP_EDKROM 0x08 /* bit 4 represents support for EDKROM */
> >> +#define VMFWUPDATE_CAP_BIOS_RESIZE 0x04 /* guests may resize bios region */
> >> +#define CAP_VMFWUPD_MASK 0x80
> >> +
> >> +#define VMFWUPDATE_OPAQUE_SIZE (1 * KiB)
> >> +
> >> +/* fw_cfg file definitions */
> >> +#define FILE_VMFWUPDATE_OBLOB "etc/vmfwupdate/opaque-blob"
> >> +#define FILE_VMFWUPDATE_FWBLOB "etc/vmfwupdate/fw-blob"
> >> +#define FILE_VMFWUPDATE_CAP "etc/vmfwupdate/cap"
> >> +#define FILE_VMFWUPDATE_BIOS_SIZE "etc/vmfwupdate/bios-size"
> >> +#define FILE_VMFWUPDATE_CONTROL "etc/vmfwupdate/disable"
> >> +
> >> +/*
> >> + * Address and length of the guest provided firmware blob.
> >> + * The blob itself is passed using the guest shared memory to QEMU.
> >> + * This is then copied to the guest private memory in the secure vm
> >> + * by the hypervisor.
> >> + */
> >> +typedef struct {
> >> + uint32_t bios_size; /*
> >> + * this is used by the guest to update plat_bios_size
> >> + * when VMFWUPDATE_CAP_BIOS_RESIZE is set.
> >> + */
> >> + uint64_t bios_paddr; /*
> >> + * starting gpa where the blob is in shared guest
> >> + * memory. Cleared upon system reset.
> >> + */
> >> +} VMFwUpdateFwBlob;
> >> +
> >> +typedef struct VMFwUpdateState {
> >> + DeviceState parent_obj;
> >> +
> >> + /*
> >> + * capabilities - 64 bits.
> >> + * Little endian format.
> >> + */
> >> + uint64_t capability;
> >> +
> >> + /*
> >> + * size of the bios region - architecture dependent.
> >> + * Read-only by the guest unless VMFWUPDATE_CAP_BIOS_RESIZE
> >> + * capability is set.
> >> + */
> >> + uint32_t plat_bios_size;
> >> +
> >> + /*
> >> + * disable - disables the interface when non-zero value is written to it.
> >> + * Writing 0 to this file enables the interface.
> >> + */
> >> + uint8_t disable;
> >> +
> >> + /*
> >> + * The first stage boot uses this opaque blob to convey to the next stage
> >> + * where the next stage components are loaded. The exact structure and
> >> + * number of entries are unknown to the hypervisor and the hypervisor
> >> + * does not touch this memory or do any validations.
> >> + * The contents of this memory need to be validated by the guest and
> >> + * must be ABI compatible between the first and second stages.
> >> + */
> >> + unsigned char opaque_blobs[VMFWUPDATE_OPAQUE_SIZE];
> >> +
> >> + /*
> >> + * firmware blob addresses and sizes. These are moved to guest
> >> + * private memory.
> >> + */
> >> + VMFwUpdateFwBlob fw_blob;
> >> +} VMFwUpdateState;
> >> +
> >> +OBJECT_DECLARE_SIMPLE_TYPE(VMFwUpdateState, VMFWUPDATE);
> >> +
> >> +/* returns NULL unless there is exactly one device */
> >> +static inline VMFwUpdateState *vmfwupdate_find(void)
> >> +{
> >> + Object *o = object_resolve_path_type("", TYPE_VMFWUPDATE, NULL);
> >> +
> >> + return o ? VMFWUPDATE(o) : NULL;
> >> +}
> >> +
> >> +#endif
>
>