qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID


From: Laszlo Ersek
Subject: Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device
Date: Thu, 28 Jan 2016 14:48:09 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1

On 01/28/16 12:13, Michael S. Tsirkin wrote:
> On Thu, Jan 28, 2016 at 11:54:25AM +0100, Igor Mammedov wrote:
>> Based on Microsoft's specifications (paper can be
>> downloaded from http://go.microsoft.com/fwlink/?LinkId=260709,
>> easily found by "Virtual Machine Generation ID" keywords),
>> add a PCI device with corresponding description in
>> SSDT ACPI table.
>>
>> The GUID is set using "vmgenid.guid" property or
>> a corresponding HMP/QMP command.
>>
>> Example of using vmgenid device:
>>  -device vmgenid,id=FOO,guid="324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"
>>
>> 'vmgenid' device initialization flow is as following:
>>  1. vmgenid has RAM BAR registered with size of GUID buffer
>>  2. BIOS initializes PCI devices and it maps BAR in PCI hole
>>  3. BIOS reads ACPI tables from QEMU, at that moment tables
>>     are generated with \_SB.VMGI.ADDR constant pointing to
>>     GPA where BIOS's mapped vmgenid's BAR earlier
>>
>> Note:
>> This implementation uses PCI class 0x0500 code for vmgenid device,
>> that is marked as NO_DRV in Windows's machine.inf.
>> Testing various Windows versions showed that, OS
>> doesn't touch nor checks for resource conflicts
>> for such PCI devices.
>> There was concern that during PCI rebalancing, OS
>> could reprogram the BAR at other place, which would
>> leave VGEN.ADDR pointing to the old (no more valid)
>> address.
>> However testing showed that Windows does rebalancing
>> only for PCI device that have a driver attached
>> and completely ignores NO_DRV class of devices.
>> Which in turn creates a problem where OS could remap
>> one of PCI devices(with driver) over BAR used by
>> a driver-less PCI device.
>> Statically declaring used memory range as VGEN._CRS
>> makes OS to honor resource reservation and an ignored
>> BAR range is not longer touched during PCI rebalancing.
>>
>> Signed-off-by: Gal Hammer <address@hidden>
>> Signed-off-by: Igor Mammedov <address@hidden>
> 
> It's an interesting hack, but this needs some thought. BIOS has no idea
> this BAR is special and can not be rebalanced, so it might put the BAR
> in the middle of the range, in effect fragmenting it.
> 
> Really I think something like V12 just rewritten using the new APIs
> (probably with something like build_append_named_dword that I suggested)
> would be much a simpler way to implement this device, given
> the weird API limitations.
> 
> And hey, if you want to use a pci device to pass the physical
> address guest to host, instead of reserving
> a couple of IO addresses, sure, stick it in pci config in
> a vendor-specific capability, this way it'll get migrated
> automatically.

Looks like the bottleneck for this feature is that we can't agree on the
design.

I thought that the ACPI approach was a nice idea; the DataTableRegion
trick made me enthusiastic. Since DataTableRegion won't work, I feel
that the ACPI approach has devolved into too many layers of manual
indirections. (If I recall right, I posted one of the recent updates on
that, with some diagrams).

I think that this investigation has been beaten to death; now we should
just do the simplest thing that works. I think Igor's PCI approach is
simpler, and may be more easily debuggable, than the one ACPI approach
that could *actually* be made work.

I believe it wasn't my original priority about this feature, but at v19,
simplicity of design & implementation looks like the most important
trait. At this point even the fixed MMIO range near the LAPIC range
(whose relocation we don't support anyway, BTW) could be preferable.

Anyway, I've said my piece, I'm outta here. I hope you guys can reach an
agreement.

Thanks
Laszlo

> 
> 
>> ---
>> changes since 17:
>>   - small fixups suggested in v14 review by Michael S. Tsirkin" 
>> <address@hidden>
>>   - make BAR prefetchable to make region cached as per MS spec
>>   - s/uuid/guid/ to match spec
>> changes since 14:
>>   - reserve BAR resources so that Windows won't touch it
>>     during PCI rebalancing - "Michael S. Tsirkin" <address@hidden>
>>   - ACPI: split VGEN device of PCI device descriptor
>>     and place it at PCI0 scope, so that won't be need trace its
>>     location on PCI buses. - "Michael S. Tsirkin" <address@hidden>
>>   - permit only one vmgenid to be created
>>   - enable BAR be mapped above 4Gb if it can't be mapped at low mem
>> ---
>>  default-configs/i386-softmmu.mak   |   1 +
>>  default-configs/x86_64-softmmu.mak |   1 +
>>  docs/specs/pci-ids.txt             |   1 +
>>  hw/i386/acpi-build.c               |  56 +++++++++++++-
>>  hw/misc/Makefile.objs              |   1 +
>>  hw/misc/vmgenid.c                  | 154 
>> +++++++++++++++++++++++++++++++++++++
>>  include/hw/misc/vmgenid.h          |  27 +++++++
>>  include/hw/pci/pci.h               |   1 +
>>  8 files changed, 240 insertions(+), 2 deletions(-)
>>  create mode 100644 hw/misc/vmgenid.c
>>  create mode 100644 include/hw/misc/vmgenid.h
>>
>> diff --git a/default-configs/i386-softmmu.mak 
>> b/default-configs/i386-softmmu.mak
>> index b177e52..6402439 100644
>> --- a/default-configs/i386-softmmu.mak
>> +++ b/default-configs/i386-softmmu.mak
>> @@ -51,6 +51,7 @@ CONFIG_APIC=y
>>  CONFIG_IOAPIC=y
>>  CONFIG_PVPANIC=y
>>  CONFIG_MEM_HOTPLUG=y
>> +CONFIG_VMGENID=y
>>  CONFIG_NVDIMM=y
>>  CONFIG_ACPI_NVDIMM=y
>>  CONFIG_XIO3130=y
>> diff --git a/default-configs/x86_64-softmmu.mak 
>> b/default-configs/x86_64-softmmu.mak
>> index 6e3b312..fdac18f 100644
>> --- a/default-configs/x86_64-softmmu.mak
>> +++ b/default-configs/x86_64-softmmu.mak
>> @@ -51,6 +51,7 @@ CONFIG_APIC=y
>>  CONFIG_IOAPIC=y
>>  CONFIG_PVPANIC=y
>>  CONFIG_MEM_HOTPLUG=y
>> +CONFIG_VMGENID=y
>>  CONFIG_NVDIMM=y
>>  CONFIG_ACPI_NVDIMM=y
>>  CONFIG_XIO3130=y
>> diff --git a/docs/specs/pci-ids.txt b/docs/specs/pci-ids.txt
>> index 0adcb89..e65ecf9 100644
>> --- a/docs/specs/pci-ids.txt
>> +++ b/docs/specs/pci-ids.txt
>> @@ -47,6 +47,7 @@ PCI devices (other than virtio):
>>  1b36:0005  PCI test device (docs/specs/pci-testdev.txt)
>>  1b36:0006  PCI Rocker Ethernet switch device
>>  1b36:0007  PCI SD Card Host Controller Interface (SDHCI)
>> +1b36:0009  PCI VM-Generation device
>>  1b36:000a  PCI-PCI bridge (multiseat)
>>  
>>  All these devices are documented in docs/specs.
>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>> index 78758e2..0187262 100644
>> --- a/hw/i386/acpi-build.c
>> +++ b/hw/i386/acpi-build.c
>> @@ -44,6 +44,7 @@
>>  #include "hw/acpi/tpm.h"
>>  #include "sysemu/tpm_backend.h"
>>  #include "hw/timer/mc146818rtc_regs.h"
>> +#include "hw/misc/vmgenid.h"
>>  
>>  /* Supported chipsets: */
>>  #include "hw/acpi/piix4.h"
>> @@ -237,6 +238,40 @@ static void acpi_get_misc_info(AcpiMiscInfo *info)
>>      info->applesmc_io_base = applesmc_port();
>>  }
>>  
>> +static Aml *build_vmgenid_device(uint64_t buf_paddr)
>> +{
>> +    Aml *dev, *pkg, *crs;
>> +
>> +    dev = aml_device("VGEN");
>> +    aml_append(dev, aml_name_decl("_HID", aml_string("QEMU0003")));
>> +    aml_append(dev, aml_name_decl("_CID", aml_string("VM_Gen_Counter")));
>> +    aml_append(dev, aml_name_decl("_DDN", aml_string("VM_Gen_Counter")));
>> +
>> +    pkg = aml_package(2);
>> +    /* low 32 bits of UUID buffer addr */
>> +    aml_append(pkg, aml_int(buf_paddr & 0xFFFFFFFFUL));
>> +    /* high 32 bits of UUID buffer addr */
>> +    aml_append(pkg, aml_int(buf_paddr >> 32));
>> +    aml_append(dev, aml_name_decl("ADDR", pkg));
>> +
>> +    /*
>> +     * VMGEN device has class_id PCI_CLASS_MEMORY_RAM and Windows
>> +     * displays it as "PCI RAM controller" which is marked as NO_DRV
>> +     * so Windows ignores VMGEN device completely and doesn't check
>> +     * for resource conflicts which during PCI rebalancing can lead
>> +     * to another PCI device claiming ignored BARs. To prevent this
>> +     * statically reserve resources used by VM_Gen_Counter.
>> +     * For more verbose comment see this commit message.
> 
> What does "this commit message" mean?
> 
>> +     */
>> +     crs = aml_resource_template();
>> +     aml_append(crs, aml_qword_memory(AML_POS_DECODE, AML_MIN_FIXED,
>> +                AML_MAX_FIXED, AML_CACHEABLE, AML_READ_WRITE, 0,
>> +                buf_paddr, buf_paddr + VMGENID_VMGID_BUF_SIZE - 1, 0,
>> +                VMGENID_VMGID_BUF_SIZE));
>> +     aml_append(dev, aml_name_decl("_CRS", crs));
>> +     return dev;
>> +}
>> +
>>  /*
>>   * Because of the PXB hosts we cannot simply query TYPE_PCI_HOST_BRIDGE.
>>   * On i386 arch we only have two pci hosts, so we can look only for them.
>> @@ -2171,6 +2206,7 @@ build_ssdt(GArray *table_data, GArray *linker,
>>              }
>>  
>>              if (bus) {
>> +                Object *vmgen;
>>                  Aml *scope = aml_scope("PCI0");
>>                  /* Scan all PCI buses. Generate tables to support hotplug. 
>> */
>>                  build_append_pci_bus_devices(scope, bus, 
>> pm->pcihp_bridge_en);
>> @@ -2187,6 +2223,24 @@ build_ssdt(GArray *table_data, GArray *linker,
>>                      aml_append(scope, dev);
>>                  }
>>  
>> +                vmgen = find_vmgneid_dev(NULL);
>> +                if (vmgen) {
>> +                    PCIDevice *pdev = PCI_DEVICE(vmgen);
>> +                    uint64_t buf_paddr =
>> +                        pci_get_bar_addr(pdev, VMGENID_VMGID_BUF_BAR);
>> +
>> +                    if (buf_paddr != PCI_BAR_UNMAPPED) {
>> +                        aml_append(scope, build_vmgenid_device(buf_paddr));
>> +
>> +                        method = aml_method("\\_GPE._E00", 0,
>> +                                            AML_NOTSERIALIZED);
>> +                        aml_append(method,
>> +                            aml_notify(aml_name("\\_SB.PCI0.VGEN"),
>> +                                       aml_int(0x80)));
>> +                        aml_append(ssdt, method);
>> +                    }
>> +                }
>> +
>>                  aml_append(sb_scope, scope);
>>              }
>>          }
>> @@ -2489,8 +2543,6 @@ build_dsdt(GArray *table_data, GArray *linker,
>>      {
>>          aml_append(scope, aml_name_decl("_HID", aml_string("ACPI0006")));
>>  
>> -        aml_append(scope, aml_method("_L00", 0, AML_NOTSERIALIZED));
>> -
>>          if (misc->is_piix4) {
>>              method = aml_method("_E01", 0, AML_NOTSERIALIZED);
>>              aml_append(method,
>> diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
>> index d4765c2..1f05edd 100644
>> --- a/hw/misc/Makefile.objs
>> +++ b/hw/misc/Makefile.objs
>> @@ -43,4 +43,5 @@ obj-$(CONFIG_STM32F2XX_SYSCFG) += stm32f2xx_syscfg.o
>>  
>>  obj-$(CONFIG_PVPANIC) += pvpanic.o
>>  obj-$(CONFIG_EDU) += edu.o
>> +obj-$(CONFIG_VMGENID) += vmgenid.o
>>  obj-$(CONFIG_HYPERV_TESTDEV) += hyperv_testdev.o
>> diff --git a/hw/misc/vmgenid.c b/hw/misc/vmgenid.c
>> new file mode 100644
>> index 0000000..a2fbdfc
>> --- /dev/null
>> +++ b/hw/misc/vmgenid.c
>> @@ -0,0 +1,154 @@
>> +/*
>> + *  Virtual Machine Generation ID Device
>> + *
>> + *  Copyright (C) 2016 Red Hat Inc.
>> + *
>> + *  Authors: Gal Hammer <address@hidden>
>> + *           Igor Mammedov <address@hidden>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> + * See the COPYING file in the top-level directory.
>> + *
>> + */
>> +
>> +#include "hw/i386/pc.h"
>> +#include "hw/pci/pci.h"
>> +#include "hw/misc/vmgenid.h"
>> +#include "hw/acpi/acpi.h"
>> +#include "qapi/visitor.h"
>> +
>> +#define VMGENID(obj) OBJECT_CHECK(VmGenIdState, (obj), VMGENID_DEVICE)
>> +
>> +typedef struct VmGenIdState {
>> +    PCIDevice parent_obj;
>> +    MemoryRegion iomem;
>> +    union {
>> +        uint8_t guid[16];
>> +        uint8_t guid_page[VMGENID_VMGID_BUF_SIZE];
>> +    };
>> +    bool guid_set;
>> +} VmGenIdState;
>> +
>> +Object *find_vmgneid_dev(Error **errp)
>> +{
>> +    Object *obj = object_resolve_path_type("", VMGENID_DEVICE, NULL);
>> +    if (!obj) {
>> +        error_setg(errp, VMGENID_DEVICE " is not found");
>> +    }
>> +    return obj;
>> +}
>> +
>> +static void vmgenid_update_guest(VmGenIdState *s)
>> +{
>> +    Object *acpi_obj;
>> +    void *ptr = memory_region_get_ram_ptr(&s->iomem);
>> +
>> +    memcpy(ptr, &s->guid, sizeof(s->guid));
>> +    memory_region_set_dirty(&s->iomem, 0, sizeof(s->guid));
>> +
>> +    acpi_obj = object_resolve_path_type("", TYPE_ACPI_DEVICE_IF, NULL);
>> +    if (acpi_obj) {
>> +        AcpiDeviceIfClass *adevc = ACPI_DEVICE_IF_GET_CLASS(acpi_obj);
>> +        AcpiDeviceIf *adev = ACPI_DEVICE_IF(acpi_obj);
>> +        ACPIREGS *acpi_regs = adevc->regs(adev);
>> +
>> +        acpi_regs->gpe.sts[0] |= 1; /* _GPE.E00 handler */
>> +        acpi_update_sci(acpi_regs, adevc->sci(adev));
>> +    }
>> +}
>> +
>> +static void vmgenid_set_guid(Object *obj, const char *value, Error **errp)
>> +{
>> +    VmGenIdState *s = VMGENID(obj);
>> +
>> +    if (qemu_uuid_parse(value, s->guid) < 0) {
>> +        error_setg(errp, "'%s." VMGENID_GUID
>> +                   "': Failed to parse GUID string: %s",
>> +                   object_get_typename(OBJECT(s)),
>> +                   value);
>> +        return;
>> +    }
>> +
>> +    s->guid_set = true;
>> +    vmgenid_update_guest(s);
>> +}
>> +
>> +static void vmgenid_get_vmgid_addr(Object *obj, Visitor *v, void *opaque,
>> +                                   const char *name, Error **errp)
>> +{
>> +    int64_t value = pci_get_bar_addr(PCI_DEVICE(obj), 0);
>> +
>> +    if (value == PCI_BAR_UNMAPPED) {
>> +        error_setg(errp, "'%s." VMGENID_VMGID_BUF_ADDR "': not initialized",
>> +                   object_get_typename(OBJECT(obj)));
>> +        return;
>> +    }
>> +    visit_type_int(v, &value, name, errp);
>> +}
>> +
>> +static void vmgenid_initfn(Object *obj)
>> +{
>> +    VmGenIdState *s = VMGENID(obj);
>> +
>> +    memory_region_init_ram(&s->iomem, obj, "vgid.bar", sizeof(s->guid_page),
>> +                           &error_abort);
>> +
>> +    object_property_add_str(obj, VMGENID_GUID, NULL, vmgenid_set_guid, 
>> NULL);
>> +    object_property_add(obj, VMGENID_VMGID_BUF_ADDR, "int",
>> +                        vmgenid_get_vmgid_addr, NULL, NULL, NULL, NULL);
>> +}
>> +
>> +
>> +static void vmgenid_realize(PCIDevice *dev, Error **errp)
>> +{
>> +    VmGenIdState *s = VMGENID(dev);
>> +    bool ambiguous = false;
>> +
>> +    object_resolve_path_type("", VMGENID_DEVICE, &ambiguous);
>> +    if (ambiguous) {
>> +        error_setg(errp, "no more than one " VMGENID_DEVICE
>> +                         " device is permitted");
>> +        return;
>> +    }
>> +
>> +    if (!s->guid_set) {
>> +        error_setg(errp, "'%s." VMGENID_GUID "' property is not set",
>> +                   object_get_typename(OBJECT(s)));
>> +        return;
>> +    }
>> +
>> +    vmstate_register_ram(&s->iomem, DEVICE(s));
>> +    pci_register_bar(PCI_DEVICE(s), VMGENID_VMGID_BUF_BAR,
>> +        PCI_BASE_ADDRESS_MEM_PREFETCH |
>> +        PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
>> +        &s->iomem);
>> +    return;
>> +}
>> +
>> +static void vmgenid_class_init(ObjectClass *klass, void *data)
>> +{
>> +    DeviceClass *dc = DEVICE_CLASS(klass);
>> +    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
>> +
>> +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
>> +    dc->hotpluggable = false;
>> +    k->realize = vmgenid_realize;
>> +    k->vendor_id = PCI_VENDOR_ID_REDHAT;
>> +    k->device_id = PCI_DEVICE_ID_REDHAT_VMGENID;
>> +    k->class_id = PCI_CLASS_MEMORY_RAM;
>> +}
>> +
>> +static const TypeInfo vmgenid_device_info = {
>> +    .name          = VMGENID_DEVICE,
>> +    .parent        = TYPE_PCI_DEVICE,
>> +    .instance_size = sizeof(VmGenIdState),
>> +    .instance_init = vmgenid_initfn,
>> +    .class_init    = vmgenid_class_init,
>> +};
>> +
>> +static void vmgenid_register_types(void)
>> +{
>> +    type_register_static(&vmgenid_device_info);
>> +}
>> +
>> +type_init(vmgenid_register_types)
>> diff --git a/include/hw/misc/vmgenid.h b/include/hw/misc/vmgenid.h
>> new file mode 100644
>> index 0000000..b90882c
>> --- /dev/null
>> +++ b/include/hw/misc/vmgenid.h
>> @@ -0,0 +1,27 @@
>> +/*
>> + *  Virtual Machine Generation ID Device
>> + *
>> + *  Copyright (C) 2016 Red Hat Inc.
>> + *
>> + *  Authors: Gal Hammer <address@hidden>
>> + *           Igor Mammedov <address@hidden>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> + * See the COPYING file in the top-level directory.
>> + *
>> + */
>> +
>> +#ifndef HW_MISC_VMGENID_H
>> +#define HW_MISC_VMGENID_H
>> +
>> +#include "qom/object.h"
>> +
>> +#define VMGENID_DEVICE           "vmgenid"
>> +#define VMGENID_GUID             "guid"
>> +#define VMGENID_VMGID_BUF_ADDR   "vmgid-addr"
>> +#define VMGENID_VMGID_BUF_SIZE   0x1000
>> +#define VMGENID_VMGID_BUF_BAR    0
>> +
>> +Object *find_vmgneid_dev(Error **errp);
>> +
>> +#endif
>> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
>> index dedf277..f4c9d48 100644
>> --- a/include/hw/pci/pci.h
>> +++ b/include/hw/pci/pci.h
>> @@ -94,6 +94,7 @@
>>  #define PCI_DEVICE_ID_REDHAT_PXB         0x0009
>>  #define PCI_DEVICE_ID_REDHAT_BRIDGE_SEAT 0x000a
>>  #define PCI_DEVICE_ID_REDHAT_PXB_PCIE    0x000b
>> +#define PCI_DEVICE_ID_REDHAT_VMGENID     0x000c
>>  #define PCI_DEVICE_ID_REDHAT_QXL         0x0100
>>  
>>  #define FMT_PCIBUS                      PRIx64
>> -- 
>> 1.8.3.1




reply via email to

[Prev in Thread] Current Thread [Next in Thread]