From: Ben Warren
Subject: Re: [Qemu-devel] [PATCH v5 05/10] ACPI: Add Virtual Machine Generation ID support
Date: Wed, 8 Feb 2017 14:34:11 -0800

> On Feb 7, 2017, at 4:48 PM, Laszlo Ersek <address@hidden> wrote:
> 
> On 02/05/17 10:12, address@hidden wrote:
>> From: Ben Warren <address@hidden>
>> 
>> This implements the VM Generation ID feature by passing a 128-bit
>> GUID to the guest via a fw_cfg blob.
>> Any time the GUID changes, an ACPI notify event is sent to the guest
>> 
>> The user interface is a simple device with one parameter:
>> - guid (string, must be "auto" or in UUID format
>>   xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
>> 
>> Signed-off-by: Ben Warren <address@hidden>
>> ---
>> default-configs/i386-softmmu.mak     |   1 +
>> default-configs/x86_64-softmmu.mak   |   1 +
>> hw/acpi/Makefile.objs                |   1 +
>> hw/acpi/vmgenid.c                    | 206 +++++++++++++++++++++++++++++++++++
>> hw/i386/acpi-build.c                 |  10 ++
>> include/hw/acpi/acpi_dev_interface.h |   1 +
>> include/hw/acpi/vmgenid.h            |  37 +++++++
>> 7 files changed, 257 insertions(+)
>> create mode 100644 hw/acpi/vmgenid.c
>> create mode 100644 include/hw/acpi/vmgenid.h
> 
> [snip code]
> 
> So, I'm late to the game. I can't possibly comment on all the concerns
> that have been raised scattered around the thread, exactly *where* they
> have been raised. However, I will try to collect the concerns here.
> 
> 
> (1) Byte swapping for the UUID.
> 
> The idea of QemuUUID is that wherever it is stored in a non-ephemeral
> fashion, it is in BE representation. The guest needs it in LE, so we
> should convert it temporarily -- using a local variable -- just before
> writing it to guest memory, and then forget about the LE representation
> promptly.
> 
> As far as I understand, we all agree on this (Michael, Ben, myself -- I
> think Igor hasn't commented on this).
> 
> 
Sure, will rework.
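To make sure we're talking about the same thing: keep the big-endian representation everywhere persistent, and swap into a local only at the point of writing to guest memory. A self-contained sketch (with a stand-in 16-byte type, not QEMU's actual QemuUUID helpers) would be:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMU's QemuUUID: 16 bytes in big-endian
 * (RFC 4122 wire) form. */
typedef struct {
    uint8_t data[16];
} Uuid;

/* Produce the little-endian (guest-visible) form: the three leading
 * fields (4 + 2 + 2 bytes) are byte-swapped, the final 8 bytes are
 * kept as-is.  The result lives in a local and is forgotten after the
 * guest-memory write. */
static Uuid uuid_to_le(Uuid u)
{
    Uuid r = u;
    r.data[0] = u.data[3]; r.data[1] = u.data[2];
    r.data[2] = u.data[1]; r.data[3] = u.data[0];
    r.data[4] = u.data[5]; r.data[5] = u.data[4];
    r.data[6] = u.data[7]; r.data[7] = u.data[6];
    return r;
}
```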
> (2) Structure of the vmgenid fw_cfg blob, AKA "why that darn offset 40
> decimal".
> 
> I explained it under points (6) and (7) in the following message:
> 
> Message-Id: <address@hidden>
> URL: https://www.mail-archive.com/address@hidden/msg425226.html
> 
> The story is that *wherever* an ADD_POINTER command points to -- that
> is, at the *exact* target address --, OVMF will look for an ACPI table
> header, somewhat heuristically. If it finds a byte pattern that (a) fits
> into the remaining blob and (b) passes some superficial ACPI table
> header checks, such as length and checksum, then OVMF assumes that the
> blob contains an ACPI table there, and passes the address (again, the
> exact, relocated, absolute target address of ADD_POINTER) to
> EFI_ACPI_TABLE_PROTOCOL.InstallAcpiTable().
> 
> We want to disable this heuristic for the vmgenid blob. *If* the blob
> contained only 16 bytes (for the GUID), then the heuristic would
> automatically disable itself, because the ACPI table header (36 bytes)
> is larger than 16 bytes, so OVMF wouldn't even look. However, for the
> caching and other VMGENID requirements, we need to allocate a full page
> with the blob (4KB), hence OVMF *will* look. If we placed the GUID right
> at the start of the page, then OVMF would sanity-check it as an ACPI
> table header. The check would *most likely* not pass, so things would be
> fine in practice, but we can do better than that: just put 40 zero bytes
> at the front of the blob.
> 
> And this is why the ADD_POINTER command has to point to the beginning of
> the blob (i.e., 36 zero bytes), to "suppress the OVMF ACPI SDT header
> detection". (The other 4 bytes are for arriving at an address divisible
> by 8, which is a VMGENID requirement for the GUID.)
> 
> The consequence is that both the ADDR method and QEMU's guest memory
> access code have to add 40 manually.
> 
Igor has recommended that the add_pointer() call that patches the AML use an 
offset of 40 from the start of the source file, which will result in the VGIA 
AML variable pointing directly at the GUID rather than being offset by 40.  I 
assume this isn’t a problem for you, but please confirm.
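To restate the layout in code form (constant names here are illustrative, not necessarily what the patch uses): 36 zero bytes suppress the OVMF header heuristic, 4 more pad bytes bring the GUID to an 8-byte-aligned offset, and the whole blob is a page so the GUID sits alone in cacheable guest RAM:

```c
#include <assert.h>

/* Sketch of the "vmgenid" fw_cfg blob layout discussed above. */
#define ACPI_TABLE_HDR_SIZE 36   /* standard ACPI SDT header length */
#define VMGENID_GUID_OFFSET 40   /* 36 zero bytes + 4 pad bytes */
#define VMGENID_GUID_SIZE   16   /* 128-bit GUID */
#define VMGENID_BLOB_SIZE   4096 /* full page, per the caching requirements */
```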
> 
> (3) Whether the VGIA named object should be a DWORD or a QWORD in ACPI.
> Personally, I'm fine with either, but I see that DWORD is more
> compatible with old OSes.
> 
> What I really care about though is the new WRITE_POINTER command
> structure. That should support 8 bytes as well.
> 
> So this is exactly how Michael summarized it ultimately:
> - use a DWORD for VGIA in ACPI,
> - with a matching 4-byte wide ADD_POINTER command;
> - write the address with a 4-byte wide WRITE_POINTER command into
>  fw_cfg,
> - *but* the WRITE_POINTER command struct should support 8-byte as well,
> - and the fw_cfg file that is written (vmgenid_addr) should have size 8
>  from the start. (The guest will simply write offsets 0..3 inclusive,
>  in LE byte order.)
> 
This is an easy change, WRITE_POINTER already supports 8-byte writes.
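For reference, the shape I have in mind for the command struct (field names and sizes here are illustrative, not QEMU's actual BiosLinkerLoader entry layout) is roughly:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sketch of a linker/loader WRITE_POINTER command entry. */
typedef struct {
    char     pointer_file[56]; /* writeable fw_cfg file, e.g. "etc/vmgenid_addr" */
    char     pointee_file[56]; /* blob whose guest address is to be written */
    uint32_t pointer_offset;   /* where inside pointer_file to write */
    uint32_t pointee_offset;   /* offset added to the blob's base address */
    uint8_t  pointer_size;     /* 1, 2, 4 or 8 -- 8 supported from day one,
                                * even though vmgenid uses a 4-byte write */
} WritePointerCmd;
```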
> 
> (4) IIRC Igor asked if sizing the blob to 4KB would guarantee that the
> GUID ends up in a cacheable page -- yes (see also Michael's followup).
> This size guarantees that the GUID will have a full page to itself, and
> it's allocated from normal guest RAM, as Reserved memory (SeaBIOS, IIRC)
> or EfiACPIMemoryNVS memory (OVMF).
> 
> Note that the VMGENID spec says, in ACPI terms, "It must not be in
> ranges reported as AddressRangeMemory or AddressRangeACPI", but that's fine:
> 
> ACPI speak           UEFI speak            suitable for VMGENID?
> ------------------   --------------------- ---------------------
> AddressRangeMemory   EfiConventionalMemory no
> AddressRangeACPI     EfiACPIReclaimMemory  no
> AddressRangeNVS      EfiACPIMemoryNVS      yes, used by OVMF
> AddressRangeReserved EfiReservedMemoryType yes, used by SeaBIOS
> 
> 
> (5) The race and fw_cfg callbacks.
> 
> (a) Michael and Ben identified a problem where
> - QEMU places the initial GUID in the "vmgenid" fw_cfg blob,
> - the guest downloads it,
> - the host admin changes the GUID via the monitor,
> - and the guest writes the address to "vmgenid_addr" *only* after that.
> 
> In other words, the monitor action occurs between the guest's processing
> of the ALLOCATE and WRITE_POINTER linker/loader commands.
> 
> (b) A similar problem is rebooting the guest.
> - The guest starts up fine,
> - the OS runs,
> - the host admin changes the GUID via the monitor,
> - and the guest sees the update fine.
> - At this point the guest is rebooted (within the same QEMU instance).
> 
> Since the fw_cfg blob is not rebuilt -- more precisely, the blob *is*
> rebuilt, but it is then thrown away in acpi_build_update() --, the guest
> will see a different GUID from the last value set on the monitor. (And
> querying the monitor will actually return that last-set value, which is
> no longer known by the guest.)
> 
> 
> The suggestion for these problems was to add a callback to "one" of the
> fw_cfg files in question.
> 
> Let's investigate the writeable "vmgenid_addr" file first. Here the idea
> is to rewrite the GUID in guest RAM as soon as the address is known from
> the guest, regardless of what GUID the guest placed there originally,
> through downloading the "vmgenid" blob.
> 
> (i) We have no write callbacks at the moment. Before we removed the data
> port based write support in QEMU 2.4, the write callback used to be
> invoked when the end of the fw_cfg blob was reached. Since we intend to
> pack several addresses in the same writeable fw_cfg blob down the road,
> at different offsets, this method wouldn't work; the final offset in the
> blob might not be reached at all. (Another thing that wouldn't work is
> an 8-byte writeable fw_cfg blob, and a 4-byte WRITE_POINTER command that
> only rewrites offsets 0..3 inclusive -- see (3) above.)
> 
> So I don't think that a write callback is viable, at least with the
> "many addresses at different offsets in the same blob" feature. Unless
> we want to register different callbacks (or different callback
> arguments) for different offsets. And then a large write could trigger a
> series of callbacks, even, if it covers several special offsets. This
> looks too complex to me.
> 
> (ii) What we do have now is a select callback. Unfortunately, the guest
> is not required to select the item and atomically rewrite the blob, in
> the same fw_cfg DMA operation. The guest is allowed to do the selection
> with the IO port method, and then write the blob separately with DMA.
> (This is what OVMF does.) IOW, when we'd get the callback (for the
> select), the address would not have been written yet.
> 
> So I don't think that a select callback for the writeable "vmgenid_addr"
> file is viable either.
> 
> (iii) Let's see a select callback for the "vmgenid" blob itself! The
> idea is, whenever the guest selects this blob, immediately refresh the
> blob from the last set GUID. Then let the guest download the
> just-refreshed blob.
> 
I like this approach, and have it working.  I assume by “refresh the blob” you 
mean making a call to fw_cfg_modify_file().  If that is the case, I need to 
modify the function fw_cfg_modify_bytes_read() because of the following 
commented-out problem lines (656:657 in fw_cfg.c):

    /* return the old data to the function caller, avoid memory leak */
    ptr = s->entries[arch][key].data;
    s->entries[arch][key].data = data;
    s->entries[arch][key].len = len;
//    s->entries[arch][key].callback_opaque = NULL;
//    s->entries[arch][key].allow_write = false;

As you can see, this function modifies not only an fw_cfg object’s data, but 
also its metadata, wiping out the callback argument and forcing the entry to 
read-only.  I don’t know the history of this function, so I want to tread 
lightly.  The allow_write change doesn’t impact me, since this fw_cfg object is 
read-only, but the callback change does.
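In other words, what I'm after is simply this (VmGenIdState and the callback hookup below are illustrative stand-ins, not QEMU's actual types): on every select of the "vmgenid" item, re-copy the last monitor-set GUID into the blob before the guest downloads it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative sketch of the refresh-on-select idea. */
typedef struct {
    uint8_t guid[16];   /* last GUID set via the monitor */
    uint8_t blob[4096]; /* the "vmgenid" fw_cfg blob */
} VmGenIdState;

/* Invoked when the guest selects the "vmgenid" fw_cfg item: refresh the
 * GUID at offset 40 in the blob before the download begins. */
static void vmgenid_select_cb(VmGenIdState *s)
{
    memcpy(s->blob + 40, s->guid, sizeof(s->guid));
}
```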

> The main observation here is that we can *fail* QMP commands. That's
> good enough if we can tell *when* to fail them.
> 
> Under this idea, we would fail the "set-vm-generation-id" QMP command
> with "EAGAIN" if the guest had selected the "vmgenid" blob since machine
> reset -- this can be tracked with a latch -- *but* the particular offset
> range in the "vmgenid_addr" blob were still zero.
> 
> This is exactly the window between the ALLOCATE and WRITE_POINTER
> commands where updates from the monitor would be lost. So let's just
> tell the management layer to try again later (like 1 second later). By
> that time, any sane guest will have written the address into "vmgenid_addr”.
> 
This seems sane.  I’ll just return EAGAIN if vmgenid_addr is zero, meaning the 
guest is not fully synchronized yet, so input shouldn’t be accepted.
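So the guard boils down to something like this (function and parameter names are mine, for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decide whether a set-vm-generation-id request can proceed.  It must
 * fail (EAGAIN) only in the window between the guest's ALLOCATE and
 * WRITE_POINTER commands: the blob has been selected since reset, but
 * the guest has not yet reported its address via "vmgenid_addr". */
static bool vmgenid_update_allowed(bool selected_since_reset,
                                   uint64_t guest_written_addr)
{
    return !(selected_since_reset && guest_written_addr == 0);
}
```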

One other more radical idea I had was to just remove the monitor GUID 
modification capability completely.  The VM Generation ID needs to change only 
in the following situations:
- new VM (command line processed)
- start from snapshot (command line processed, VGIA restored from VmState).
- VM recovered from backup (like a new VM, command line processed).
- VM imported, copied or cloned (like a new VM, command line processed).
- failed over from a disaster recovery environment.  Here I’m not sure. It’s 
possible I suppose that one would have a DR VM in hot standby, but would want 
to change its VM Generation ID in order to force Windows to re-synchronize 
Active Directory, crypto seeds etc.  Here the monitor might be necessary.

> Thanks
> Laszlo
