From: Laszlo Ersek
Subject: [Qemu-devel] Windows does not support DataTableRegion at all [was: docs: describe QEMU's VMGenID design]
Date: Sun, 13 Sep 2015 13:56:44 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0

As the subject suggests, I have terrible news.

I'll preserve the full context here, so that it's easy to scroll back to
the ASL for reference.

I'm also CC'ing edk2-devel, because a number of BIOS developers should
be congregating there.

On 08/28/15 22:18, Laszlo Ersek wrote:
> Cc: Paolo Bonzini <address@hidden>
> Cc: Gal Hammer <address@hidden>
> Cc: Igor Mammedov <address@hidden>
> Cc: "Michael S. Tsirkin" <address@hidden>
> Signed-off-by: Laszlo Ersek <address@hidden>
> ---
>
> Notes:
>     This is based on the super long private email discussion we had two
>     months ago, plus on the IRL discussion between Michael and myself @ the
>     KVM Forum 2015.
>
>  docs/specs/vmgenid.txt | 343 +++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 343 insertions(+)
>  create mode 100644 docs/specs/vmgenid.txt
>
> diff --git a/docs/specs/vmgenid.txt b/docs/specs/vmgenid.txt
> new file mode 100644
> index 0000000..d4bf132
> --- /dev/null
> +++ b/docs/specs/vmgenid.txt
> @@ -0,0 +1,343 @@
> +Virtual Machine Generation ID Device
> +====================================
> +
> +The Microsoft specification entitled "Virtual Machine Generation ID",
> +maintained at <http://go.microsoft.com/fwlink/?LinkId=260709>, defines an ACPI
> +feature that allows the guest OSPM to recognize when it has been returned "to
> +an earlier point in time", eg. by restoral from snapshot, or by incoming
> +migration. Quoting the spec,
> +
> +    The virtual machine generation ID is a feature whereby the virtual machines
> +    BIOS will expose a new ID. This is a 128-bit, cryptographically random
> +    integer value identifier that will be different every time the virtual
> +    machine executes from a different configuration file-such as executing from
> +    a recovered snapshot, or executing after restoring from backup. [...]
> +
> +The document you are reading now extracts the requirements set forth by the
> +VMGenID spec for hypervisors that intend to provide the feature, and describes
> +QEMU's implementation. The design below targets both SeaBIOS and OVMF as
> +compatible guest firmwares, without any changes to either of them.
> +
> +Requirements
> +------------
> +
> +These requirements are extracted from the "How to implement virtual machine
> +generation ID support in a virtualization platform" section of the
> +specification, dated August 1, 2012.
> +
> +R1a. The generation ID shall live in an 8-byte aligned buffer.
> +
> +R1b. The buffer holding the generation ID shall be in guest RAM, ROM, or device
> +     MMIO range.
> +
> +R1c. The buffer holding the generation ID shall be kept separate from areas
> +     used by the operating system.
> +
> +R1d. The buffer shall not be covered by an AddressRangeMemory or
> +     AddressRangeACPI entry in the E820 or UEFI memory map.
> +
> +R1e. The generation ID shall not live in a page frame that could be mapped with
> +     caching disabled. (In other words, if the generation ID lives in RAM, then
> +     it shall only be mapped as cacheable.)
> +
> +R2 to R5. [These AML requirements are isolated well enough in the Microsoft
> +          specification for us to simply refer to them here.]
> +
> +R6. The hypervisor shall expose a _HID (hardware identifier) object in the
> +    VMGenId device's scope that is unique to the hypervisor vendor.
> +
> +Generation ID buffer design
> +---------------------------
> +
> +QEMU places the generation ID buffer inside a separate fw_cfg blob that is
> +exposed to the guest OS with the ACPI linker/loader.
> +
> +The structure of the blob is as follows. Offsets, sizes and numeric values are
> +given in decimal; furthermore the latter are encoded in little endian.
> +
> +  Offs  Field               Size  Value
> +  ----  ------------------  ----  ------------------------------------
> +     0  System Description    36
> +        Table Header
> +     0    Signature            4                                "UEFI"
> +     4    Length               4                                    62
> +     8    Revision             1                                     1
> +     9    Checksum             1                                     0
> +    10    OEMID                6        ACPI_BUILD_APPNAME6 ("BOCHS ")
> +    16    OEM Table ID         8                            "QEMUPARM"
> +    24    OEM Revision         4                                     1
> +    28    Creator ID           4          ACPI_BUILD_APPNAME4 ("BXPC")
> +    32    Creator Revision     4                                     1
> +
> +    36  UEFI Table            18
> +        Sub-Header
> +    36    Identifier          16  417a5dff-bf4b-4abc-a839-6593bb41f452
> +    52    DataOffset           2                                    54
> +
> +    54  ADDR base pointer      8                                    62
> +  ....................................................................
> +    62  OVMF SDT Header       36                                zeroes
> +        probe suppressor
> +    98  VMGenID alignment      6                                zeroes
> +        padding
> +   104  generation ID         16                       128-bit VMGenID
> +   120  fw_cfg blob         3976                                zeroes
> +        padding
> +  4096  <end of blob>
> +
> +The fw_cfg blob is divided in two parts conceptually (separated by the dotted
> +line in the diagram). The first part, up to and excluding offset 62, is a
> +"UEFI" ACPI Table, governed by the UEFI specification 2.5, Appendix O. The
> +second part is mainly padding, but it also contains the generation ID.
> +
> +The "UEFI" ACPI Table -- in the first part -- is a "normal" ACPI table whose
> +generic header is defined by the ACPI specification, but for which the UEFI
> +spec defines the "UEFI" signature and adds two more fixed fields, "Identifier"
> +and "DataOffset".
> +
> +- The Identifier field carries a 128-bit GUID, and enables firmware
> +  implementors to install several "UEFI" tables with different internal
> +  structures, enabling OSPM to tell them apart based on the (Type-)Identifier
> +  GUID field.
> +
> +  For the purposes of QEMU's VMGenID implementation, we generated a new GUID
> +  with the "uuidgen" utility. It should be different from all other
> +  "Identifier" values, present and future, but otherwise no other software need
> +  be aware of the concrete GUID value we generated.
> +
> +- The DataOffset field is just an offset into the table where the actual
> +  (Identifier-specific) data starts.
> +
> +  For the purposes of QEMU's VMGenID implementation, we simply set it to the
> +  next (QEMU-specific) field, "ADDR base pointer".
> +
> +Linker/loader commands
> +----------------------
> +
> +The name of the fw_cfg blob is "etc/acpi/qemuparam". The ALLOCATE command that
> +instructs the guest firmware to download this fw_cfg blob specifies an
> +alignment of 4096, and the blob will have size 4096 too.
> +
> +An ADD_POINTER command links the "UEFI" ACPI Table at the start of the blob
> +into the RSDT.
> +
> +Another ADD_POINTER command relocates the "ADDR base pointer" field to the
> +absolute address of the "OVMF SDT Header probe suppressor" field, within the
> +same blob.
> +
> +After this relocation, an ADD_CHECKSUM command updates the Checksum field,
> +covering the entire "UEFI" ACPI Table (which extends up to and excluding offset
> +62).
> +
> +Blob behavior under SeaBIOS
> +---------------------------
> +
> +(Most of the complexity in the blob is ignored when the guest firmware is
> +SeaBIOS.)
> +
> +- SeaBIOS's ACPI linker/loader client allocates the blob in normal RAM
> +  (satisfying R1b).
> +
> +- Because the ALLOCATE command prescribes an alignment of 4KB, and the blob's
> +  size is also 4KB, the allocation covers a standalone page frame in full
> +  (satisfying R1e).
> +
> +- The 128-bit VMGenID field is located at offset 104 within that page,
> +  resulting in a guest-physical address divisible by 8 (satisfying R1a).
> +
> +- The blob is marked as Reserved in the E820 map (satisfying R1c and R1d).
> +
> +- The "UEFI" ACPI Table at the start of the blob is linked into the RSDT,
> +  in-place.
> +
> +- The "ADDR" AML method (see later) is allowed to refer to the "UEFI" ACPI
> +  Table with the DataTableRegion operator, because the table is located in
> +  memory marked as AddressRangeReserved.
> +
> +- The "ADDR base pointer" field points at "OVMF SDT Header probe suppressor",
> +  which is right after the "UEFI" ACPI Table inside the blob. At OSPM runtime,
> +  the "ADDR" AML method reads the "ADDR base pointer" field, and adds 42, to
> +  arrive at the address of the VMGenID field.
> +
> +  blob @ page offset 0              RSDT
> +  +-----------------------+         +-----+
> +  | "UEFI" ACPI Table <---------+   | ... |
> +  | +-------------------+ |     |   | ... |
> +  | | ...               | |     +---- ... |
> +  | | ...               | |         +-----+
> +  | | ADDR base pointer -----+
> +  | +-------------------+ |  |
> +  | probe suppressor <-------+
> +  | VMGenID @ offset 104  |
> +  | padding               |
> +  +-----------------------+
> +
> +Blob behavior under OVMF
> +------------------------
> +
> +The complexity in the blob is required by the two-pass nature of OVMF's ACPI
> +linker/loader client, which in turn comes from the fact that OVMF has to
> +dissect blobs into individual ACPI tables vs. "other things", tracking the
> +ADD_POINTER commands, so that tables can be installed individually, with
> +EFI_ACPI_TABLE_PROTOCOL.
> +
> +- OVMF's ACPI linker/loader client allocates the blob in normal RAM (satisfying
> +  R1b).
> +
> +- Because the ALLOCATE command prescribes an alignment of 4KB, and the blob's
> +  size is also 4KB, the allocation covers a standalone page frame in full
> +  (satisfying R1e).
> +
> +- The 128-bit VMGenID field is located at offset 104 within that page,
> +  resulting in a guest-physical address divisible by 8 (satisfying R1a).
> +
> +- OVMF's ACPI linker/loader allocates the blob in EfiACPIMemoryNVS type memory,
> +  therefore it is marked as such in the UEFI memmap (satisfying R1c and R1d).
> +
> +- OVMF identifies the "UEFI" ACPI Table at the start of the blob in the second
> +  pass, following the ADD_POINTER command that is meant to link the table into
> +  the RSDT. OVMF installs a *copy* of the "UEFI" ACPI Table with
> +  EFI_ACPI_TABLE_PROTOCOL (linking the copy into both RSDT and XSDT). Given the
> +  "UEFI" signature of the table, EFI_ACPI_TABLE_PROTOCOL places the copy of the
> +  table in EfiACPIMemoryNVS type memory.
> +
> +- The "ADDR" AML method (see later) is allowed to refer to the "UEFI" ACPI
> +  Table with the DataTableRegion operator, because the table is located in
> +  memory marked as AddressRangeNVS.
> +
> +- The "ADDR base pointer" field inside the installed table points at "OVMF SDT
> +  Header probe suppressor" in the original blob. Because this field is filled
> +  with zeros, OVMF's table identification heuristics unconditionally reports a
> +  negative when it tracks the relevant ADD_POINTER command to it in the second
> +  pass. Therefore the blob is marked as "hosts something else than just ACPI
> +  tables", and it is preserved permanently (in the same EfiACPIMemoryNVS type
> +  memory where it has been originally allocated).
> +
> +  At OSPM runtime, the "ADDR" AML method reads the "ADDR base pointer" field,
> +  and adds 42, to arrive at the address of the VMGenID field.
> +
> +  blob @ page offset 0               RSDT         XSDT
> +  +-----------------------------+    +-----+      +-----+
> +  | "UEFI" ACPI Table (in blob) |    | ... |      | ... |
> +  | +-------------------------+ |    | ... ---+   | ... ---------------+
> +  | |XXXXXXXXXXXXXXXXXXXXXXXXX| |    +-----+  |   +-----+              |
> +  | |XXXXXXX [unused] XXXXXXXX| |             |                        |
> +  | |XXXXXXXXXXXXXXXXXXXXXXXXX| |             +------------------------+
> +  | +-------------------------+ |                                      |
> +  | probe suppressor <-------------+  "UEFI" ACPI Table (installed) <--+
> +  | VMGenID @ offset 104        |  |  +---------------------------+
> +  | padding                     |  |  | ...                       |
> +  +-----------------------------+  |  | ...                       |
> +                                   +--- ADDR base pointer         |
> +                                      +---------------------------+
> +
> +ACPI device, control methods
> +----------------------------
> +
> +Requirements R2 through R6 of the VMGenID specification are satisfied with the
> +following ACPI logic, exposed by QEMU's ACPI generator in one of the SSDTs, and
> +installed by both guest firmwares as such.
> +
> +The basic idea is that, when the appropriate guest driver calls the ADDR method
> +(see R4), OSPM locates the generation ID field in the 4KB blob that lives in
> +E820 Reserved (SeaBIOS) or EfiACPIMemoryNVS type (OVMF) memory. The
> +guest-physical address of the field is communicated to QEMU via IO ports
> +[0x512..0x519] inclusive. Then QEMU is cued through IO port 0x51A to refresh
> +(and keep refreshing when appropriate) the generation ID at the passed back
> +address. Finally, the method returns the address to the guest driver too, in
> +the format required by R4.
> +
> +    Scope(\_SB) {
> +        Device (VMGI) {
> +            /* satisfy R2 */
> +            Name (_CID, "VM_Gen_Counter")
> +
> +            /* satisfy R3 */
> +            Name (_DDN, "VM_Gen_Counter")
> +
> +            /* satisfy R6 */
> +            Name (_HID, "QEMU0002")
> +
> +            /* the device owns this IO port range */
> +            Name (_CRS, ResourceTemplate () {
> +                IO (Decode16, 0x512, 0x512, 1, 9)
> +            })
> +
> +            /* Device status: present, enabled & decoding resources, should be
> +             * shown in the UI, functioning properly.
> +             */
> +            Name (_STA, 0xF)
> +
> +            /* Satisfy R4.
> +             *
> +             * This method is serialized because it creates named objects.
> +             */
> +            Method (ADDR, 0, Serialized) {
> +                /* The 8-byte integer field defined as ADBP below is the
> +                 * "ADDR base pointer" field in the UEFI ACPI Table.
> +                 *
> +                 * The DataTableRegion() operator locates that ACPI table by
> +                 * scanning the RSDT/XSDT using the (SignatureString,
> +                 * OemIDString, OemTableIDString) triplet as key.
> +                 *
> +                 * Windows XP would normally crash on the DataTableRegion()
> +                 * operator, but it never calls the ADDR method, hence it never
> +                 * reaches or evaluates DataTableRegion().
> +                 */
> +                DataTableRegion (TBLR, "UEFI", "BOCHS", "QEMUPARM")
> +                Field (TBLR, AnyAcc, NoLock, Preserve) {
> +                  Offset (54),
> +                  ADBP, 64
> +                }
> +
> +                /* This is the IO port range exposed in the _CRS above.
> +                 *
> +                 * The first two 4-byte ports are used to communicate the
> +                 * 64-bit guest-physical address of the actual (relocated)
> +                 * 128-bit generation ID field to QEMU, in little endian
> +                 * encoding, so that QEMU can rewrite that field in guest RAM.
> +                 *
> +                 * A write to last 1-byte port signals that the address has
> +                 * been written fully, and QEMU is free to dereference it.
> +                 */
> +                OperationRegion (VMGR, SystemIO, 0x512, 9)
> +                Field (VMGR, DWordAcc, NoLock, Preserve) {
> +                    PTLO, 32,
> +                    PTHI, 32,
> +                    AccessAs (ByteAcc),
> +                    DONE, 8
> +                }
> +
> +                /* The ADBP field points to the "OVMF SDT Header probe
> +                 * suppressor" area in the blob, at offset 62. In order to
> +                 * arrive at the generation ID field at offset 104, we must add
> +                 * 42 dynamically.
> +                 *
> +                 * The RESU buffer below will contain the result of the
> +                 * addition. The ADFU field exposes it as an 8-byte integer
> +                 * (for storing the sum), while the ADLO and ADHI fields enable
> +                 * us to access the result in two separate 4-byte integers.
> +                 * This exact integer width is especially important for
> +                 * composing the package object that the ADDR method must
> +                 * return.
> +                 */
> +                Name (RESU, Buffer (8) {})
> +                CreateQWordField (RESU, 0, ADFU)
> +                CreateDWordField (RESU, 0, ADLO)
> +                CreateDWordField (RESU, 4, ADHI)
> +
> +                Add (ADBP, 42, ADFU)
> +                Store (ADLO, PTLO)
> +                Store (ADHI, PTHI)
> +                Store (0, DONE)
> +                Return (Package (2) { ADLO, ADHI })
> +            }
> +        }
> +    }
> +
> +    /* satisfy R5 */
> +    Scope (\_GPE) {
> +        Method (_E04) {
> +            Notify (\_SB.VMGI, 0x80)
> +        }
> +    }
>
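
For readers who would rather check the offsets than trust the table, here
is a minimal Python sketch of the blob layout and of the arithmetic in the
ADDR method quoted above. It models the SeaBIOS case only (the table is
linked in place). The function names (build_blob, link, addr_method), the
base address 0x7E5F3000, and the mixed-endian GUID encoding are assumptions
made for illustration; none of this is QEMU or firmware code.

```python
import struct
import uuid

BLOB_SIZE = 4096
TABLE_LEN = 62        # 36 (SDT header) + 18 (UEFI sub-header) + 8 (ADDR base pointer)
CHECKSUM_OFF = 9
ADBP_OFF = 54         # "ADDR base pointer"
SUPPRESSOR_OFF = 62   # "OVMF SDT Header probe suppressor"
VMGENID_OFF = 104


def build_blob(vmgenid: bytes) -> bytearray:
    """Lay out the blob as in the table, before any linker/loader command runs."""
    if len(vmgenid) != 16:
        raise ValueError("the generation ID is 128 bits")
    header = struct.pack("<4sIBB6s8sI4sI",
                         b"UEFI", TABLE_LEN, 1, 0,
                         b"BOCHS ", b"QEMUPARM", 1, b"BXPC", 1)
    # Assumption: the GUID uses the usual mixed-endian UEFI encoding.
    ident = uuid.UUID("417a5dff-bf4b-4abc-a839-6593bb41f452").bytes_le
    sub_header = ident + struct.pack("<H", ADBP_OFF)   # Identifier, DataOffset
    adbp = struct.pack("<Q", SUPPRESSOR_OFF)           # blob-relative for now
    blob = bytearray(BLOB_SIZE)
    blob[0:TABLE_LEN] = header + sub_header + adbp
    blob[VMGENID_OFF:VMGENID_OFF + 16] = vmgenid
    return blob


def link(blob: bytearray, base: int) -> None:
    """Mimic ADD_POINTER on the ADBP field, then ADD_CHECKSUM over the table."""
    (relative,) = struct.unpack_from("<Q", blob, ADBP_OFF)
    struct.pack_into("<Q", blob, ADBP_OFF, base + relative)
    blob[CHECKSUM_OFF] = 0
    blob[CHECKSUM_OFF] = -sum(blob[:TABLE_LEN]) & 0xFF


def addr_method(blob: bytearray) -> tuple:
    """Mimic ADDR: read ADBP, add 42, return the (ADLO, ADHI) pair."""
    (adbp,) = struct.unpack_from("<Q", blob, ADBP_OFF)
    adfu = (adbp + 42) & 0xFFFFFFFFFFFFFFFF
    return adfu & 0xFFFFFFFF, adfu >> 32


base = 0x7E5F3000  # hypothetical 4KB-aligned allocation address
blob = build_blob(bytes(range(16)))
link(blob, base)
adlo, adhi = addr_method(blob)
print(hex((adhi << 32) | adlo))
```

The sum 62 + 42 = 104 is the whole point: the relocated pointer lands on the
probe suppressor, and the constant added in AML bridges the gap to the
generation ID, which stays 8-byte aligned because the blob is page aligned.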

In the last few days, I implemented all of the above (i.e. all the ACPI
stuff, but not the actual device model). I was very excited about the
ACPI part, and figured I'd do that first, to see if Windows would like
it, and if so, do the device model second (reusing as much of Gal's
work as possible).

I intend to post my current patches later, only as an FYI, for
posterity. I validated them extensively *outside of Windows* with the
following methods:

- I used the OVMF debug log to sanity check the linker/loader commands'
  execution, and to glean the actual allocation address of the fw_cfg
  blob containing the UEFI ACPI Data Table and the generation ID field.

- I used a Fedora guest to validate whether both the original allocation
  of the blob, and the UEFI ACPI Data Table separately installed with
  EFI_ACPI_TABLE_PROTOCOL, were in AcpiNVS type memory, according to the
  UEFI memory map. They were.

- I checked the dmesg.

- I used the ACPICA instance, wrapped with Python, and built into GRUB,
  from the BITS ISO (http://biosbits.org/download/), for dynamic
  verification.

  I've recently learned about this incredible tool from
  <http://lwn.net/Articles/655992/>. The killer feature of ACPI testing
  with BITS is that there is no interference from any OS at all, because
  there *is* no OS. BITS runs in a pure firmware environment, both on
  UEFI and on legacy BIOS -- it is ACPICA in Python in GRUB on top of
  the firmware.

  In a nutshell (hat tip to Josh Triplett for usage help), you boot the
  ISO, then select "Python Interpreter" from the GRUB menu, then do the
  following:

  >>> import acpi
  >>> acpi.get_objpaths("ADDR")
  >>> acpi.evaluate("\\_SB_.VMGI.ADDR")

  The second command will locate the ADDR control method with full path
  (it performs a "wildcard" search in the ACPI namespace if you like),
  and the third command will actually call and evaluate that method, and
  print the return value.

  ("Unsafe" ioport access is not done by default, but one can pass the
  "unsafe_io=True" keyword arg to the Python method, and then the ioport
  accesses are done too. In any case, I didn't use that for now, because
  the actual device model was still missing, as I've said.)

  The above test proved that my code (and the design behind it) were
  sound; the UEFI ACPI Data Table was looked up just fine, the ADBP
  field was read out of it, incremented by 42 in AML, and the absolute
  address of the final generation ID field (which I cross-referenced
  with the OVMF debug log -- see the allocation address of the blob
  above) was *right*.

Pop the champagne, correct? Not so fast. I booted Windows Server 2012 R2
on top of this thing... and Device Manager showed the familiar yellow
triangle on the "Microsoft Hyper-V Generation Counter" system device.

I collected as much data as I could from Device Manager:

> General
> -------
> This device cannot find enough free resources that it can use. (Code
> 12)
>
> If you want to use this device, you will need to disable one of the
> other devices on this system.
>
> Events
> ------
> * Device configured (wgencounter.inf)
> Device ACPI\QEMU0002\2&daba3ff&1 was configured.
>
> Driver Name: wgencounter.inf
> Class Guid: {4D36E97D-E325-11CE-BFC1-08002BE10318}
> Driver Date: 06/21/2006
> Driver Version: 6.3.9600.16384
> Driver Provider: Microsoft
> Driver Section: VmGenCounter.NT
> Driver Rank: 0xFF2001
> Matching Device Id: VM_Gen_Counter
> Outranked Drivers:
> Device Updated: false
>
> * Device not started (gencounter)
> Device ACPI\QEMU0002\2&daba3ff&1 had a problem starting.
>
> Driver Name: wgencounter.inf
> Class Guid: {4D36E97D-E325-11CE-BFC1-08002BE10318}
> Service: gencounter
> Lower Filters:
> Upper Filters:
> Problem: 0xC
> Status: 0x0
>
> Resources
> ---------
> I/O Range 0512-051A
> Conflicting device list:
> No conflicts.
>
> Details
> -------
> *Problem status
> {Conflicting Address Range}
> The specified address range conflicts with the address space.
> C0000018

Note that there were *no* conflicting devices!

At this point I took the merge commit by Peter on master that
immediately preceded Gal's posting of v15 in April, namely 87a8adc087,
and applied Gal's v15 from the mailing list archive on top of it; it
applied just fine. Then I rebased Gal's v15 to current master (commit
fc04a730b7, on which my own work was based), fixed up the conflicts,
built it and tested it.

As I expected (because I was sure Gal had tested his work), the yellow
triangle was *not* there. So what was the difference? Well, with gradual
elimination of stuff, and parallel tweaking of Gal's work and mine, I
managed to meet in the middle, and ascertained the following facts:

- If you have no _CRS for VMGENID at all, that's fine.

- If you have a _CRS with nothing in it, that's fine too.

- If you have a _CRS with just a fixed memory32 descriptor in it, that's
  cool.

- If you have an IO descriptor in the _CRS (regardless of a present or
  absent memory descriptor "neighbor"), then the vmgenid driver in
  Windows chokes. In other words, exposing a truthful _CRS was "too
  honest" for Windows' VMGENID driver; we must not admit that we use IO
  ports in the ADDR method.

(Side note: this is not mentioned in the VMGENID spec...)

Anyhow, problem solved, right? Not so fast. When I removed the _CRS, the
error quoted above ("Conflicting Address Range") went away, but another
one popped up: "POWER FAILURE" (yellow triangle again).

(Side note again: I reproduced the same POWER FAILURE error with a
*SeaBIOS* Windows Server 2012 R2 guest as well. So it was not specific
to OVMF.)

I continued the tweaking and gradual isolation (I even tried to bump the
DSDT revision to 2, but that was useless too). Ultimately I was forced
to conclude that it was the DataTableRegion() operator that caused
Windows Server 2012 R2 to reject the device.

At this point let me remind you guys of my extensive, successful testing
of the code with ACPICA (which included BITS and a Fedora guest).
Knowing that Windows had its own ACPI interpreter, I started suspecting
that even *modern* Windows lacked support for DataTableRegion(). But, I
wanted more proof than "it works with ACPICA".

It turns out that you can debug ACPI with WinDbg. If the debuggee is a
debug checked build of Windows, or you install the debug checked
ACPI.SYS separately on a non-debug checked build of Windows, then
ACPI.SYS itself will include the AML debug agent, and you will be able
to talk to it from WinDbg.

I happened to have a debug checked build of Windows 8 lying around, and
it also reproduced the POWER FAILURE error, so it was ideal for the
debuggee role.

(1) In the debugger virtual machine (Windows Server 2012 R2), I
    installed WinDbg.

    https://msdn.microsoft.com/en-us/library/windows/hardware/ff551063

    "If you want to download only Debugging Tools for Windows, install
    the Windows SDK, and, during the installation, select the Debugging
    Tools for Windows box and clear all the other boxes."

    I also configured the symbol servers (to download the debug symbols
    from). I don't have the exact URLs that explain this, but it's easy
    to google and do.

(2) In the debuggee, I enabled kernel debugging, on the COM2 serial
    port. Note that the following command must be run from "cmd.exe" (as
    administrator); PowerShell will *not* work:

    bcdedit /dbgsettings serial debugport:2 baudrate:38400

    https://msdn.microsoft.com/en-us/library/windows/hardware/ff542187%28v=vs.85%29.aspx

    I chose 38400 as baud rate, because in the past I tried to use
    another debugger (the UDK debugger) through the virtual serial ports
    of QEMU, and the serial port timings were so distant from those of a
    physical serial port that the UDK debugger could never complete the
    handshake. So I intended to decrease "jitter" by choosing a
    relatively low baud rate.

    Further, I chose COM2 because all my guests are managed by libvirt,
    and libvirt kinda-sorta insists on associating COM1 with the
    "console" of the virtual machine, for historical reasons. It can be
    overridden from the command line with "virsh console --device", but
    I just wanted it to be separate, hence COM2. (Hat tip to Dan
    Berrange for explaining it to me earlier; I'm sure I paraphrased him
    incorrectly now. Sorry.)

(3) So, in the domain XML of the debuggee, I added

    <serial type='tcp'>
      <source mode='connect' host='127.0.0.1' service='12345'/>
      <protocol type='raw'/>
      <target port='1'/>
    </serial>

    (Hat tip again to DanPB for recommending TCP; it avoids any SELinux
    issues that pop up with Unix domain sockets.)

    The debuggee VM is the one that connects, because the debugger
    connection must be made early on during the boot process, and it is
    the debuggee that you might have to restart frequently. So it is the
    one that should initiate connections, and the long-running debugger
    VM should be there listening all the time.

(4) In the domain XML of the debugger, I added

    <serial type='tcp'>
      <source mode='bind' host='127.0.0.1' service='12345'/>
      <protocol type='raw'/>
      <target port='1'/>
    </serial>
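
The two <serial> elements in steps (3) and (4) differ only in the source
mode, and it is easy to mix them up. As a minimal sketch (the helper name
serial_tcp is made up for illustration; it only builds the XML text and
does not talk to libvirt), the pair can be generated from one function so
that the host, the service and the COM2 target port stay in sync:

```python
import xml.etree.ElementTree as ET


def serial_tcp(mode: str, host: str = "127.0.0.1", service: str = "12345",
               port: int = 1) -> ET.Element:
    """Build a raw TCP <serial> element; port 1 corresponds to COM2."""
    if mode not in ("bind", "connect"):
        raise ValueError("mode must be 'bind' or 'connect'")
    serial = ET.Element("serial", {"type": "tcp"})
    ET.SubElement(serial, "source",
                  {"mode": mode, "host": host, "service": service})
    ET.SubElement(serial, "protocol", {"type": "raw"})
    ET.SubElement(serial, "target", {"port": str(port)})
    return serial


# The long-running debugger VM listens; the debuggee VM connects.
debugger = serial_tcp("bind")
debuggee = serial_tcp("connect")

for name, element in (("debugger", debugger), ("debuggee", debuggee)):
    print(name)
    print(ET.tostring(element, encoding="unicode"))
```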

That much about the configuration.

Now this is the interesting part. WinDbg has two modes of operation
while debugging a kernel:

https://msdn.microsoft.com/en-us/library/windows/hardware/ff558869%28v=vs.85%29.aspx

- "KD" mode:
  - you can call the AMLI (= AML debugger) *extensions* with the !amli
    prefix,
  - you can't call any AMLI *commands*,
  - you can call all KD commands,

- "AMLI" mode:
  - you can call the AMLI extensions *without* the !amli prefix,
  - you can call the AMLI commands without any prefixes too,
  - you cannot call KD commands; first you have to exit back to KD with
    the "q" command.

So, in the debugger VM, I started WinDbg, selected File | Kernel Debug,
specified COM2 and 38400, and clicked OK.

Then I started the debuggee VM. Here's the debug session transcript:

> ************* Symbol Path validation summary **************
> Response                         Time (ms)     Location
> Deferred                                       SRV*c:\symbols*http://msdl.microsoft.com/download/symbols
> Waiting to reconnect...
> Connected to Windows 8 9200 x64 target at (Sun Sep 13 11:10:57.162 2015 (UTC + 2:00)), ptr64 TRUE
> Kernel Debugger connection established.
>
> ************* Symbol Path validation summary **************
> Response                         Time (ms)     Location
> Deferred                                       SRV*c:\symbols*http://msdl.microsoft.com/download/symbols
> Symbol search path is: SRV*c:\symbols*http://msdl.microsoft.com/download/symbols
> Executable search path is:
> Windows 8 Kernel Version 9200 MP (1 procs) Checked x64
> Built by: 9200.16384.amd64chk.win8_rtm.120725-1247
> Machine Name:
> Kernel base = 0xfffff800`76a02000 PsLoadedModuleList = 0xfffff800`77145ac0
> System Uptime: 0 days 0:00:00.041 (checked kernels begin at 49 days)
> nt!DebugService2+0x5:
> fffff800`76c34565 cc              int     3

This is when the connection between debugger and debuggee is
established, and the debug agent breaks into WinDbg as soon as possible.

At this point I tried to set the "verboseon", "traceon", and "errbrkon"
AMLI Debugger options (see
<https://msdn.microsoft.com/en-us/library/windows/hardware/ff538158%28v=vs.85%29.aspx>),
but the AMLI debug agent (part of ACPI.SYS on the debuggee) was not
available yet:

> kd> !amli set verboseon
> AMLI_DBGERR: failed to to get the address of ACPI!gDebugger 0
> AMLI_DBGERR: failed to to get the address of ACPI!gdwfAMLIInit 0

So I just let the debuggee continue, with the "g" command:

> kd> g

but right after entering that command, I clicked Debug | Break too.

Namely, if you allow the debuggee to continue from this point
undisturbed, it will *not* break into the debugger when the "error" in
the ADDR control method is encountered. It will simply boot to the full
GUI instead.

However, by clicking Debug | Break immediately, I *queued* a break for
the earliest next possibility. Which is not "right after"; the debuggee
actually runs for tens of seconds uninterrupted after the "g" command
above, before the break occurs. This is it (note "first chance"):

> Break instruction exception - code 80000003 (first chance)
> *******************************************************************************
> *                                                                             *
> *   You are seeing this message because you pressed either                    *
> *       CTRL+C (if you run console kernel debugger) or,                       *
> *       CTRL+BREAK (if you run GUI kernel debugger),                          *
> *   on your debugger machine's keyboard.                                      *
> *                                                                             *
> *                   THIS IS NOT A BUG OR A SYSTEM CRASH                       *
> *                                                                             *
> * If you did not intend to break into the debugger, press the "g" key, then   *
> * press the "Enter" key now.  This message might immediately reappear.  If it *
> * does, press "g" and "Enter" again.                                          *
> *                                                                             *
> *******************************************************************************
> nt!DbgBreakPointWithStatus:
> fffff800`76c34510 cc              int     3

Cool. So now I *can* set those AMLI debugger options:

> kd> !amli set verboseon
>
> kd> !amli set traceon
>
> kd> !amli set errbrkon

And then I allow the debuggee to continue.

> kd> g

It will print a bunch of debug messages that are not relevant now (side
giggle: notice all those virtio-win messages?)

> KDTARGET: Refreshing KD connection
> [ParaNdis_DebugInitialize] Crash callback registered
> [DriverEntry]=>
> Sep 26 2013 09:53:55built 6.30
> [ParaNdis6_AddDevice]=>
> [ParaNdis6_StartDevice]=>
> [ParaNdis6_Initialize]=>
> [ParaNdis_InitializeContext]=>
> Found IO ports at 0000C0C0(32)
> Found Interrupt vector -8, level 65536, affinity 0, flags 7
> Found Interrupt vector -9, level 65536, affinity 0, flags 7
> Found Interrupt vector -10, level 65536, affinity 0, flags 7
> [ParaNdis_InitializeContext] Message interrupt assigned
> [ParaNdis_ResetVirtIONetDevice] Done
> VirtIO Host Feature VIRTIO_NET_F_CSUM
> VirtIO Host Feature VIRTIO_NET_F_GUEST_CSUM
> VirtIO Host Feature VIRTIO_NET_F_MAC
> VirtIO Host Feature VIRTIO_NET_F_GSO
> VirtIO Host Feature VIRTIO_NET_F_GUEST_TSO4
> VirtIO Host Feature VIRTIO_NET_F_GUEST_TSO6
> VirtIO Host Feature VIRTIO_NET_F_GUEST_ECN
> VirtIO Host Feature VIRTIO_NET_F_GUEST_UFO
> VirtIO Host Feature VIRTIO_NET_F_HOST_TSO4
> VirtIO Host Feature VIRTIO_NET_F_HOST_TSO6
> VirtIO Host Feature VIRTIO_NET_F_HOST_ECN
> VirtIO Host Feature VIRTIO_NET_F_HOST_UFO
> VirtIO Host Feature VIRTIO_NET_F_MRG_RXBUF
> VirtIO Host Feature VIRTIO_NET_F_STATUS
> VirtIO Host Feature VIRTIO_NET_F_CTRL_VQ
> VirtIO Host Feature VIRTIO_NET_F_CTRL_RX
> VirtIO Host Feature VIRTIO_NET_F_CTRL_VLAN
> VirtIO Host Feature VIRTIO_NET_F_CTRL_RX_EXTRA
> VirtIO Host Feature VIRTIO_NET_F_CTRL_MAC_ADDR
> VirtIO Host Feature VIRTIO_F_INDIRECT
> VirtIO Host Feature VIRTIO_F_PUBLISH_INDICES
> [ParaNdis_InitializeContext] Link status on driver startup: 1
> Permanent device MAC: 52-54-00-59-1d-3e
> No valid MAC configured
> Actual MAC: 52-54-00-59-1d-3e
> [ParaNdis_InitializeContext]<=0x0
> [ParaNdis_FinishInitialization]=>
> [ParaNdis_FinishSpecificInitialization]=>
> [ParaNdis_FinishSpecificInitialization] SG recommended size 808
> [ConfigureMSIXVectors] Using MSIX interrupts (3 messages, irql 10)
> [ConfigureMSIXVectors] MSIX message0=000049A2=>FEE0300C
> [ConfigureMSIXVectors] MSIX message1=00004992=>FEE0300C
> [ConfigureMSIXVectors] MSIX message2=00004982=>FEE0300C
> [ConfigureMSIXVectors] Using message 0 for RX queue
> [ConfigureMSIXVectors] Using message 1 for TX queue
> [ConfigureMSIXVectors] Using message 2 for controls
> [ApplyOffloadConfiguration] Requested: V4:IPCS=TxRx,TCPCS=TxRx,UDPCS=TxRx V6:TCPCS=TxRx,UDPCS=TxRx
> [SetOffloadField] IN TxIPChecksum TX: current=-1 Supported TxRx Requested TxRx
> [SetOffloadField] OUT TxIPChecksum TX: new=1 (accepted)
> [SetOffloadField] IN TxTCPChecksum TX: current=-1 Supported TxRx Requested TxRx
> [SetOffloadField] OUT TxTCPChecksum TX: new=1 (accepted)
> [SetOffloadField] IN TxUDPChecksum TX: current=-1 Supported TxRx Requested TxRx
> [SetOffloadField] OUT TxUDPChecksum TX: new=1 (accepted)
> [SetOffloadField] IN RxIPChecksum RX: current=-1 Supported TxRx Requested TxRx
> [SetOffloadField] OUT RxIPChecksum RX: new=1 (accepted)
> [SetOffloadField] IN RxTCPChecksum RX: current=-1 Supported TxRx Requested TxRx
> [SetOffloadField] OUT RxTCPChecksum RX: new=1 (accepted)
> [SetOffloadField] IN RxUDPChecksum RX: current=-1 Supported TxRx Requested TxRx
> [SetOffloadField] OUT RxUDPChecksum RX: new=1 (accepted)
> [SetOffloadField] IN TxTCPv6Checksum TX: current=-1 Supported TxRx Requested TxRx
> [SetOffloadField] OUT TxTCPv6Checksum TX: new=1 (accepted)
> [SetOffloadField] IN TxUDPv6Checksum TX: current=-1 Supported TxRx Requested TxRx
> [SetOffloadField] OUT TxUDPv6Checksum TX: new=1 (accepted)
> [SetOffloadField] IN RxTCPv6Checksum RX: current=-1 Supported TxRx Requested TxRx
> [SetOffloadField] OUT RxTCPv6Checksum RX: new=1 (accepted)
> [SetOffloadField] IN RxUDPv6Checksum RX: current=-1 Supported TxRx Requested TxRx
> [SetOffloadField] OUT RxUDPv6Checksum RX: new=1 (accepted)
> [ApplyOffloadConfiguration] Result: TCPv4:TxRx,UDPv4:TxRx,IPCS:TxRx TCPv6:TxRx,UDPv6:TxRx
> [ApplyOffloadConfiguration] Result: LSO: v4 -1, v6 -1
> [ApplyOffloadConfiguration] Final: the request accepted
> [ParaNdis_FinishSpecificInitialization]<=0x0
> [ParaNdis_VirtIONetInit]=>
> VirtIODeviceQueryQueueAllocation: queue 0 requires 0x4000 for 256 entries
> VirtIODeviceQueryQueueAllocation: queue 1 requires 0x4000 for 256 entries
> VirtIODeviceQueryQueueAllocation: queue 2 requires 0x3000 for 64 entries
> Creating new virtqueue>>> size 256, pages FFFFF880032C1000
> [VirtIODevicePrepareQueue] queue phys.address 00000000:4f6a0000, pfn 4f6a0
> Creating new virtqueue>>> size 256, pages FFFFF880032C5000
> [VirtIODevicePrepareQueue] queue phys.address 00000000:4f69c000, pfn 4f69c
> Creating new virtqueue>>> size 64, pages FFFFF880032C9000
> [VirtIODevicePrepareQueue] queue phys.address 00000000:4f6bd000, pfn 4f6bd
> IntelPPM.sys: RegisterIdleStates() Failed! rc=0xc00000bb
> IntelPPM.sys: RegisterIdleStates() Failed! rc=0xc00000bb
> [PrepareTransmitBuffers] available 128 Tx descriptors, 256 hw buffers
> [PrepareReceiveBuffers] MaxReceiveBuffers 256
> Indicating Connected
> [ParaNdis_FinishInitialization]<=0x0
> [ParaNdis6_Restart]=>
> [ParaNdis_OnPnPEvent]=>
> [ParaNdis_OnPnPEvent] (NdisDevicePnPEventPowerProfileChanged)
>
> AMLI(? for help)->

Aaaand there we go. Because I set "errbrkon", "any AML error [would]
cause ACPI to break into the AMLI Debugger".

Then I called the "r" AMLI debugger extension (without the !amli prefix
because the mode was already "AMLI", not "KD"). It prints the ACPI
context
<https://msdn.microsoft.com/en-us/library/windows/hardware/ff562102%28v=vs.85%29.aspx>.

> r
>
> Context=fffffa800415d000*, Queue=0, ResList=fffffa800415d1c0
> ThreadID=fffffa8000fb5640, Flags=00000010, pbOp=fffffa8004057192:[\_SB.VMGI.ADDR+1]
> StackTop=fffffa800416ce30, UsedStackSize=464 bytes, FreeStackSize=64560 bytes
> LocalHeap=fffffa800415d168, CurrentHeap=fffffa800415d168, UsedHeapSize=152 bytes
> Object=\_SB.VMGI.ADDR, Scope=\_SB.VMGI.ADDR, ObjectOwner=fffffa800415d1e0, SyncLevel=0
> AsyncCallBack=fffff88000e5ce9c, CallBackData=fffff8a003458230, CallBackContext=fffff88001794c90
>
> MethodObject=\_SB.VMGI.ADDR
> fffffa800416ceb8: Local0=Unknown()
> fffffa800416cee0: Local1=Unknown()
> fffffa800416cf08: Local2=Unknown()
> fffffa800416cf30: Local3=Unknown()
> fffffa800416cf58: Local4=Unknown()
> fffffa800416cf80: Local5=Unknown()
> fffffa800416cfa8: Local6=Unknown()
> fffffa800416cfd0: Local7=Unknown()
> fffffa800415d080: RetObj=Unknown()
>
> Resources Owned:
>   ResType=Mutex, ResObj=fffffa80040570e0
>
> Next AML Pointer: fffffa8004057192:[\_SB.VMGI.ADDR+1]
>
> fffffa8004057192 : Index(TBLR, "UEFI", AMLI_DBGERR: UnAsmSuperName: invalid SuperName - 0x0d

Okay, so we are in \_SB.VMGI.ADDR, at byte offset one; there was an
error that has dropped us into the debugger; and the debugger
disassembles the offending AML as...

    Index(TBLR, "UEFI", ...

WHAT?! The code has no Index() operator whatsoever!

Also:

    ... AMLI_DBGERR: UnAsmSuperName: invalid SuperName - 0x0d

why does the AML disassembler report a byte stream decoding failure
*inside* the (in its own right unfathomable) Index() operator?!

Now, refer to the ACPI specification, revision 2.0 (July 27, 2000), to
see how the Index() ASL operator is compiled:

> 17.2.4.4 Type 2 Opcodes Encoding
>
> DefIndex  := IndexOp BuffPkgStrObj IndexValue Target
> IndexOp   := 0x88

Uh-oh. Now let's see how DataTableRegion() is compiled (note: my QEMU
code correctly generates that AML, and ACPICA agrees!):

> 17.2.4.2 Named Objects Encoding
>
> DefDataRegion := DataRegionOp NameString TermArg TermArg TermArg
> DataRegionOp  := ExOpPrefix 0x88

(On the side we have to note that there's a typo in the spec here (it
should be Ex[t]OpPrefix, not Ex[]OpPrefix), and that this typo lives on
in ACPI v6.0.)

Anyway, for completeness:

> 17.2.2 Data Objects Encoding
>
> ExtOpPrefix := 0x5b

Do you see it? DataRegionOp is encoded with 0x88, and IndexOp is *also*
encoded with 0x88. However, the former is *preceded* by 0x5b; this is
how AML interpreters are supposed to distinguish DataRegionOp from
IndexOp.

And while ACPICA does distinguish them, Windows does not.

ACPI.SYS happily strolls over the ExtOpPrefix (0x5b) that is at offset
#0 of the ADDR method's body, then misinterprets the following 0x88 as
an IndexOp. As a result, it believes it is looking at a malformed
DefIndex, instead of the actual, well-formed DefDataRegion!
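To make the ambiguity concrete, here is a minimal Python sketch of the two decoding behaviors. It is not ACPI.SYS or ACPICA code; the function names are hypothetical, and the byte stream is reconstructed from the disassembly above (NameString TBLR, then a "UEFI" string introduced by StringPrefix 0x0d), so treat it as an illustration only.

```python
EXT_OP_PREFIX = 0x5B  # ExtOpPrefix, per the spec excerpt above
OP_0X88 = 0x88        # IndexOp on its own, DataRegionOp after the prefix


def decode_opcode(aml, offset=0):
    """Return (opcode name, bytes consumed), honoring ExtOpPrefix."""
    if aml[offset] == EXT_OP_PREFIX:
        if aml[offset + 1] == OP_0X88:
            return "DataRegionOp", 2
        return "UnknownExtOp", 2
    if aml[offset] == OP_0X88:
        return "IndexOp", 1
    return "UnknownOp", 1


def decode_opcode_skipping_prefix(aml, offset=0):
    """Hypothetical model of the failure: the prefix is stepped over and
    the next byte is decoded as if it were a plain one-byte opcode."""
    skipped = 1 if aml[offset] == EXT_OP_PREFIX else 0
    if aml[offset + skipped] == OP_0X88:
        return "IndexOp", skipped + 1
    return "UnknownOp", skipped + 1


# Start of the ADDR method body, as far as the disassembly lets us guess:
# 5b 88 'TBLR' 0d 'UEFI' 00 ...
aml = bytes([0x5B, 0x88]) + b"TBLR" + bytes([0x0D]) + b"UEFI\x00"

correct = decode_opcode(aml)
broken = decode_opcode_skipping_prefix(aml)
print(correct, broken)
```

Under the second behavior, TBLR and "UEFI" are consumed as the first two operands of Index(), and the next byte (another 0x0d string prefix, if the guess about the byte stream is right) is not a valid Target, which would match the "invalid SuperName - 0x0d" complaint.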

Thus, this is a bug in ACPI.SYS. (Remember: seeing this in Windows 8 and
Windows Server 2012 R2.)

"But", you ponder, "how, then, do the programmers who use Microsoft's
ASL compiler, ASL.exe, instead of Intel's iASL, employ DataTableRegion()?
If DataTableRegion() doesn't *run*, how and why can they *compile* it?"

They can't.

(Let me digress: why oh why must one download the ginormous Visual
Studio 2015 Community Edition, and then the WDK too, just to acquire a
comparatively tiny ASL compiler?

  https://msdn.microsoft.com/en-us/library/windows/hardware/dn551195%28v=vs.85%29.aspx
  https://msdn.microsoft.com/en-us/windows/hardware/dn913721.aspx

Well, thankfully there's a way around it. Comments in the edk2 source
file

  https://github.com/tianocore/edk2/blob/master/BaseTools/Conf/tools_def.template

contain this nice link:

  http://download.microsoft.com/download/2/c/1/2c16c7e0-96c1-40f5-81fc-3e4bf7b65496/microsoft_asl_compiler-v4-0-0.msi

Exactly what I needed. Note that this is an ACPI 4.0 compatible
compiler, and DataTableRegion() is from ACPI 2.0!)

So obviously I wrapped the entire AML code from "docs/specs/vmgenid.txt"
into a minimal definition block, and tried to compile it with
Microsoft's ASL.exe. (I had done the same with iASL before I posted the
RFC for "vmgenid.txt" -- that's how I had ensured that the ASL would be
*valid* and suitable for "vmgenid.txt" in the first place!)

Consistently with the ACPI.SYS failure (to both execute and disassemble
DataRegionOp), ASL.exe fails to *compile* DataTableRegion()!

> C:\Program Files (x86)\Microsoft ASL Compiler v4.0>asl.exe x.dsl
> Microsoft ACPI Source Language Assembler Version 4.0.0NT [Aug 28 2009, 18:36:36]
>
> Copyright (c) 1996,2009 Microsoft Corporation
> Compliant with the ACPI 4.0 Specification
>
> x.dsl:
>
>    49:                   Offset (54),
>                                    ^***
> x.dsl(49): error: field offset is exceeding operation region range (offset=0x36)
>
>
> C:\Program Files (x86)\Microsoft ASL Compiler v4.0>

(If you search this email for "Offset (54)", you will see where the
above comes from.)

Let me explain: when you define an OperationRegion() in the ASL source,
ASL.exe marks the size of the region. Then in referring Field()
definitions, it checks the end of each individual field against the size
of the region (to ensure, at compile time, that accessing the field at
runtime won't extend beyond the region).

However, the size of the SystemMemory operation region that is defined
with the DataTableRegion() operator is only available at runtime -- the
operator looks up the matching table linked into the RSDT/XSDT, and then
that table *becomes* the operation region. So the size of the region is
the size of the table, which can be looked up at runtime *only*.
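As a rough illustration of why the size is a runtime-only quantity, the following Python sketch models the lookup over a list of in-memory tables. Only the 36-byte standard table header layout is taken from the ACPI spec; `make_table`, `data_table_region`, the "TOY..." OEM strings, and the treatment of empty OEM strings as wildcards are assumptions made for the example.

```python
import struct

# Standard ACPI table header: signature, length, revision, checksum,
# OEM ID, OEM table ID, OEM revision, creator ID, creator revision.
HEADER = struct.Struct("<4sIBB6s8sI4sI")  # 36 bytes, no padding


def make_table(signature, oem_id, oem_table_id, payload):
    """Build a toy table (checksum left at zero, illustration only)."""
    length = HEADER.size + len(payload)
    return HEADER.pack(signature, length, 1, 0, oem_id, oem_table_id,
                       0, b"TOY ", 0) + payload


def data_table_region(tables, signature, oem_id=b"", oem_table_id=b""):
    """Return (index, region length) of the first matching table.

    Assumption: an empty OEM ID or OEM table ID matches any table.
    """
    for index, table in enumerate(tables):
        sig, length, _, _, oid, otid, _, _, _ = HEADER.unpack_from(table)
        if sig != signature:
            continue
        if oem_id and oid != oem_id:
            continue
        if oem_table_id and otid != oem_table_id:
            continue
        return index, length
    raise LookupError("no matching table")


# Hypothetical table list standing in for what the RSDT/XSDT links to.
tables = [
    make_table(b"FACP", b"TOYOEM", b"TOYFACP ", bytes(10)),
    make_table(b"UEFI", b"TOYOEM", b"TOYUEFI ", bytes(26)),
]
index, region_length = data_table_region(tables, b"UEFI")
print(index, region_length)
```

The region length falls out of the matched table's own header, which a compiler looking only at the ASL source has no way to know.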

ASL.exe apparently considers the size of any such operation region zero.

Consequently, ASL.exe considers *any* field definition for such a region
to be out of bounds -- I tried to move the ADBP field to offset zero,
just for checking the compiler, and it blows up just the same; it
reports that offset 8 (the end of ADBP, when it starts at zero) is out
of range.
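The observed behavior is consistent with a compile-time check like the one sketched below in Python. This is a guess at the logic, inferred only from the two error reports described above (offset 0x36 for Offset (54), and offset 8 for ADBP moved to zero); `check_field_list` and the element encoding are hypothetical, and ASL.exe's real implementation is not known.

```python
def check_field_list(region_length, elements):
    """Walk a Field() element list and return the first byte offset that
    exceeds region_length, or None if everything fits.

    elements is a list of (name, value) pairs: ("Offset", n) moves the
    cursor to byte n, anything else is a named field of `value` bytes.
    """
    cursor = 0
    for name, value in elements:
        if name == "Offset":
            cursor = value
        else:
            cursor += value
        if cursor > region_length:
            return cursor
    return None


# Region size assumed to be zero, as ASL.exe appears to do:
print(check_field_list(0, [("Offset", 54), ("ADBP", 8)]))   # original layout
print(check_field_list(0, [("ADBP", 8)]))                   # ADBP moved to zero
# With a known size large enough for the field, the same layout passes:
print(check_field_list(62, [("Offset", 54), ("ADBP", 8)]))
```

With a region length of zero, no non-empty field list can ever pass such a check.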

Without fields, an operation region is unusable. It doesn't matter that
ASL.exe doesn't actually choke on DataTableRegion(), if you then can't
define any fields for it!

I challenge you (the generic you) to find *one* physical computer that
runs Windows *and* has DataTableRegion() in its ACPI tables!

TL;DR:
- Windows cannot execute DataTableRegion(), including modern Windows
  releases,
- ASL.exe cannot (usably) compile DataTableRegion(),
- this is not documented *anywhere* on the web that Google can find, but
  (given DataTableRegion()'s obviously huge utility) it must be silent,
  "secret" knowledge for physical BIOS developers (well, not any longer
  ;)),
- our great idea that centered on DataTableRegion() is busted. :(

Laszlo


