qemu-devel


From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] [edk2] [edk2 PATCH 0/1] OvmfPkg: grab ACPI tables from QEMU
Date: Wed, 20 Nov 2013 00:03:06 +0200

On Tue, Nov 19, 2013 at 10:39:06PM +0100, Laszlo Ersek wrote:
> On 11/19/13 22:13, Michael S. Tsirkin wrote:
> > On Tue, Nov 19, 2013 at 01:31:39PM +0100, Laszlo Ersek wrote:
> >> On 11/19/13 09:54, Gerd Hoffmann wrote:
> >>>   Hi,
> >>>
> >>>>   ACPI PciWindow32: Base=0xA0000000 End=0xFEEFFFFF Length=0x5EF00000
> >>>
> >>>>   begin32=c0000000 end32=fec00000 begin64=0 end64=0
> >>>
> >>> qemu & seabios pick a 32bit window size which allows covering it with
> >>> a single mtrr entry.  The size must be a power of two for that to work.
> >>>
> >>>   address@hidden ~]# cat /proc/mtrr
> >>>   reg00: base=0x080000000 ( 2048MB), size= 2048MB, count=1: uncachable
> >>>
> >>> So we have three cases for piix (as you've figured in the source code
> >>> already).  Start at 0x80000000 (2G window), 0xc0000000 (1G window) and
> >>> 0xe0000000 (512M window).
> >>>
> >>> btw: q35 has a fixed 1G window, max low ram addr is 0xb0000000, the
> >>> remaining address space (0xb0000000 -> 0xc0000000) is used for
> >>> mmconfig.
> >>>
> >>>> I guess the range starting at 0xc0000000 is somehow "incompatible"
> >>>> with the EFI memory map. (I can't actually explain this idea because,
> >>>> again, this second range is a proper subset of the former, and its
> >>>> size is still 1004MB.)
> >>>
> >>> Probably efi places the gfx memory bar somewhere between 0xa0000000
> >>> and 0xc0000000.  Sets up efifb accordingly.  Then the linux kernel
> >>> loads and figures "Oh, that bar is outside the 0xc0000000+ window" and
> >>> goes and remaps it -> efifb writes go nowhere.
> >>
> >> Thank you for the explanation.
> >>
> >> How do I fix it though? :) I could change the MMIO HOBs in PEI
> >> (OvmfPkg/PlatformPei/Platform.c):
> >>
> >>   //
> >>   // Video memory + Legacy BIOS region
> >>   //
> >>   AddIoMemoryRangeHob (0x0A0000, BASE_1MB);
> >>
> >>   //
> >>   // address       purpose   size
> >>   // ------------  --------  -------------------------
> >>   // max(top, 2g)  PCI MMIO  0xFC000000 - max(top, 2g)
> >>   // 0xFC000000    gap                           44 MB
> >>   // 0xFEC00000    IO-APIC                        4 KB
> >>   // 0xFEC01000    gap                         1020 KB
> >>   // 0xFED00000    HPET                           1 KB
> >>   // 0xFED00400    gap                         1023 KB
> >>   // 0xFEE00000    LAPIC                          1 MB
> >>   //
> >>   AddIoMemoryRangeHob (TopOfMemory < BASE_2GB ? BASE_2GB : TopOfMemory,
> >>                        0xFC000000);
> >>   AddIoMemoryBaseSizeHob (0xFEC00000, SIZE_4KB);
> >>   AddIoMemoryBaseSizeHob (0xFED00000, SIZE_1KB);
> >>   AddIoMemoryBaseSizeHob (PcdGet32(PcdCpuLocalApicBaseAddress), SIZE_1MB);
> >>
> >> to imitate the same three cases. The HOB with the lowest address
> >> produced here would affect the BAR placement as well.
> >>
> >> ... Yes, I tested the attached patch, and it makes the display work in
> >> both 2560MB guests.
> >>
> >> However, I don't like the idea of hardwiring those boundaries here. (The
> >> current values are also hardwired, but they are better: they are not the
> >> consequence of "SeaBIOS has done it like this forever" -- inter-firmware
> >> dependency, especially when they aren't each other's payloads, is bad
> >> IMHO.) We'd need something dynamic here, like a memory map from qemu.
> >>
> >> ... Which puts us in the same boat with Wei :)
> >>
> >> Thanks
> >> Laszlo
> > 
> > So the bug is really qemu being too conservative with its ACPI
> > tables, isn't it?
> 
> Here's what I think happens (and I'll shamelessly steal Gerd's explanation):
> 
> (1) OVMF configures its own memory map very early (in PEI). The ranges
> configured are quite static. The only variable that goes into this
> configuration is "top of RAM below 4G", which is capped at 3.5GB by qemu
> (the variable comes from the CMOS):
> 
>   TopOfMemory <=  2G ---> PCI hole starts at 2G
>   TopOfMemory >   2G ---> PCI hole starts at TopOfMemory
>   TopOfMemory > 3.5G ---> not possible
> 
> The low bound of the 32-bit PCI hole determined this way further affects
> two things in a later OVMF phase (DXE):
> 
> (2) PCI resource allocation -- in practice the video framebuffer gets
> allocated right at the start of the hole,
> (3) the value that OVMF's own ACPI table population code advertises in
> the PCI bridge's _CRS.
> 
> 
>               > (2) PCI resource alloc
>              /
>   (1) memmap
>             \
>              > (3) ACPI
>                        \
>                         > (4) guest OS access -- match (2)
> 
> So, until this patch, everything was in sync. With this patch, the low
> bound exported over ACPI -- in (3) -- got replaced (wholesale) by
> qemu's ACPI table. Therefore (1) only determined (2), and a conflict
> ensued between (2) and (3). (Which is where I refer back to Gerd's
> explanation -- the guest OS tries to access the framebuffer according to
> (3), the value in the ACPI table, but that's not where the framebuffer
> has actually been configured in (2), which had keyed off (1).)
> 
>               > (2) PCI resource alloc
>              /
>   (1) memmap
> 
>    (X) qemu
>            \
>             > (3) ACPI
>                       \
>                        > (4) guest OS access -- should match (2),
>                                                 but doesn't
> 
> 
> The *real* bug is that OVMF's memory map configuration (1) is not
> *dynamically* synchronized with qemu's ACPI table population logic (X).
> My patch that I attached earlier to this thread fixes this (by simply
> duplicating the same "quantization" logic), but it's ugly:
> 
>               > (2) PCI resource alloc
>              /
>   (1) memmap
>         ||
>    (X) qemu
>            \
>             > (3) ACPI
>                       \
>                        > (4) guest OS access -- matches (2) by way of
>                                                 code duplication
>                                                 between (1) and (X)
> 
> What we really should implement is this:
> 
>                           > (2) PCI resource alloc
>                          /
>             > (1) memmap
>            /
>    (X) qemu
>            \
>             > (3) ACPI
>                        \
>                         > (4) guest OS access -- matches (2) by (X)
>                                                  being the common root
>                                                  of everything
> 
> That is, OVMF should compose its own memmap in (1) based on dynamic
> information from qemu (importantly, the start address of the 32-bit PCI
> hole that qemu is going to advertise later in the ACPI tables).
> 
> > The MTRRs set up by the BIOS don't affect much: AFAIK they are
> > completely ignored when the access exits to the host, but I'm not
> > sure about assigned devices.
> > Alex, can you remind us all what happens there?
> 
> If you simply relax (drop) the stage-wise hole-setting, that will at
> best create a situation where OVMF and qemu duplicate the logic. This
> info needs to be passed down to OVMF (maybe it already is, somewhere in
> fw_cfg!), in some form that's more approachable than an ACPI table. (We
> absolutely don't want to parse ACPI (AML) in OVMF.)
> 
> Thanks!
> Laszlo


So whether we should make qemu control PCI BAR mapping
for the guest is, I think, a completely separate issue.
Even if we should, making the ACPI tables depend on the
guest obeying this logic is a mistake.

I think the whole confusion comes from the "PCI hole" terminology.
There's no PCI hole, at least not on PIIX.
On PIIX, whatever is not RAM, APIC, etc. is PCI,
so the guest can use everything that isn't memory for PCI.

The only real issue is that the _CRS in our ACPI tables is
somewhat seabios-specific. So let's just make it generic,
make it really describe the hardware:


Two possible fixes, both of them in QEMU:
1. declare all available regions (i.e. everything that resolves to PCI) in _CRS
2. detect where the guest put the PCI devices and declare the result in _CRS

Fix 2 is exactly what we did for 64-bit BARs.

The only issue I worry about is MTRRs.
Maybe we are lucky and they do not matter for KVM.

-- 
MST


