Re: [Qemu-devel] [PATCH v2] x86: gigabyte alignment for ram
From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] [PATCH v2] x86: gigabyte alignment for ram
Date: Tue, 17 Dec 2013 13:59:36 +0200

On Tue, Dec 17, 2013 at 11:54:46AM +0100, Gerd Hoffmann wrote:
> Hi,
>
> > > Problem is that the firmware places the xbar @ 0xb0000000.
> > > Hardcoded, assuming qemu will not map ram above 0xb0000000.
> >
> > Can't bios figure out the size of memory below 4G from fwcfg?
> > I refer to 7db16f2480db5e246d34d0c453cff4f58549df0e specifically.
>
> It can, but it doesn't.
>
> Additional issue for coreboot is that mmconfig base is a compile-time
> constant, because it is set up _very_ early in the boot process.
> Coreboot then does the whole pci initialization using mmconfig.
>
> On the other hand coreboot has a much more sophisticated resource
> management than seabios, so moving the mmconf xbar up to
> 0xe0000000-0xefffffff, then managing two regions (below 0xe0000000 and
> above 0xf0000000) for pci bars probably isn't a big issue for coreboot.
>
> > > So, we must (a) fix firmware first and (b) get an ugly dependency
> > > that older firmware will not run on latest qemu.
> >
> > That's only important for old machine types though, right?
>
> Correct. That makes it a bit less problematic, but it is still not very
> nice.
>
> > > We also need to figure how we wanna fixup things. So, current memory
> > > layout looks like this:
> > >
> > > 0x00000000 - 0xafffffff -- RAM / unused
> > > 0xb0000000 - 0xbfffffff -- mmconfig xbar [256 pci busses]
> > > 0xc0000000 - 0xfec00000 -- space for pci bars, almost 1g
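As a quick cross-check of the "256 pci busses" figure in the layout above: with the standard ECAM layout (32 devices per bus, 8 functions per device, 4 KiB of config space per function), 256 buses need exactly 256 MiB, which is the size of the 0xb0000000 - 0xbfffffff range. A minimal sketch; the names are illustrative and not taken from QEMU or SeaBIOS:

```python
# Standard PCI Express ECAM geometry.
DEVICES_PER_BUS = 32
FUNCTIONS_PER_DEVICE = 8
CONFIG_BYTES_PER_FUNCTION = 4096  # 4 KiB of config space per function

# Range taken from the layout table quoted above.
MMCONFIG_BASE = 0xB0000000
MMCONFIG_LAST = 0xBFFFFFFF


def ecam_size(buses: int) -> int:
    """Bytes of MMIO space needed to expose `buses` PCI buses via ECAM."""
    return buses * DEVICES_PER_BUS * FUNCTIONS_PER_DEVICE * CONFIG_BYTES_PER_FUNCTION


window_size = MMCONFIG_LAST - MMCONFIG_BASE + 1
print(f"ECAM for 256 buses: {ecam_size(256) >> 20} MiB")
print(f"xbar window:        {window_size >> 20} MiB")
```

So the xbar cannot shrink without giving up bus numbers; it can only move.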
> > >
> > > Largest pci bar we can map below 4g is 512m, @ 0xc0000000.
> > >
> > > If we wanna map 3G RAM we need to move the xbar somewhere else. Big
> > > question is where?
> > >
> > > We can move it to 0xc0000000. Then we can't map 512m pci bars any more.
> >
> > I would go for this when we have 3G of RAM.
> > I think that ability to support a single 512m BAR is not all that important.
>
> Use case: pci passthrough of graphics cards.
>
> > > We can move it to 0xe0000000. Then we have to split the pci bar space,
> > > mapping large bars below 0xe0000000 and small ones above 0xf0000000.
> > > SeaBIOS pci init code isn't really up to it.
> > > Could also become tricky
> > > to declare it correctly in acpi / e820 due to the split.
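The BAR size limits for the three placements discussed above can be checked with a small calculation. This is only a sketch under stated assumptions: BARs are power-of-two sized and naturally aligned, the PCI window ends at 0xfec00000 as in the layout above, and for the two alternative placements RAM is assumed to end at 0xc0000000 (the 3G case). The function and layout names are made up for illustration.

```python
def largest_bar(start: int, end: int) -> int:
    """Largest power-of-two BAR that fits naturally aligned in [start, end)."""
    size = 1 << 32
    while size:
        base = (start + size - 1) & ~(size - 1)  # round start up to a multiple of size
        if base + size <= end:
            return size
        size >>= 1
    return 0


PCI_WINDOW_END = 0xFEC00000  # from the layout quoted above

# PCI BAR windows left over for each xbar placement (3G RAM assumed for the last two).
layouts = {
    "xbar @ 0xb0000000 (current)": [(0xC0000000, PCI_WINDOW_END)],
    "xbar @ 0xc0000000": [(0xD0000000, PCI_WINDOW_END)],
    "xbar @ 0xe0000000": [(0xC0000000, 0xE0000000), (0xF0000000, PCI_WINDOW_END)],
}

for name, windows in layouts.items():
    sizes = [largest_bar(start, end) >> 20 for start, end in windows]
    print(f"{name}: largest BAR per window (MiB) = {sizes}")
```

Under these assumptions the current layout and the 0xe0000000 placement both still fit one 512m BAR at 0xc0000000, while the 0xc0000000 placement tops out at 256m.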
> >
> > My laptop's ACPI has this space all fragmented up, seems to boot fine ...
>
> We need to change the way we reserve the mmconfig space though.
>
> Currently it is marked reserved in the e820 table. Having that overlap
> with the _CRS region makes windows quite unhappy, we tried that
> recently.
Yes, this also contradicts the spec; see below.
> My laptop has the mmconfig space declared as LPC resource:
>
> Device (LPC)
> {
>     Name (_ADR, 0x001F0000)  // _ADR: Address
>     Name (_S3D, 0x03)  // _S3D: S3 Device State
>     Name (RID, 0x00)
>     Device (SIO)
>     {
>         Name (_HID, EisaId ("PNP0C02"))
>         Name (_UID, 0x00)  // _UID: Unique ID
>         Name (SCRS, ResourceTemplate ()
>             [ ... ]
>             Memory32Fixed (ReadWrite,
>                 0xF8000000,  // Address Base
>                 0x04000000,  // Address Length
>                 )
>             [ ... ]
>         Method (_CRS, 0, NotSerialized)
>             [ ... return SCRS, with updates applied in some cases ... ]
>
> When doing it this way we can simply make the PCI0._CRS cover the whole
> end-of-ram -> ioapic-base range, similar to piix, and we are pretty free
> to place the mmconfig xbar anywhere in that area.
The spec says:
2. If the operating system does not natively comprehend reserving the
MMCFG region, the MMCFG region must be reserved by firmware. The address
range reported in the MCFG table or by _CBA method (see Section 4.1.3)
must be reserved by declaring a motherboard resource. For most systems,
the motherboard resource would appear at the root of the ACPI namespace
(under \_SB) in a node with a _HID of EISAID (PNP0C02), and the
resources in this case should not be claimed in the root PCI bus’s _CRS.
The resources can optionally be returned in Int15 E820 or
EFIGetMemoryMap as reserved memory but must always be reported through
ACPI as a motherboard resource.
My reading of the above is that this can be declared as an LPC
resource, but then it must not also be claimed in the root bus's _CRS.
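To illustrate that reading: if the root bus's _CRS must not claim the MMCFG range, then the memory windows reported there are the end-of-ram -> ioapic-base hole with the MMCFG range cut out. A rough sketch, assuming the xbar sits at 0xe0000000 and RAM ends at 0xc0000000 (both values are just examples from this thread); `subtract` is a hypothetical helper, not an existing QEMU function:

```python
def subtract(window, reserved):
    """Return the parts of half-open `window` not covered by half-open `reserved`."""
    (w_start, w_end), (r_start, r_end) = window, reserved
    if r_end <= w_start or r_start >= w_end:
        return [window]  # no overlap, nothing to cut out
    pieces = []
    if w_start < r_start:
        pieces.append((w_start, r_start))  # part below the reserved range
    if r_end < w_end:
        pieces.append((r_end, w_end))  # part above the reserved range
    return pieces


pci_hole = (0xC0000000, 0xFEC00000)  # assumed: end of 3G RAM up to the ioapic base
mmcfg = (0xE0000000, 0xF0000000)     # assumed xbar placement, 256 MiB

for start, end in subtract(pci_hole, mmcfg):
    print(f"_CRS memory window: {start:#010x} - {end - 1:#010x}")
```

With the xbar below the hole (the current 0xb0000000 placement) the subtraction leaves the hole untouched, so a single window is enough.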
>
> Doing the transition is non-trivial though because we (a) move the job
> of reserving the mmconfig area from firmware to qemu and (b) the testing
> needed for that.
>
> Maybe we should set the gbyte alignment on q35 aside for a while and
> tackle the mmconfig reservation handling first.
>
> cheers,
> Gerd
I merged your patch but split it: q35 is separate and piix
is separate. Would you like me to drop the q35 part then?