From: Laszlo Ersek
Subject: Re: [Qemu-devel] [edk2] Could not add PCI device with big memory to aarch64 VMs
Date: Wed, 2 Dec 2015 19:29:04 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0

On 12/02/15 18:28, liang yan wrote:
> Hi, Laszlo,
> 
> On 11/30/2015 06:45 PM, Laszlo Ersek wrote:
>> On 12/01/15 01:46, liang yan wrote:
>>> Hello, Laszlo,
>>>
>>> On 11/30/2015 03:05 PM, Laszlo Ersek wrote:
>> [snip]
>>
>>>> If you need more room (with large alignments), then there's no way
>>>> around supporting QEMU's 64 bit aperture, VIRT_PCIE_MMIO_HIGH (see my
>>>> earlier email).
>>> I checked the create_pcie() function in QEMU's hw/arm/virt.c; it has a
>>> flag, use_highmem, which defaults to "true".
>>>
>>> It sets base_mmio_high and size_mmio_high in the device tree with the
>>> call below:
>>>
>>>         qemu_fdt_setprop_sized_cells(vbi->fdt, nodename, "ranges",
>>>                                      1, FDT_PCI_RANGE_IOPORT, 2, 0,
>>>                                      2, base_pio, 2, size_pio,
>>>                                      1, FDT_PCI_RANGE_MMIO, 2, base_mmio,
>>>                                      2, base_mmio, 2, size_mmio,
>>>                                      1, FDT_PCI_RANGE_MMIO_64BIT,
>>>                                      2, base_mmio_high,
>>>                                      2, base_mmio_high, 2, size_mmio_high);
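For reference, both apertures come from the virt board's memory map in the
same file; in 2015-era QEMU sources the relevant a15memmap[] entries read
roughly as follows (abridged; exact values can vary by QEMU version, but
these match the logs quoted further down):

/* Abridged excerpt of the "virt" board memory map (a15memmap[]) in QEMU's
 * hw/arm/virt.c; exact contents may differ by version. */
[VIRT_PCIE_MMIO]      = { 0x10000000, 0x2eff0000 },           /* 32-bit window */
[VIRT_PCIE_PIO]       = { 0x3eff0000, 0x00010000 },
[VIRT_PCIE_ECAM]      = { 0x3f000000, 0x01000000 },
[VIRT_PCIE_MMIO_HIGH] = { 0x8000000000ULL, 0x8000000000ULL }, /* 64-bit window */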
>>>
>>> So basically, I need to add two UINT64 PCDs, say mmio_high_base and
>>> mmio_high_size, set them in the ProcessPciHost() function (VirtFdtDxe.c),
>>> and then use this high base address and size as a new aperture.
>>>
>>> Is this correct?
>> It is correct, but that's only part of the story.
>>
>> Parsing the 64-bit aperture from the DTB into new PCDs in
>> ArmVirtPkg/VirtFdtDxe is the easy part.
>>
>> The hard part is modifying ArmVirtPkg/PciHostBridgeDxe, so that BAR
>> allocation requests (submitted by the platform independent PCI bus
>> driver that resides in "MdeModulePkg/Bus/Pci/PciBusDxe") are actually
>> serviced from this high aperture too.
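For the first (easy) part, here is a minimal, untested sketch of what the
parsing could look like in ProcessPciHost(). The PCD names (PcdPciMmio64Base,
PcdPciMmio64Size) and the helper are made up for illustration and would first
have to be declared in ArmVirtPkg.dec; the record layout follows the 7-cell
"ranges" entries produced by the QEMU call quoted above, and DTB cells are
big-endian, hence the SwapBytes32() calls from BaseLib:

#define DTB_PCI_RANGE_TYPE_MASK  0x03000000U  // space code, in the phys.hi cell
#define DTB_PCI_RANGE_MMIO64     0x03000000U  // FDT_PCI_RANGE_MMIO_64BIT

#pragma pack (1)
typedef struct {
  UINT32 Type;     // phys.hi: space code and attributes
  UINT32 ChildHi;  // PCI (child) address, high cell
  UINT32 ChildLo;  // PCI (child) address, low cell
  UINT32 CpuHi;    // CPU (parent) address, high cell
  UINT32 CpuLo;    // CPU (parent) address, low cell
  UINT32 SizeHi;   // window size, high cell
  UINT32 SizeLo;   // window size, low cell
} DTB_PCI_RANGE_RECORD;
#pragma pack ()

STATIC
VOID
ParseMmio64Aperture (
  IN CONST VOID *RangesProp,  // raw "ranges" property from fdt_getprop()
  IN UINT32     PropSize      // property size in bytes
  )
{
  CONST DTB_PCI_RANGE_RECORD *Record;
  UINTN                      Count;
  UINTN                      Idx;

  Count  = PropSize / sizeof *Record;
  Record = RangesProp;
  for (Idx = 0; Idx < Count; ++Idx, ++Record) {
    if ((SwapBytes32 (Record->Type) & DTB_PCI_RANGE_TYPE_MASK) ==
        DTB_PCI_RANGE_MMIO64) {
      PcdSet64 (PcdPciMmio64Base,
        ((UINT64)SwapBytes32 (Record->CpuHi) << 32) |
        SwapBytes32 (Record->CpuLo));
      PcdSet64 (PcdPciMmio64Size,
        ((UINT64)SwapBytes32 (Record->SizeHi) << 32) |
        SwapBytes32 (Record->SizeLo));
    }
  }
}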
>>
>>>> Unfortunately I can't readily help with that in the
>>>> "ArmVirtPkg/PciHostBridgeDxe" driver; there's no such (open-source)
>>>> example in the edk2 tree. Of course, I could experiment with it myself
>>>> -- only not right now.
>>> If possible, I would like to finish this part, or help you finish it. I
>>> have only started working on UEFI recently, and thank you so much for
>>> your patient and detailed explanation. I really appreciate it.
>>>> I guess copying and adapting the TypeMem32 logic to TypeMem64 (currently
>>>> short-circuited with EFI_ABORTED) could work.
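To illustrate that direction (a sketch only, not the actual driver code): the
TypeMem64 arm could mirror the TypeMem32 logic, but constrain the GCD
allocation to the high aperture. This assumes the aperture was registered
earlier with gDS->AddMemorySpace(); the variable and helper names below are
hypothetical, and gDS / gImageHandle come from DxeServicesTableLib /
UefiBootServicesTableLib:

//
// Hedged sketch only -- not the actual ArmVirtPkg/PciHostBridgeDxe code.
//
STATIC EFI_PHYSICAL_ADDRESS mMmio64Base; // e.g. 0x8000000000 on QEMU "virt"
STATIC UINT64               mMmio64Size;

STATIC
EFI_STATUS
AllocateFromHighAperture (
  IN  UINT64                AlignmentMask,  // BAR alignment mask, e.g. 0xFFFFFFF
  IN  UINT64                Length,         // BAR size, e.g. 0x10000000
  OUT EFI_PHYSICAL_ADDRESS  *Address
  )
{
  EFI_PHYSICAL_ADDRESS  Base;
  EFI_STATUS            Status;

  //
  // Search top-down, starting from the last address of the 64-bit aperture.
  //
  Base   = mMmio64Base + mMmio64Size - 1;
  Status = gDS->AllocateMemorySpace (
                  EfiGcdAllocateMaxAddressSearchTopDown,
                  EfiGcdMemoryTypeMemoryMappedIo,
                  HighBitSet64 (AlignmentMask + 1),  // log2 of the alignment
                  Length,
                  &Base,
                  gImageHandle,
                  NULL
                  );
  if (EFI_ERROR (Status) || Base < mMmio64Base) {
    return EFI_OUT_OF_RESOURCES;
  }
  *Address = Base;
  return EFI_SUCCESS;
}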
>>> Is 32-bit vs. 64-bit determined by the BAR itself (its low type bits) or
>>> by the PCI device's memory size? Is there an option for this in QEMU?
>> I can't tell. :)
>>
>>> Does TypeMem32 still get the "VIRT_PCIE_MMIO" aperture while TypeMem64
>>> uses the "VIRT_PCIE_MMIO_HIGH" aperture? Or is it more like a device
>>> property controlled by QEMU's device emulation?
>> Good question. I don't know. I think in order to answer this question,
>> we should understand the whole dance between the PCI root bridge / host
>> bridge driver and the generic PCI bus driver.
>>
>> The documentation I know of is in the Platform Init spec, version 1.4,
>> Volume 5, Chapter 10 "PCI Host Bridge". I've taken multiple stabs at
>> that chapter earlier, but I've always given up.
>>
>> Sorry I can't help more, but this is new area for me as well.
> No, this is already a big help. I really appreciate your generous sharing.
> 
> I also have a problem on the guest VM kernel side.
> 
> Even if we change the UEFI side, does the guest kernel still need to be
> modified?

No, if the guest UEFI side is fully fixed / enhanced, then the guest
kernel won't need any changes. (Shouldn't.)

> Because I noticed that the kernel will do a rescan. I use the example of
> 256M below.
> 
> On the UEFI side there are two setups; I am not sure which is the real one.

Both of these are "real".

The first snippet bears witness to the enumeration / discovery of the
single PCI device 00:01.0:

> PciBus: Discovered PCI @ [00|01|00]
>    BAR[0]: Type =  Mem32; Alignment = 0xFFF;      Length = 0x100;       Offset = 0x10
>    BAR[1]: Type =  Mem32; Alignment = 0xFFF;      Length = 0x1000;      Offset = 0x14
>    BAR[2]: Type = PMem64; Alignment = 0xFFFFFFF;  Length = 0x10000000;  Offset = 0x18
>                   ===== PMem64 here

Whereas the second snippet is the cumulative resource map for the root
bridge that the device is behind.

You get similar resource maps near the end of enumeration for PCI-to-PCI
bridges as well, if you have them. The bridge's resource map depends on
the devices behind it. In this case it could be confusing because you
have only one device (I guess).

> PciBus: Resource Map for Root Bridge PciRoot(0x0)
> Type =  Mem32; Base = 0x20000000; Length = 0x10100000; Alignment = 0xFFFFFFF
>    Base = 0x20000000; Length = 0x10000000; Alignment = 0xFFFFFFF; Owner = PCI [00|01|00:18]; Type = PMem32
>                                                                   ============> Mem32 here
>    Base = 0x30000000; Length = 0x1000;     Alignment = 0xFFF;     Owner = PCI [00|01|00:14]
>    Base = 0x30001000; Length = 0x100;      Alignment = 0xFFF;     Owner = PCI [00|01|00:10]

So the above tells us that the PCI device would like to have a 64-bit
pmem BAR (or at least that it would be capable of working with one).
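For reference, the Mem32 / PMem64 types that PciBus prints are decoded from
the BAR register's low bits, as defined by the PCI Local Bus Specification.
A minimal stand-alone decoder, for illustration only:

#include <stdbool.h>
#include <stdint.h>

/*
 * Decode a BAR's low bits, per the PCI Local Bus Specification:
 *   bit 0      -- 0 = memory BAR, 1 = I/O BAR
 *   bits [2:1] -- memory type: 00 = 32-bit, 10 = 64-bit (uses two BAR slots)
 *   bit 3      -- prefetchable
 */
static bool bar_is_mem64(uint32_t bar)
{
    return (bar & 0x1) == 0 &&        /* memory space ...        */
           ((bar >> 1) & 0x3) == 0x2; /* ... of the 64-bit type  */
}

static bool bar_is_prefetchable(uint32_t bar)
{
    return (bar & 0x1) == 0 && (bar & 0x8) != 0;
}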

However, likely *exactly* because of the limitation that we have in the
resource allocation of the PCI host bridge / root bridge driver, that
256 MB BAR ends up being mapped under 4GB. This outcome is then visible
in the root bridge's resource map.

> but on the kernel side, it becomes 64-bit pref:
> 
> [    3.005355] pci_bus 0000:00: root bus resource [mem 0x10000000-0x3efeffff window]
> [    3.006028] pci_bus 0000:00: root bus resource [bus 00-0f]
> ...
> [    3.135847] pci 0000:00:01.0: BAR 2: assigned [mem 0x10000000-0x1fffffff 64bit pref]
> [    3.137099] pci 0000:00:01.0: BAR 1: assigned [mem 0x20000000-0x20000fff]
> [    3.137382] pci 0000:00:01.0: BAR 0: assigned [mem 0x20001000-0x200010ff]

This difference is due to two facts:

(a) the guest kernel is allowed to re-enumerate PCI devices, and
(b) the guest kernel has a full-fledged PCI resource allocator :)

> Also, I found that [mem 0x8000000000 window] was ignored,
> 
> [    2.769608] PCI ECAM: ECAM for domain 0000 [bus 00-0f] at [mem 0x3f000000-0x3fffffff] (base 0x3f000000)
> [    2.962930] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-0f])
> [    2.965787] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI]
> [    2.990794] acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug PME AER PCIeCapability]
> [    2.994520] acpi PNP0A08:00: host bridge window [mem 0x8000000000 window] (ignored, not CPU addressable)

This is logged by the acpi_pci_root_validate_resources() kernel function
[drivers/acpi/pci_root.c].
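The check in that function boils down to the following (paraphrased, not the
verbatim kernel code; "res", "entry" and "dev" come from the surrounding
resource_list_for_each_entry() loop):

/*
 * Paraphrase of the window validation in acpi_pci_root_validate_resources()
 * [drivers/acpi/pci_root.c] -- not the verbatim kernel code.
 */
struct resource *root = (res->flags & IORESOURCE_MEM) ? &iomem_resource
                                                      : &ioport_resource;

if (res->start > root->end || res->end < root->start) {
        /* The window lies outside the CPU-addressable range. */
        dev_info(dev, "host bridge window %pR (ignored, not CPU addressable)\n",
                 res);
        resource_list_destroy_entry(entry);
}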

The 0x8000000000 value comes from the ACPI payload that QEMU generates
and the firmware installs:

$ git show --color 5125f9cd -- hw/arm/virt-acpi-build.c

The acpi_pci_root_validate_resources() function complains because the
base address 0x8000000000 is greater than or equal to "iomem_resource.end".

"iomem_resource" looks like a global variable in the kernel. I don't
know how it is computed.

> So when I do "cat /proc/iomem", no "8000000000-8000000000 : PCI Bus
> 0000:00" entry shows up.
> What's curious is that it works well and shows up in another Fedora
> kernel provided by Shannon.

Using the same firmware and the same VM configuration? Then guest kernel
changes might be necessary too, after all.

Thanks
Laszlo

> Do you have any idea why this happens? Thanks.
> 
> 
> 
> Best,
> Liang
>> Laszlo
>>
> 



