

From: Marcel Apfelbaum
Subject: Re: [Qemu-devel] [RFC 3/3] acpi-build: allocate mcfg for multiple host bridges
Date: Wed, 23 May 2018 19:50:53 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.7.0



On 05/23/2018 05:25 PM, Michael S. Tsirkin wrote:
> On Tue, May 22, 2018 at 10:28:56PM -0600, Alex Williamson wrote:
>> On Wed, 23 May 2018 02:38:52 +0300
>> "Michael S. Tsirkin" <address@hidden> wrote:
>>
>>> On Tue, May 22, 2018 at 03:47:41PM -0600, Alex Williamson wrote:
>>>> On Wed, 23 May 2018 00:44:22 +0300
>>>> "Michael S. Tsirkin" <address@hidden> wrote:
>>>>> On Tue, May 22, 2018 at 03:36:59PM -0600, Alex Williamson wrote:
>>>>>> On Tue, 22 May 2018 23:58:30 +0300
>>>>>> "Michael S. Tsirkin" <address@hidden> wrote:
>>>>>>> It's not hard to think of a use-case where >256 devices
>>>>>>> are helpful, for example a nested virt scenario where
>>>>>>> each device is passed on to a different nested guest.
>>>>>>>
>>>>>>> But I think the main feature this is needed for is numa modeling.
>>>>>>> Guests seem to assume a numa node per PCI root, ergo we need more PCI
>>>>>>> roots.
>>>>>> But even if we have NUMA affinity per PCI host bridge, a PCI host
>>>>>> bridge does not necessarily imply a new PCIe domain.
>>>>> What are you calling a PCIe domain?
>>>> Domain/segment
>>>>
>>>> 0000:00:00.0
>>>> ^^^^ This
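
(For anyone skimming: a minimal sketch, not QEMU code, of the extended address
form Alex is pointing at, DDDD:BB:DD.F as printed by lspci -D. The leading
field is the 16-bit domain/segment; the bus number is only 8 bits, which is
where the 256-bus-per-segment limit discussed below comes from. The sample
values are made up.)

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint16_t domain = 0x0000;  /* segment group number     */
        uint8_t  bus    = 0x00;    /* 0-255 buses per segment  */
        uint8_t  dev    = 0x1f;    /* 0-31 devices per bus     */
        uint8_t  fn     = 0x3;     /* 0-7 functions per device */

        printf("%04x:%02x:%02x.%x\n", domain, bus, dev, fn);
        return 0;
    }
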
>>> Right. So we can conceivably have PCIe root complexes share an ACPI segment.
>>> I don't see what this buys us by itself.
>> The ability to define NUMA locality for a PCI sub-hierarchy while
>> maintaining compatibility with non-segment-aware OSes (and firmware).
> For sure, but NUMA is a fairly advanced topic, and MCFG has been around
> for longer than the various NUMA tables. Are there really non-segment-aware
> guests that also know how to make use of NUMA?


Yes, the current pxb devices accomplish exactly that. Multiple NUMA nodes
while sharing PCI domain 0.

Thanks,
Marcel
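
(A rough sketch of the configuration Marcel is describing, with option
spellings as I recall them from docs/pcie.txt-style examples; double-check
against your QEMU version. The extra pxb-pcie root stays in domain/segment
0000 but claims bus 64 and is associated with NUMA node 1; bus_nr=64 and the
device choices are arbitrary.)

    qemu-system-x86_64 -machine q35 -m 4G \
      -object memory-backend-ram,id=mem0,size=2G \
      -object memory-backend-ram,id=mem1,size=2G \
      -numa node,nodeid=0,memdev=mem0 \
      -numa node,nodeid=1,memdev=mem1 \
      -device pxb-pcie,id=pxb1,bus_nr=64,numa_node=1,bus=pcie.0 \
      -device pcie-root-port,id=rp1,bus=pxb1,slot=0 \
      -device virtio-net-pci,bus=rp1
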

>>>>>> Isn't that the only reason we'd need a new MCFG section and the reason
>>>>>> we're limited to 256 buses?  Thanks,
>>>>>>
>>>>>> Alex
>>>>> I don't know whether a single MCFG section can describe multiple roots.
>>>>> I think it would certainly be unusual.
>>>> I'm not sure here if you're referring to the actual MCFG ACPI table or
>>>> the MMCONFIG range, aka the ECAM.  Neither of these describes PCI host
>>>> bridges.  The MCFG table can describe one or more ECAM ranges, each of
>>>> which provides the ECAM base address, the PCI segment associated with
>>>> that ECAM, and the start and end bus numbers needed to know the offset
>>>> and extent of the ECAM range.  PCI host bridges would then theoretically
>>>> be separate ACPI objects with _SEG and _BBN methods to associate them
>>>> with the correct ECAM range by segment number and base bus number.  So
>>>> it seems the tooling exists for an ECAM/MMCONFIG range to be provided
>>>> per PCI host bridge, even if they exist within the same domain, but in
>>>> practice what I see on the systems I have access to is a single MMCONFIG
>>>> range supporting all of the host bridges.
>>>>
>>>> It also seems there are numerous ways to describe the MMCONFIG range,
>>>> and I haven't actually found an example that seems to use the MCFG
>>>> table.  Two of the systems have MCFG tables (that don't seem terribly
>>>> complete) and the kernel claims to find the MMCONFIG via e820; another
>>>> doesn't even have an MCFG table and the kernel claims to find the
>>>> MMCONFIG via an ACPI motherboard resource.  I'm not sure if I can enable
>>>> PCI segments on anything to see how the firmware changes.  Thanks,
>>>>
>>>> Alex
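
(To make the ECAM arithmetic Alex describes concrete: given the base address,
segment and starting bus of one ECAM/MMCONFIG range, a function's config space
sits at a fixed offset, 4 KB per function. A minimal sketch of the generic
ECAM math, not QEMU's or the kernel's implementation.)

    #include <stdint.h>

    /* Address of a config-space register inside an ECAM/MMCONFIG window.
     * ecam_base and start_bus come from the MCFG allocation entry (or from
     * e820 / a motherboard resource, as on the systems described above).
     */
    static inline uint64_t ecam_addr(uint64_t ecam_base, uint8_t start_bus,
                                     uint8_t bus, uint8_t dev, uint8_t fn,
                                     uint16_t reg)
    {
        return ecam_base + (((uint64_t)(bus - start_bus) << 20) |
                            ((uint64_t)(dev & 0x1f) << 15)      |
                            ((uint64_t)(fn  & 0x7)  << 12)      |
                            (reg & 0xfff));
    }
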
>>> Let me clarify.  So MCFG has base address allocation structures.
>>> Each one maps a segment and a range of bus numbers into memory.
>>> This structure is what I meant.
>>>
>>> IIUC you are saying that on your systems everything is within a single
>>> segment, right?  Multiple PCI hosts map into a single segment?
>>>
>>> If you do this you can do NUMA, but you do not gain > 256 devices.
>>>
>>> Are we on the same page then?
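
(The "base address allocation structure" Michael refers to looks roughly like
this; field names are paraphrased from the PCI Firmware spec and QEMU's ACPI
definitions, so treat them as illustrative. Each entry covers at most buses
0-255 of one segment, so staying in segment 0 caps the machine at 256 buses,
while a second entry with a new segment number is what buys more. The base
addresses in the example are made up.)

    #include <stdint.h>

    /* One MCFG "configuration space base address allocation" entry:
     * it binds a segment number and a bus range to an ECAM base address.
     */
    struct mcfg_allocation {
        uint64_t base_address;   /* ECAM/MMCONFIG base for this range */
        uint16_t segment;        /* PCI segment group (_SEG)          */
        uint8_t  start_bus;      /* first decoded bus number          */
        uint8_t  end_bus;        /* last decoded bus number           */
        uint32_t reserved;
    } __attribute__((packed));

    /* Example: two segments, each with its own 256-bus ECAM window. */
    static const struct mcfg_allocation example[] = {
        { 0xb0000000ULL, 0x0000, 0x00, 0xff, 0 },
        { 0xc0000000ULL, 0x0001, 0x00, 0xff, 0 },
    };
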




