From: Gleb Natapov
Subject: Re: [Qemu-devel] Re: [PATCH] PCI: Bus number from the bridge, not the device
Date: Sun, 21 Nov 2010 18:01:11 +0200

On Sun, Nov 21, 2010 at 04:48:44PM +0200, Michael S. Tsirkin wrote:
> On Sun, Nov 21, 2010 at 02:50:14PM +0200, Gleb Natapov wrote:
> > On Sun, Nov 21, 2010 at 01:53:26PM +0200, Michael S. Tsirkin wrote:
> > > > > The guests.
> > > > Which one? There are many guests. Your favorite?
> > > > 
> > > > > For CLI, we need an easy way to map a device in guest to the
> > > > > device in qemu and back.
> > > > Then use eth0, /dev/sdb, or even C:. Your way is no less broken, since
> > > > what you are saying is "let's use the name that the guest assigned to a
> > > > device".
> > > 
> > > No I am saying let's use the name that our ACPI tables assigned.
> > > 
> > ACPI does not assign any names. At best, ACPI tables describe the resources
> > used by a device.
> 
> Not only that. Bus number and segment aren't resources as such.
> They describe addressing.
> 
> > And not all guests that qemu supports have support for ACPI. Qemu
> > even supports machine types that do not support ACPI.
> 
> So? Different machines -> different names.
> 
You want to have a different CLI for each type of machine qemu
supports?

> > > > > 
> > > > > 
> > > > > > It looks like you identify yourself with most of
> > > > > > qemu users, but if most qemu users are like you then qemu does not
> > > > > > have enough users :) Most users that consider themselves "advanced"
> > > > > > may know what eth1 or /dev/sdb means. This doesn't mean we should
> > > > > > provide a "device_del eth1" or "device_add /dev/sdb" command though.
> > > > > > 
> > > > > > More important is that "domain" (encoded as a number like you used)
> > > > > > and "bus number" have no meaning from inside qemu. So while I have
> > > > > > said many times that I don't care too much about the exact CLI
> > > > > > syntax, it should at least make sense. It can use an id to specify
> > > > > > a PCI bus in the CLI like this: device_del pci.0:1.1. Or it can
> > > > > > even use a device id too, like this: device_del pci.0:ide.0. Or it
> > > > > > can use HW topology as in an OF device path. But doing ad-hoc
> > > > > > device enumeration inside qemu and then using it for the CLI is
> > > > > > not it.
> > > > > > 
> > > > > > > functionality in the guests.  Qemu is buggy at the moment in that
> > > > > > > it uses the bus addresses assigned by the guest and not the ones
> > > > > > > in ACPI, but that can be fixed.
> > > > > > It looks like you confused ACPI _SEG for something it isn't.
> > > > > 
> > > > > Maybe I did. This is what linux does:
> > > > > 
> > > > > struct pci_bus * __devinit pci_acpi_scan_root(struct acpi_pci_root *root)
> > > > > {
> > > > >         struct acpi_device *device = root->device;
> > > > >         int domain = root->segment;
> > > > >         int busnum = root->secondary.start;
> > > > > 
> > > > > And I think this is consistent with the spec.
> > > > > 
> > > > It means that one domain may include several host bridges. At that
> > > > level a domain is defined as something that has a unique name for each
> > > > device inside it, thus no two buses in one segment/domain can have the
> > > > same bus number. This is what the PCI spec tells you.
> > > 
> > > And that really is enough for CLI because all we need is locate the
> > > specific slot in a unique way.
> > > 
> > At the qemu level we do not have bus numbers. They are assigned by the
> > guest. So inside a guest domain:bus:slot.func points you to a device,
> > but qemu does not enumerate buses.
> > 
> > > > And this further shows that using "domain" as defined by the guest is
> > > > a very bad idea.
> > > 
> > > As defined by ACPI, really.
> > > 
> > ACPI is part of the guest software and may not even be present in the
> > guest. How is it relevant?
> 
> It's relevant because this is what guests use. To access the root
> device with cf8/cfc you need to know the bus number assigned to it
> by firmware. How that was assigned is of interest to BIOS/ACPI but not
> really interesting to the user or, I suspect, guest OS.
> 
Of course this is incorrect. The OS can re-enumerate PCI if it wishes;
Linux has a command line option just for that. And saying that ACPI is
relevant because it is what guest software uses, in reply to a sentence
stating that not all guests even use ACPI, is, well, strange.

And ACPI describes only HW that is present at boot time. What if you
hot-plug a root PCI bridge? How does the nonexistent PCI naming help you then?
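
For reference, the cf8/cfc mechanism being argued about here is the legacy
x86 config access: the bus number is part of the address a guest writes to
port 0xcf8. A minimal sketch, assuming x86 port I/O and ring-0 privileges;
the helper names are illustrative, not taken from qemu or Seabios:

    #include <stdint.h>

    static inline void outl(uint16_t port, uint32_t val)
    {
        __asm__ volatile("outl %0, %1" : : "a"(val), "Nd"(port));
    }

    static inline uint32_t inl(uint16_t port)
    {
        uint32_t val;
        __asm__ volatile("inl %1, %0" : "=a"(val) : "Nd"(port));
        return val;
    }

    /* The bus number is baked into the address written to 0xcf8, which
     * is why a guest must know what number the firmware assigned to the
     * root bus before it can talk to anything behind it. */
    static uint32_t pci_conf_read(uint8_t bus, uint8_t slot, uint8_t func,
                                  uint8_t reg)
    {
        uint32_t addr = (1u << 31)              /* enable bit        */
                      | ((uint32_t)bus  << 16)  /* bus number        */
                      | ((uint32_t)slot << 11)  /* device/slot       */
                      | ((uint32_t)func << 8)   /* function          */
                      | (reg & 0xfc);           /* dword-aligned reg */
        outl(0x0cf8, addr);
        return inl(0x0cfc);
    }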

> > > > > > The ACPI spec
> > > > > > says that a PCI segment group is a purely software concept managed
> > > > > > by system firmware. In fact one segment may include multiple PCI
> > > > > > host bridges.
> > > > > 
> > > > > It can't I think:
> > > > Read the _BBN definition:
> > > >  The _BBN object is located under a PCI host bridge and must be unique
> > > >  for every host bridge within a segment since it is the PCI bus number.
> > > > 
> > > > Clearly the above speaks about multiple host bridges within a segment.
> > > 
> > > Yes, it looks like the firmware spec allows that.
> > It even has an explicit example that shows it.
> > 
> > > 
> > > > >       Multiple Host Bridges
> > > > > 
> > > > >       A platform may have multiple PCI Express or PCI-X host bridges.
> > > > >       The base address for the MMCONFIG space for these host bridges
> > > > >       may need to be allocated at different locations. In such cases,
> > > > >       using the MCFG table and _CBA method as defined in this section
> > > > >       means that each of these host bridges must be in its own PCI
> > > > >       Segment Group.
> > > > > 
> > > > This is not from the ACPI spec,
> > > 
> > > PCI Firmware Specification 3.0
> > > 
> > > > but without going too deep into it, the above
> > > > paragraph talks about a particular case where each host bridge must
> > > > be in its own PCI Segment Group, which is definite proof that in other
> > > > cases multiple host bridges can be in one segment group.
> > > 
> > > I stand corrected. I think you are right. But note that if they are,
> > > they must have distinct bus numbers assigned by ACPI.
> > ACPI does not assign any numbers.
> 
> For all root pci devices the firmware must supply a _BBN number. This is
> the bus number, isn't it? For nested buses, this is optional.
Nonsense. _BBN is optional and is not present in the Seabios DSDT. As far
as I can tell it is only needed if a PCI segment group has more than one
PCI host bridge.
 
> 
> > The BIOS enumerates buses and assigns
> > numbers.
> 
> There's no standard way to enumerate pci root devices in the guest AFAIK.
> The spec says:
>       Firmware must configure all Host Bridges in the systems, even if
>       they are not connected to a console or boot device. Firmware must
>       configure Host Bridges in order to allow operating systems to use the
>       devices below the Host Bridges. This is because the Host Bridges
>       programming model is not defined by the PCI Specifications. 
> 
> 
A guest must be aware of HW to use it, be it through the BIOS or a driver.

> > ACPI, in the base case, describes to the OSPM what the BIOS did. Qemu
> > sits one layer below all this and does not enumerate PCI buses. Even if
> > we made it do so, there is no way to guarantee that the guest will
> > enumerate them in the same order, since there is more than one way to do
> > enumeration. I have repeated this to you numerous times already.
> 
> ACPI is really part of the motherboard. Calling it the guest just
> confuses things. Guest OS can override bus numbering for nested buses
> but not for root buses.
> 
If calling ACPI part of a guest confuses you then you are already
confused. A guest OS can do whatever it wishes with any enumeration the
FW did, if it knows better.

> > > 
> > > > > 
> > > > > > _SEG
> > > > > > is not what the OSPM uses to tie a HW resource to an ACPI resource.
> > > > > > It uses _CRS (Current Resource Settings) for that, just like OF. No
> > > > > > surprise there.
> > > > > 
> > > > > OSPM uses both I think.
> > > > > 
> > > > > All I see Linux do with _CRS is get the bus number range.
> > > > So let's assume that the HW has two PCI host bridges and ACPI has:
> > > >         Device(PCI0) {
> > > >             Name (_HID, EisaId ("PNP0A03"))
> > > >             Name (_SEG, 0x00)
> > > >         }
> > > >         Device(PCI1) {
> > > >             Name (_HID, EisaId ("PNP0A03"))
> > > >             Name (_SEG, 0x01)
> > > >         }
> > > > I.e. no _CRS to describe resources. How do you think the OSPM knows
> > > > which of the two pci host bridges is PCI0 and which one is PCI1?
> > > 
> > > You must be able to uniquely address any bridge using the combination
> > > of _SEG and _BBN.
> > 
> > Not at all. And saying "you must be able" without actually showing how
> > doesn't prove anything. _SEG is relevant only for those host bridges
> > that support MMCONFIG (not all of them do, and none that qemu supports
> > does yet). _SEG points to a specific entry in the MCFG table, and the
> > MCFG entry holds the base address of the MMCONFIG space for the bridge
> > (this address is configured by the guest). This is all _SEG really
> > does, no magic at all. _BBN returns the bus number assigned by the BIOS
> > to the host bridge. Nothing qemu-visible again.
> > So _SEG and _BBN give you two numbers assigned by
> > the guest FW. Nothing qemu can use to identify a device.
> 
> This FW is given to the guest by qemu. It only assigns bus numbers
> because qemu told it to do so.
Seabios is just a guest qemu ships. There are other FWs for qemu: Bochs
bios, openfirmware, efi. All of them were developed outside of the qemu
project and all of them are usable without qemu. You can't consider them
part of qemu any more than Linux/Windows with virtio drivers.

> 
> > > 
> > > > > And the spec says, e.g.:
> > > > > 
> > > > >       the memory mapped configuration base address (always
> > > > >       corresponds to bus number 0) for the PCI Segment Group of the
> > > > >       host bridge is provided by _CBA and the bus range covered by
> > > > >       the base address is indicated by the corresponding bus range
> > > > >       specified in _CRS.
> > > > > 
> > > > I don't see how it is relevant. And _CBA is defined only for PCI
> > > > Express. Let's solve the problem for PCI first and then move to PCI
> > > > Express. Jumping from one to the other distracts us from the main
> > > > discussion.
> > > 
> > > I think this is what confuses us.  As long as you are using cf8/cfc
> > > there's no concept of a domain really.
> > > Thus:
> > >   /pci@i0cf8
> > > 
> > > is probably enough for BIOS boot because we'll need to make root bus
> > > numbers unique for legacy guests/option ROMs.  But this is not a
> > > hardware requirement and might become easier to ignore with EFI.
> > > 
> > You do not need MMCONFIG to have multiple PCI domains. You can have one
> > configured via standard cf8/cfc and another one on ef8/efc and one more
> > at mmio fce00000 and you can address all of them:
> > /pci@i0cf8
> > /pci@i0ef8
> > /pci@fce00000
> > 
> > And each one of those PCI domains can have 256 subbridges.
> 
> Will common guests such as windows or linux be able to use them? This
With proper drivers, yes. There is HW with more than one PCI bus and I
think qemu emulates some of it (PPC Mac for instance).
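
To make the multi-domain point concrete, a hypothetical sketch: nothing
ties config access to 0xcf8/0xcfc specifically, so each domain can simply
carry its own access method. The struct and table below are illustrative
only, not qemu code:

    #include <stdint.h>

    /* Several PCI domains, each reachable through its own config
     * mechanism; mirrors the device paths quoted above. */
    struct pci_domain {
        const char *of_path;    /* OF-style device path                 */
        uint32_t    addr_port;  /* config address port, 0 if MMIO-based */
        uint32_t    data_port;  /* config data port, 0 if MMIO-based    */
        uint64_t    mmio_base;  /* MMIO config base, 0 if port-based    */
    };

    static const struct pci_domain domains[] = {
        { "/pci@i0cf8",    0x0cf8, 0x0cfc, 0          },
        { "/pci@i0ef8",    0x0ef8, 0x0efc, 0          },
        { "/pci@fce00000", 0,      0,      0xfce00000 },
    };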

> seems to be outside the scope of the PCI Firmware specification, which
> says that bus numbers must be unique.
They must be unique per PCI segment group.

> 
> > > > > 
> > > > > > > 
> > > > > > > That should be enough for e.g. device_del. We do have the need to
> > > > > > > describe the topology when we interface with firmware, e.g. to 
> > > > > > > describe
> > > > > > > the ACPI tables themselves to qemu (this is what Gleb's patches 
> > > > > > > deal
> > > > > > > with), but that's probably the only case.
> > > > > > > 
> > > > > > Describing HW topology is the only way to unambiguously describe a
> > > > > > device to something or someone outside qemu and have persistent
> > > > > > device naming between different HW configurations.
> > > > > 
> > > > > Not really, since ACPI is a binary blob programmed by qemu.
> > > > > 
> > > > ACPI is part of the guest, not qemu.
> > > 
> > > Yes it runs in the guest but it's generated by qemu. On real hardware,
> > > it's supplied by the motherboard.
> > > 
> > It is not generated by qemu. Parts of it depend on the HW and other
> > parts depend on how the BIOS configures the HW. _BBN for instance is
> > clearly defined to return the address assigned by the BIOS.
> 
> BIOS is supplied on the motherboard and in our case by qemu as well.
You can replace the MB BIOS with coreboot+seabios on some of them.
Manufacturers don't want you to do it and make it hard to do, but
otherwise this is just software, not some magic dust.

> There's no standard way for the BIOS to assign a bus number to the pci
> root, so it does it in a device-specific way. Why should a management tool
> or a CLI user care about these? As far as they are concerned
> we could use some PV scheme to find root devices and assign bus
> numbers, and it would be exactly the same.
> 
Go write a KVM userspace that does that. AFAIK there is a project out
there that tries to do that. No luck so far. Your world view is very
x86/Linux centric. You need to broaden it a little bit. Next time you
propose something, ask yourself whether it will work with qemu-sparc,
qemu-ppc, qemu-amd.


> > > > Just saying "not really" doesn't
> > > > prove much. I still haven't seen any proposition from you that actually
> > > > solves the problem. No, "let's use guest naming" is not it. There is no
> > > > such thing as "The Guest".
> > > > 
> > > > --
> > > >                         Gleb.
> > > 
> > > I am sorry if I didn't make this clear.  I think we should use the
> > > domain:bus pair to name the root device. As these are unique and 
> > > 
> > You forgot to complete the sentence :) But you made it clear enough and
> > it is incorrect. The domain:bus pair is not only not unique, it does not
> > exist in qemu at all
> 
> Sure they do. The domain maps to the mcfg address for express. The bus is used for
mcfg is optional as far as I can see. You can compile out MMCONFIG
support on Linux.
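
For context on what _SEG actually selects: the MCFG table consists of
allocation entries tying a segment group number to an MMCONFIG base. A
sketch of the entry layout as given in the PCI Firmware spec; the field
names here are mine:

    #include <stdint.h>

    /* One allocation entry from the ACPI MCFG table: the mapping that
     * _SEG effectively selects for a host bridge. */
    struct mcfg_allocation {
        uint64_t base_address;  /* MMCONFIG base for this segment group */
        uint16_t segment;       /* PCI segment group number (_SEG)      */
        uint8_t  start_bus;     /* first bus number decoded             */
        uint8_t  end_bus;       /* last bus number decoded              */
        uint32_t reserved;
    } __attribute__((packed));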

> cf8/cfc addressing. They are assigned by the BIOS, but since the BIOS
> is supplied with the hardware the point is moot.
Most PC hardware is supplied with Windows, so what? The BIOS is code that
runs in the guest. It is part of the guest. Every line of code executed
by the vcpu belongs to the guest. No need to redefine things to prove
your point.

> 
> > and as such can't be used to address a device. They are
> > the product of HW enumeration done by the guest OS, just like eth0 or C:.
> > 
> > --
> >                     Gleb.
> 
> There's a huge difference between BIOS and guest OS,
Not true.

>                                            and between bus
> numbers of pci root and of nested bridges.
Really? What is it?

> 
> Describing hardware io ports makes sense if you are trying to
> communicate data from qemu to the BIOS.  But the rest of the world might
> not care.
> 
The part of the world that manages HW cares. You may need to add a device
from the monitor before the first line of the BIOS is even executed. How
can you rely on BIOS enumeration of devices in that case?


--
                        Gleb.


