qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v4 2/3] pcie: add check for ari capability of pc


From: Knut Omang
Subject: Re: [Qemu-devel] [PATCH v4 2/3] pcie: add check for ari capability of pcie devices
Date: Fri, 03 Oct 2014 13:22:48 +0200

On Wed, 2014-10-01 at 17:08 +0300, Marcel Apfelbaum wrote:
> On Wed, 2014-10-01 at 07:26 +0200, Knut Omang wrote:
> > On Tue, 2014-09-30 at 21:38 +0800, Gonglei wrote:
> > > > Subject: Re: [Qemu-devel] [PATCH v4 2/3] pcie: add check for ari 
> > > > capability of
> > > > pcie devices
> > > > 
> > > > On Tue, Sep 30, 2014 at 06:11:25PM +0800, address@hidden wrote:
> > > > > From: Gonglei <address@hidden>
> > > > >
> > > > > In QEMU, ARI Forwarding is enabled default at emulation of PCIe
> > > > > ports. ARI Forwarding enable setting at firmware/OS Control handoff.
> > > > > If the bit is Set when a non-ARI Device is present, the non-ARI
> > > > > Device can respond to Configuration Space accesses under what it
> > > > > interprets as being different Device Numbers, and its Functions can
> > > > > be aliased under multiple Device Numbers, generally leading to
> > > > > undesired behavior.
> > > > 
> > > > So what is the undesired behaviour?
> > > > Did you observe such?
> > > 
> > > I just observe the PCI device don't work, and the dmesg show me:
> > > 
> > > [ 159.035250] pciehp 0000:05:00.0:pcie24: Button pressed on Slot (0 - 4)
> > > [ 159.035274] pciehp 0000:05:00.0:pcie24: Card present on Slot (0 - 4)
> > > [ 159.036517] pciehp 0000:05:00.0:pcie24: PCI slot #0 - 4 - powering on 
> > > due to button press.
> > > [ 159.188049] pciehp 0000:05:00.0:pcie24: Failed to check link status
> > > [ 159.201968] pciehp 0000:05:00.0:pcie24: Card not present on Slot (0 - 4)
> > > [ 159.202529] pciehp 0000:05:00.0:pcie24: Already disabled on Slot (0 - 4)
> > 
> > This seems very much like the symptoms I see when I use hotplug after
> > the latest changes to the hotplug code - do you have something that
> > confirms this has something to do with wrong interpretation of device
> > IDs? My suspicion is that something has gone wrong or is missing in the
> > hotplug logic but I havent had time to dig deeper into it yet.
> Can you please describe me the steps to reproduce the issue?

Hmm, while trying to reproduce again I realize my hotplug issues are not
the same as the one Gonglei reports, let me come back to that in a
separate mail later,

My main point here is that I don't see how this particular fix would
alleviate Gonglei's issue, as it does not seem to get triggered unless
there's a bug in the emulated port/switch?

Gonglei, I assume you are use the TI x3130 in your example:
IMHO any PCIe x 1 device should fit into any of the (3) PCIe slots
provided by the TI x3130, and according to the spec:

http://www.ti.com/lit/gpn/xio3130

these slots appear as separate buses, which means that all devices will
have devfn 0.0 but on different buses, eg. if you have all three switch
slots (not to be confused with the slot number given by the PCI_SLOT()
macro) populated and no other secondary buses before it, you should see
the downstream switch ports as 01:xx:x and devices in 02:00.0, 03:00.0
and 04:00.0 for slots 0, 1 and 2. This seems to correspond well with the
current Qemu model.

So unless your device itself exposes function# > 8 and is not ARI
capable (which would be a non-compliant device as far as I read the
standard) you should never be able to see any device in any of the
downstream ports have PCI_SLOT(devfn) != 0 ?

With qemu master + my SR/IOV patch set + the igb patch (just to have an
ARI capable device to play with) here:

https://github.com/knuto/qemu, branch sriov_patches_v3

and a guest running F20 I am able to boot with a device (such as for
instance an e1000) inserted in a root port, I am also able to hot plug
it from the qemu monitor, eg.

qemu-kvm  ... \
  -device ioh3420,slot=0,id=pcie_port.0

...

(qemu) device_add e1000,vlan=1,bus=pcie_port.0,id=ne
(qemu) device_del ne

In both cases the ARIfwd bit is disabled by default and not enabled by
the port driver (just as I would expect) so at least your comment 
is wrong as far as I can see.

Booting with a two-port downstream switch with no devices plugged, I see
the same, ARIfwd is not enabled on the downstream ports as long as no
device is in any slots, which is as expected, isn't it?

qemu-kvm  ... \
  -device x3130-upstream,id=upsw \
  -device xio3130-downstream,bus=upsw,addr=0.0,chassis=5,id=ds_port.0 \
  -device xio3130-downstream,bus=upsw,addr=1.0,chassis=6,id=ds_port.1
...

The upstream port does not support ARIfwd in the QEMU model, which I
suppose is correct as it will only contain individual downstream
devices, which expose new buses but never have more than one function.

However, either booting or hotplugging an e1000 into the downstream
switch, eg.

(qemu) device_add e1000,vlan=1,bus=ds_port.0,id=ne

with my current guest image kernel 3.13.10-200.fc20.x86_64+debug I get a
NULL pointer Oops in the guest,

[    0.520009] BUG: unable to handle kernel NULL pointer dereference at 
0000000000000038
[    0.521000] IP: [<ffffffff813d942d>] pcie_aspm_init_link_state+0x6ed/0x7c0
[    0.521000] PGD 0 
[    0.521000] Oops: 0000 [#1] SMP 

The exact same happens if I use the ARI capable igb device instead.

I will upgrade the guest to a newer image (what kernel are you running with, 
Gonglei?) 

lspci reports this on the guest (before I try to add any device in the 
downstream port(s)):

00:00.0 Host bridge: Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
00:01.0 VGA compatible controller: Cirrus Logic GD 5446
00:02.0 Ethernet controller: Red Hat, Inc Virtio network device
00:03.0 Unclassified device [0002]: Red Hat, Inc Virtio filesystem
00:04.0 PCI bridge: Intel Corporation 7500/5520/5500/X58 I/O Hub PCI Express 
Root Port 0 (rev 02)
00:05.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch (Upstream) 
(rev 02)
00:1f.0 ISA bridge: Intel Corporation 82801IB (ICH9) LPC Interface Controller 
(rev 02)
00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port 
SATA Controller [AHCI mode] (rev 02)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02)
02:00.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch (Downstream) 
(rev 01)
02:01.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch (Downstream) 
(rev 01)

# lspci -t
-[0000:00]-+-00.0
           +-01.0
           +-02.0
           +-03.0
           +-04.0-[01]--
           +-05.0-[02-04]--+-00.0-[03]--
           |               \-01.0-[04]--
           +-1f.0
           +-1f.2
           \-1f.3

eg. the devices plugged into the downstream ports will end up at 03:00.0 and 
04:00.0,
the info qtree command verifies that the e1000 insert ends up in 03:00.0 as 
expected.

Similar if I instead use 

(qemu) device_add e1000,vlan=1,bus=ds_port.1,id=ne2

I see this with info qtree:

  class Ethernet controller, addr 04:00.0, pci id 8086:100e (sub 1af4:1100) 

Anyway I suppose this indicates that something is not right/is missing
with the downstream switch emulation, and not with the (generic) way the
ARIfwd cap/ctrl bit works

> It should work correctly by now, maybe a "surprise removal/addition"
> warning and nothing more.
> 
> Thank you,
> Marcel
> 
> > 
> > I am just concerned that this might alleviate the symptoms you see but
> > not fix the problem itself,
> > 
> > Knut
> > 
> > > > Please make this explicit in the commit log.
> > > > 
> > > Sorry, I copied the description from PCIe spec. :(
> > > 
> > > IMPLEMENTATION NOTE at Page 19:
> > > 
> > > https://www.pcisig.com/specifications/pciexpress/specifications/ECN-alt-rid-interpretation-070604.pdf
> > > 
> > > 
> > > > 
> > > > >
> > > > > So, for PCI devices attached in PCIe root ports or downstream pots,
> > > > > we should assure that its slot is not non-zero. 



> > > > > For PCIe devices, which
> > > > > ARP capability is not enabled, we also should assure that its slot
> > > > > is not non-zero.
> > > > 
> > > > not non zero => zero
> > > > 
> > > OK.
> > > 
> > > > >
> > > > > Signed-off-by: Gonglei <address@hidden>
> > > > > ---
> > > > >  hw/pci/pci.c          |  4 ++++
> > > > >  hw/pci/pcie.c         | 51
> > > > +++++++++++++++++++++++++++++++++++++++++++++++++++
> > > > >  include/hw/pci/pcie.h |  1 +
> > > > >  3 files changed, 56 insertions(+)
> > > > >
> > > > > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > > > > index 6ce75aa..22b7ca0 100644
> > > > > --- a/hw/pci/pci.c
> > > > > +++ b/hw/pci/pci.c
> > > > > @@ -1770,6 +1770,10 @@ static int pci_qdev_init(DeviceState *qdev)
> > > > >          }
> > > > >      }
> > > > >
> > > > > +    if (pcie_cap_slot_check(bus, pci_dev)) {
> > > > > +        return -1;
> > > > > +    }
> > > > > +
> > > > >      /* rom loading */
> > > > >      is_default_rom = false;
> > > > >      if (pci_dev->romfile == NULL && pc->romfile != NULL) {
> > > > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > > > > index 1babddf..b82211a 100644
> > > > > --- a/hw/pci/pcie.c
> > > > > +++ b/hw/pci/pcie.c
> > > > > @@ -633,3 +633,54 @@ void pcie_ari_init(PCIDevice *dev, uint16_t 
> > > > > offset,
> > > > uint16_t nextfn)
> > > > >                          offset, PCI_ARI_SIZEOF);
> > > > >      pci_set_long(dev->config + offset + PCI_ARI_CAP, (nextfn & 0xff) 
> > > > > << 8);
> > > > >  }
> > > > > +
> > > > > +int pcie_cap_slot_check(PCIBus *bus, PCIDevice *dev)
> > > > > +{
> > > > > +    Object *obj = OBJECT(bus);
> > > > > +
> > > > > +    if (pci_bus_is_root(bus)) {
> > > > > +        return 0;
> > > > > +    }
> > > > > +
> > > > > +    if (object_dynamic_cast(obj, TYPE_PCIE_BUS)) {
> > > > > +        DeviceState *parent = qbus_get_parent(BUS(obj));
> > > > > +        PCIDevice *pci_dev = PCI_DEVICE(parent);
> > > > > +        uint8_t port_type;
> > > > > +        /*
> > > > > +         * Root ports and downstream ports of switches are the hot
> > > > > +         * pluggable ports in a PCI Express hierarchy.
> > > > > +         * PCI Express supports chip-to-chip interconnect, a PCIe 
> > > > > link can
> > > > > +         * only connect one PCI device/Switch/EndPoint or PCI-bridge.
> > > > > +         *
> > > > > +         * 7.3. Configuration Transaction Rules (PCI Express 
> > > > > specification
> > > > 3.0)
> > > > > +         * 7.3.1. Device Number
> > > > > +         *
> > > > > +         * Downstream Ports that do not have ARI Forwarding enabled
> > > > must
> > > > > +         * associate only Device 0 with the device attached to the 
> > > > > Logical
> > > > Bus
> > > > > +         * representing the Link from the Port.
> > > > > +         *
> > > > > +         * In QEMU, ARI Forwarding is enabled default at emulation of
> > > > PCIe
> > > > 
> > > > s/enabled default/enabled by default/

It is not when I try this (as discussed above)
The capability is there but it is disabled by default.

> > > > 
> > > OK.
> > > 
> > > > > +         * ports. ARI Forwarding enable setting at firmware/OS 
> > > > > Control
> > > > handoff.
> > > > 
> > > > 
> > > > can parse last sentence. drop it?
> > > > 
> > > OK.
> > > 
> > > > > +         * If the bit is Set when a non-ARI Device is present, the 
> > > > > non-ARI
> > > > > +           Device can respond to Configuration Space accesses under
> > > > what it
> > > > > +         * interprets as being different Device Numbers, and its 
> > > > > Functions
> > > > can
> > > > > +         * be aliased under multiple Device Numbers, generally 
> > > > > leading to
> > > > > +         * undesired behavior.

I don't understand how this is possible to achieve.

Knut

> > > > I don't think any badness really happens.
> > > > Did you observe any?
> > > > 
> > > Same with above explanation.
> > > 
> > > > 
> > > > 
> > > > > +         */
> > > > > +        port_type = pcie_cap_get_type(pci_dev);
> > > > > +        if (port_type == PCI_EXP_TYPE_DOWNSTREAM ||
> > > > > +            port_type == PCI_EXP_TYPE_ROOT_PORT) {
> > > > > +            if (!pci_is_express(dev) ||
> > > > > +                (pci_is_express(dev) &&
> > > > > +                !pcie_find_capability(dev, PCI_EXT_CAP_ID_ARI))) {
> > > > 
> > > > I would just skip the test for a non express function.
> > > > 
> > > Sorry?
> > > 
> > > Best regards,
> > > -Gonglei
> > > 
> > > > > +                if (PCI_SLOT(dev->devfn) != 0) {
> > > > > +                    error_report("PCIe: non-ARI device can't be
> > > > populated"
> > > > > +                                 " in slot %d",
> > > > PCI_SLOT(dev->devfn));
> > > > > +                    return -1;
> > > > > +                }
> > > > > +            }
> > > > > +        }
> > > > > +    }
> > > > > +
> > > > > +    return 0;
> > > > > +}
> > > > > diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
> > > > > index d139d58..1d8f3f4 100644
> > > > > --- a/include/hw/pci/pcie.h
> > > > > +++ b/include/hw/pci/pcie.h
> > > > > @@ -115,6 +115,7 @@ void pcie_add_capability(PCIDevice *dev,
> > > > >                           uint16_t offset, uint16_t size);
> > > > >
> > > > >  void pcie_ari_init(PCIDevice *dev, uint16_t offset, uint16_t nextfn);
> > > > > +int pcie_cap_slot_check(PCIBus *bus, PCIDevice *dev);
> > > > >
> > > > >  extern const VMStateDescription vmstate_pcie_device;
> > > > >
> > > > > --
> > > > > 1.7.12.4
> > > > >
> > > 
> > > 
> > 
> > 
> > 
> 
> 
> 





reply via email to

[Prev in Thread] Current Thread [Next in Thread]