From: Knut Omang
Subject: Re: [Qemu-devel] [PATCH v4 2/3] pcie: add check for ari capability of pcie devices
Date: Fri, 03 Oct 2014 16:30:49 +0200

On Fri, 2014-10-03 at 13:22 +0200, Knut Omang wrote:
> On Wed, 2014-10-01 at 17:08 +0300, Marcel Apfelbaum wrote:
> > On Wed, 2014-10-01 at 07:26 +0200, Knut Omang wrote:
> > > On Tue, 2014-09-30 at 21:38 +0800, Gonglei wrote:
> > > > > Subject: Re: [Qemu-devel] [PATCH v4 2/3] pcie: add check for ari capability of pcie devices
> > > > > 
> > > > > On Tue, Sep 30, 2014 at 06:11:25PM +0800, address@hidden wrote:
> > > > > > From: Gonglei <address@hidden>
> > > > > >
> > > > > > In QEMU, ARI Forwarding is enabled default at emulation of PCIe
> > > > > > ports. ARI Forwarding enable setting at firmware/OS Control handoff.
> > > > > > If the bit is Set when a non-ARI Device is present, the non-ARI
> > > > > > Device can respond to Configuration Space accesses under what it
> > > > > > interprets as being different Device Numbers, and its Functions can
> > > > > > be aliased under multiple Device Numbers, generally leading to
> > > > > > undesired behavior.
> > > > > 
> > > > > So what is the undesired behaviour?
> > > > > Did you observe such?
> > > > 
> > > > I just observed that the PCI device doesn't work, and dmesg shows me:
> > > > 
> > > > [ 159.035250] pciehp 0000:05:00.0:pcie24: Button pressed on Slot (0 - 4)
> > > > [ 159.035274] pciehp 0000:05:00.0:pcie24: Card present on Slot (0 - 4)
> > > > [ 159.036517] pciehp 0000:05:00.0:pcie24: PCI slot #0 - 4 - powering on due to button press.
> > > > [ 159.188049] pciehp 0000:05:00.0:pcie24: Failed to check link status
> > > > [ 159.201968] pciehp 0000:05:00.0:pcie24: Card not present on Slot (0 - 4)
> > > > [ 159.202529] pciehp 0000:05:00.0:pcie24: Already disabled on Slot (0 - 4)
> > > 
> > > This seems very much like the symptoms I see when I use hotplug after
> > > the latest changes to the hotplug code - do you have something that
> > > confirms this has something to do with wrong interpretation of device
> > > IDs? My suspicion is that something has gone wrong or is missing in the
> > > hotplug logic but I haven't had time to dig deeper into it yet.
> > Can you please describe the steps to reproduce the issue?
> 
> Hmm, while trying to reproduce again I realize my hotplug issues are not
> the same as the one Gonglei reports; let me come back to that in a
> separate mail later.
> 
> My main point here is that I don't see how this particular fix would
> alleviate Gonglei's issue, as it does not seem to get triggered unless
> there's a bug in the emulated port/switch?
> 
> Gonglei, I assume you are using the TI x3130 in your example:
> IMHO any PCIe x1 device should fit into any of the (3) PCIe slots
> provided by the TI x3130, and according to the spec:
> 
> http://www.ti.com/lit/gpn/xio3130
> 
> these slots appear as separate buses, which means that all devices will
> have devfn 0.0 but on different buses, e.g. if you have all three switch
> slots (not to be confused with the slot number given by the PCI_SLOT()
> macro) populated and no other secondary buses before it, you should see
> the downstream switch ports as 01:xx.x and devices in 02:00.0, 03:00.0
> and 04:00.0 for slots 0, 1 and 2. This seems to correspond well with the
> current Qemu model.
> 
> So unless your device itself exposes function# >= 8 and is not ARI
> capable (which would be a non-compliant device as far as I read the
> standard), you should never see any device behind any of the
> downstream ports with PCI_SLOT(devfn) != 0?
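
(For reference, the devfn arithmetic I am assuming above is the one from the
PCI_SLOT()/PCI_FUNC() macros - a sketch based on my reading of QEMU's
include/hw/pci/pci.h:

  #define PCI_DEVFN(slot, func)   ((((slot) & 0x1f) << 3) | ((func) & 0x07))
  #define PCI_SLOT(devfn)         (((devfn) >> 3) & 0x1f)
  #define PCI_FUNC(devfn)         ((devfn) & 0x07)

  /* devfn 0..7  -> slot 0, functions 0..7: all a non-ARI endpoint may use
   * devfn 8..   -> PCI_SLOT(devfn) != 0, or functions 8..255 with ARI    */

so a compliant non-ARI device behind a port can only ever occupy devfn 0..7.)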
> 
> With qemu master + my SR-IOV patch set + the igb patch (just to have an
> ARI-capable device to play with) here:
> 
> https://github.com/knuto/qemu, branch sriov_patches_v3
> 
> and a guest running F20, I am able to boot with a device (for instance
> an e1000) inserted in a root port, and I am also able to hot plug it
> from the qemu monitor, e.g.
> 
> qemu-kvm  ... \
>   -device ioh3420,slot=0,id=pcie_port.0
> 
> ...
> 
> (qemu) device_add e1000,vlan=1,bus=pcie_port.0,id=ne
> (qemu) device_del ne
> 
> In both cases the ARIfwd bit is disabled by default and not enabled by
> the port driver (just as I would expect), so at least the comment in your
> patch is wrong as far as I can see.
> 
> Booting with a two-port downstream switch with no devices plugged in, I see
> the same: ARIfwd is not enabled on the downstream ports as long as no
> device is in any of the slots, which is as expected, isn't it?
> 
> qemu-kvm  ... \
>   -device x3130-upstream,id=upsw \
>   -device xio3130-downstream,bus=upsw,addr=0.0,chassis=5,id=ds_port.0 \
>   -device xio3130-downstream,bus=upsw,addr=1.0,chassis=6,id=ds_port.1
> ...
> 
> The upstream port does not support ARIfwd in the QEMU model, which I
> suppose is correct as it will only contain individual downstream
> devices, which expose new buses but never have more than one function.
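
(One way to double-check the ARIfwd state of a port from within the guest -
assuming a pciutils recent enough to decode Device Capabilities/Control 2 -
is something like:

  lspci -vv -s 02:00.0 | grep -i arifwd

which should show the ARIFwd capability bit in the DevCap2 line and its
enable state in the DevCtl2 line; 02:00.0 is one of the downstream ports in
the lspci listing further down.)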
> 
> However, either booting with or hotplugging an e1000 into the downstream
> switch, e.g.
> 
> (qemu) device_add e1000,vlan=1,bus=ds_port.0,id=ne
> 
> with my current guest image kernel 3.13.10-200.fc20.x86_64+debug, I get a
> NULL pointer Oops in the guest:
> 
> [    0.520009] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
> [    0.521000] IP: [<ffffffff813d942d>] pcie_aspm_init_link_state+0x6ed/0x7c0
> [    0.521000] PGD 0 
> [    0.521000] Oops: 0000 [#1] SMP 

Here is the gdb version of the stack trace:

0xffffffff813d9431 in alloc_pcie_link_state (pdev=0xffff880178cec520) at drivers/pci/pcie/aspm.c:530
530                     parent = pdev->bus->parent->self->link_state;
(gdb) where
#0  0xffffffff813d9431 in alloc_pcie_link_state (pdev=0xffff880178cec520) at drivers/pci/pcie/aspm.c:530
#1  pcie_aspm_init_link_state (pdev=0xffff880178cec520) at drivers/pci/pcie/aspm.c:578
#2  0xffffffff813c6bbd in pci_scan_slot (address@hidden, address@hidden) at drivers/pci/probe.c:1486
#3  0xffffffff813e0d54 in pciehp_configure_device (address@hidden) at drivers/pci/hotplug/pciehp_pci.c:54
#4  0xffffffff813e05cb in board_added (p_slot=0xffff880173982760) at drivers/pci/hotplug/pciehp_ctrl.c:223
#5  pciehp_enable_slot (address@hidden) at drivers/pci/hotplug/pciehp_ctrl.c:510
#6  0xffffffff813e0ac0 in pciehp_power_thread (work=<optimized out>) at drivers/pci/hotplug/pciehp_ctrl.c:308
#7  0xffffffff8109bc31 in process_one_work (address@hidden, work=0xffff8801731bbc30) at kernel/workqueue.c:2214
#8  0xffffffff8109c1eb in worker_thread (__worker=0xffff8801730fd728) at kernel/workqueue.c:2340
#9  0xffffffff810a43ef in kthread (_create=0xffff88017336edc8) at kernel/kthread.c:207
#10 <signal handler called>
#11 0x0000000000000000 in irq_stack_union ()
#12 0x0000000000000000 in ?? ()
(gdb) p pdev->bus->parent->self
$2 = (struct pci_dev *) 0x0 <irq_stack_union>
(gdb)
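
Spelling out the statement that faults (the annotations are just my reading
of the trace above):

  /* drivers/pci/pcie/aspm.c:530, the line shown in frame #0 */
  parent = pdev->bus->parent->self->link_state;

  /* gdb shows pdev->bus->parent->self == NULL ($2 above), so the guest
   * kernel dereferences a NULL struct pci_dev - the faulting address
   * 0x38 is presumably just the offset of link_state in struct pci_dev.
   * In other words, the parent bus of the bus the new device sits on has
   * no bridge device (->self) associated with it.  */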

I observe the same behaviour with guest kernel 3.16.3-200.fc20.x86_64.

Knut

> The exact same thing happens if I use the ARI-capable igb device instead.
> 
> I will upgrade the guest to a newer image (what kernel are you running with, 
> Gonglei?) 
> 
> lspci reports this on the guest (before I try to add any device in the 
> downstream port(s)):
> 
> 00:00.0 Host bridge: Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
> 00:01.0 VGA compatible controller: Cirrus Logic GD 5446
> 00:02.0 Ethernet controller: Red Hat, Inc Virtio network device
> 00:03.0 Unclassified device [0002]: Red Hat, Inc Virtio filesystem
> 00:04.0 PCI bridge: Intel Corporation 7500/5520/5500/X58 I/O Hub PCI Express Root Port 0 (rev 02)
> 00:05.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch (Upstream) (rev 02)
> 00:1f.0 ISA bridge: Intel Corporation 82801IB (ICH9) LPC Interface Controller (rev 02)
> 00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode] (rev 02)
> 00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02)
> 02:00.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch (Downstream) (rev 01)
> 02:01.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch (Downstream) (rev 01)
> 
> # lspci -t
> -[0000:00]-+-00.0
>            +-01.0
>            +-02.0
>            +-03.0
>            +-04.0-[01]--
>            +-05.0-[02-04]--+-00.0-[03]--
>            |               \-01.0-[04]--
>            +-1f.0
>            +-1f.2
>            \-1f.3
> 
> e.g. the devices plugged into the downstream ports will end up at 03:00.0 and
> 04:00.0; the info qtree command verifies that the inserted e1000 ends up in
> 03:00.0 as expected.
> 
> Similarly, if I instead use
> 
> (qemu) device_add e1000,vlan=1,bus=ds_port.1,id=ne2
> 
> I see this with info qtree:
> 
>   class Ethernet controller, addr 04:00.0, pci id 8086:100e (sub 1af4:1100) 
> 
> Anyway, I suppose this indicates that something is not right or missing
> in the downstream switch emulation, and not in the (generic) way the
> ARIfwd cap/ctrl bit works.
> 
> > It should work correctly by now, maybe a "surprise removal/addition"
> > warning and nothing more.
> > 
> > Thank you,
> > Marcel
> > 
> > > 
> > > I am just concerned that this might alleviate the symptoms you see but
> > > not fix the problem itself.
> > > 
> > > Knut
> > > 
> > > > > Please make this explicit in the commit log.
> > > > > 
> > > > Sorry, I copied the description from the PCIe spec. :(
> > > > 
> > > > IMPLEMENTATION NOTE at Page 19:
> > > > 
> > > > https://www.pcisig.com/specifications/pciexpress/specifications/ECN-alt-rid-interpretation-070604.pdf
> > > > 
> > > > 
> > > > > 
> > > > > >
> > > > > > So, for PCI devices attached in PCIe root ports or downstream pots,
> > > > > > we should assure that its slot is not non-zero. 
> 
> 
> 
> > > > > > For PCIe devices, which
> > > > > > ARP capability is not enabled, we also should assure that its slot
> > > > > > is not non-zero.
> > > > > 
> > > > > not non zero => zero
> > > > > 
> > > > OK.
> > > > 
> > > > > >
> > > > > > Signed-off-by: Gonglei <address@hidden>
> > > > > > ---
> > > > > >  hw/pci/pci.c          |  4 ++++
> > > > > >  hw/pci/pcie.c         | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
> > > > > >  include/hw/pci/pcie.h |  1 +
> > > > > >  3 files changed, 56 insertions(+)
> > > > > >
> > > > > > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > > > > > index 6ce75aa..22b7ca0 100644
> > > > > > --- a/hw/pci/pci.c
> > > > > > +++ b/hw/pci/pci.c
> > > > > > @@ -1770,6 +1770,10 @@ static int pci_qdev_init(DeviceState *qdev)
> > > > > >          }
> > > > > >      }
> > > > > >
> > > > > > +    if (pcie_cap_slot_check(bus, pci_dev)) {
> > > > > > +        return -1;
> > > > > > +    }
> > > > > > +
> > > > > >      /* rom loading */
> > > > > >      is_default_rom = false;
> > > > > >      if (pci_dev->romfile == NULL && pc->romfile != NULL) {
> > > > > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > > > > > index 1babddf..b82211a 100644
> > > > > > --- a/hw/pci/pcie.c
> > > > > > +++ b/hw/pci/pcie.c
> > > > > > @@ -633,3 +633,54 @@ void pcie_ari_init(PCIDevice *dev, uint16_t offset, uint16_t nextfn)
> > > > > >                          offset, PCI_ARI_SIZEOF);
> > > > > >      pci_set_long(dev->config + offset + PCI_ARI_CAP, (nextfn & 0xff) << 8);
> > > > > >  }
> > > > > > +
> > > > > > +int pcie_cap_slot_check(PCIBus *bus, PCIDevice *dev)
> > > > > > +{
> > > > > > +    Object *obj = OBJECT(bus);
> > > > > > +
> > > > > > +    if (pci_bus_is_root(bus)) {
> > > > > > +        return 0;
> > > > > > +    }
> > > > > > +
> > > > > > +    if (object_dynamic_cast(obj, TYPE_PCIE_BUS)) {
> > > > > > +        DeviceState *parent = qbus_get_parent(BUS(obj));
> > > > > > +        PCIDevice *pci_dev = PCI_DEVICE(parent);
> > > > > > +        uint8_t port_type;
> > > > > > +        /*
> > > > > > +         * Root ports and downstream ports of switches are the hot
> > > > > > +         * pluggable ports in a PCI Express hierarchy.
> > > > > > +         * PCI Express supports chip-to-chip interconnect, a PCIe link can
> > > > > > +         * only connect one PCI device/Switch/EndPoint or PCI-bridge.
> > > > > > +         *
> > > > > > +         * 7.3. Configuration Transaction Rules (PCI Express specification 3.0)
> > > > > > +         * 7.3.1. Device Number
> > > > > > +         *
> > > > > > +         * Downstream Ports that do not have ARI Forwarding enabled must
> > > > > > +         * associate only Device 0 with the device attached to the Logical Bus
> > > > > > +         * representing the Link from the Port.
> > > > > > +         *
> > > > > > +         * In QEMU, ARI Forwarding is enabled default at emulation of PCIe
> > > > > 
> > > > > s/enabled default/enabled by default/
> 
> It is not when I try this (as discussed above).
> The capability is there, but it is disabled by default.
> 
> > > > > 
> > > > OK.
> > > > 
> > > > > > +         * ports. ARI Forwarding enable setting at firmware/OS Control handoff.
> > > > > 
> > > > > 
> > > > > can't parse last sentence. drop it?
> > > > > 
> > > > OK.
> > > > 
> > > > > > +         * If the bit is Set when a non-ARI Device is present, the non-ARI
> > > > > > +           Device can respond to Configuration Space accesses under what it
> > > > > > +         * interprets as being different Device Numbers, and its Functions can
> > > > > > +         * be aliased under multiple Device Numbers, generally leading to
> > > > > > +         * undesired behavior.
> 
> I don't understand how this is possible to achieve.
> 
> Knut
> 
> > > > > I don't think any badness really happens.
> > > > > Did you observe any?
> > > > > 
> > > > Same with above explanation.
> > > > 
> > > > > 
> > > > > 
> > > > > > +         */
> > > > > > +        port_type = pcie_cap_get_type(pci_dev);
> > > > > > +        if (port_type == PCI_EXP_TYPE_DOWNSTREAM ||
> > > > > > +            port_type == PCI_EXP_TYPE_ROOT_PORT) {
> > > > > > +            if (!pci_is_express(dev) ||
> > > > > > +                (pci_is_express(dev) &&
> > > > > > +                !pcie_find_capability(dev, PCI_EXT_CAP_ID_ARI))) {
> > > > > 
> > > > > I would just skip the test for a non-express function.
> > > > > 
> > > > Sorry?
> > > > 
> > > > Best regards,
> > > > -Gonglei
> > > > 
> > > > > > +                if (PCI_SLOT(dev->devfn) != 0) {
> > > > > > +                    error_report("PCIe: non-ARI device can't be populated"
> > > > > > +                                 " in slot %d", PCI_SLOT(dev->devfn));
> > > > > > +                    return -1;
> > > > > > +                }
> > > > > > +            }
> > > > > > +        }
> > > > > > +    }
> > > > > > +
> > > > > > +    return 0;
> > > > > > +}
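
(Just to make the intended effect concrete - my own example, not from the
patch: with this check in place, a hot plug attempt at a non-zero slot
behind a root or downstream port, e.g.

  (qemu) device_add e1000,bus=pcie_port.0,addr=1.0,id=ne

i.e. PCI_SLOT(devfn) == 1, should now be rejected with the error_report
above instead of being silently accepted.)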
> > > > > > diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
> > > > > > index d139d58..1d8f3f4 100644
> > > > > > --- a/include/hw/pci/pcie.h
> > > > > > +++ b/include/hw/pci/pcie.h
> > > > > > @@ -115,6 +115,7 @@ void pcie_add_capability(PCIDevice *dev,
> > > > > >                           uint16_t offset, uint16_t size);
> > > > > >
> > > > > >  void pcie_ari_init(PCIDevice *dev, uint16_t offset, uint16_t nextfn);
> > > > > > +int pcie_cap_slot_check(PCIBus *bus, PCIDevice *dev);
> > > > > >
> > > > > >  extern const VMStateDescription vmstate_pcie_device;
> > > > > >
> > > > > > --
> > > > > > 1.7.12.4
> > > > > >
> > > > 
> > > > 
> > > 
> > > 
> > > 
> > 
> > 
> > 
> 




