qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH] pcie: Enhance PCIe links


From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] [PATCH] pcie: Enhance PCIe links
Date: Thu, 21 Mar 2013 18:56:04 +0200

On Tue, Mar 19, 2013 at 04:24:47PM -0600, Alex Williamson wrote:
> Enable PCIe devices to negotiate links.  This upgrades our root ports
> and switches to advertising x16, 8.0GT/s and negotiates the current
> link status to the best available width and speed.  Note that we also
> skip setting link fields for Root Complex Integrated Endpoints as
> indicated by the PCIe spec.
> 
> Signed-off-by: Alex Williamson <address@hidden>
> ---
> 
> This depends on the previous pcie_endpoint_cap_init patch.
> 
>  hw/ioh3420.c            |    5 +-
>  hw/pci/pcie.c           |  150 
> ++++++++++++++++++++++++++++++++++++++++++++---
>  hw/pci/pcie.h           |    7 ++
>  hw/pci/pcie_regs.h      |   17 +++++
>  hw/usb/hcd-xhci.c       |    3 +
>  hw/xio3130_downstream.c |    4 +
>  hw/xio3130_upstream.c   |    4 +
>  7 files changed, 173 insertions(+), 17 deletions(-)
> 
> diff --git a/hw/ioh3420.c b/hw/ioh3420.c
> index 5cff61e..0aaec5b 100644
> --- a/hw/ioh3420.c
> +++ b/hw/ioh3420.c
> @@ -115,7 +115,10 @@ static int ioh3420_initfn(PCIDevice *d)
>      if (rc < 0) {
>          goto err_bridge;
>      }
> -    rc = pcie_cap_init(d, IOH_EP_EXP_OFFSET, PCI_EXP_TYPE_ROOT_PORT, 
> p->port);
> +
> +    /* Real hardware only supports up to x4, 2.5GT/s, but we're not real hw 
> */
> +    rc = pcie_cap_init(d, IOH_EP_EXP_OFFSET, PCI_EXP_TYPE_ROOT_PORT, p->port,
> +                       PCI_EXP_LNK_MLW_16, PCI_EXP_LNK_LS_80);
>      if (rc < 0) {
>          goto err_msi;
>
I'd like to see some documentation about why this is needed/a good idea.
You could argue that some guest might be surprised if the card width
does not match reality but it could work in reverse too ...
Do you see some drivers complaining? Linux prints warnings sometimes -
is this what worries you?
Let's document the motivation here not only what's going on.

> diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> index 62bd0b8..c07d3cc 100644
> --- a/hw/pci/pcie.c
> +++ b/hw/pci/pcie.c
> @@ -37,11 +37,98 @@
>  #define PCIE_DEV_PRINTF(dev, fmt, ...)                                  \
>      PCIE_DPRINTF("%s:%x "fmt, (dev)->name, (dev)->devfn, ## __VA_ARGS__)
>  
> +static uint16_t pcie_link_max_width(PCIDevice *dev)
> +{
> +    uint8_t *exp_cap;
> +    uint32_t lnkcap;
> +
> +    exp_cap = dev->config + dev->exp.exp_cap;
> +    lnkcap = pci_get_long(exp_cap + PCI_EXP_LNKCAP);
> +
> +    return lnkcap & PCI_EXP_LNKCAP_MLW;
> +}
> +
> +static uint8_t pcie_link_speed_mask(PCIDevice *dev)
> +{
> +    uint8_t *exp_cap;
> +    uint32_t lnkcap, lnkcap2;
> +    uint8_t speeds, mask;
> +
> +    exp_cap = dev->config + dev->exp.exp_cap;
> +    lnkcap = pci_get_long(exp_cap + PCI_EXP_LNKCAP);
> +    lnkcap2 = pci_get_long(exp_cap + PCI_EXP_LNKCAP2);
> +
> +    mask = (1 << (lnkcap & PCI_EXP_LNKCAP_SLS)) - 1;
> +
> +    /*
> +     * If LNKCAP2 reports supported link speeds, then LNKCAP indexes
> +     * the highest supported speed.  Mask out the rest and return.
> +     */
> +    speeds = (lnkcap2 & 0xfe) >> 1;

what's 0xfe?

> +    if (speeds) {
> +        return speeds & mask;
> +    }
> +
> +    /*
> +     * Otherwise LNKCAP returns the maximum speed and the device supports
> +     * all speeds below it.  This is really only valid for 2.5 & 5GT/s
> +     */
> +    return mask;
> +}
> +
> +void pcie_negotiate_link(PCIDevice *dev)
> +{
> +    PCIDevice *parent;
> +    uint16_t flags, width;
> +    uint8_t type, speed;
> +
> +    /* Skip non-express buses and Root Complex buses. */
> +    if (!pci_bus_is_express(dev->bus) || pci_bus_is_root(dev->bus)) {
> +        return;
> +    }
> +
> +    /*
> +     * Downstream ports don't negotiate with upstream ports, their link
> +     * is negotiated by whatever is attached downstream to them.  The
> +     * same is true of root ports, but root ports are always attached to
> +     * the root complex, so fall out above.
> +     */
> +    flags = pci_get_word(dev->config + dev->exp.exp_cap + PCI_EXP_FLAGS);
> +    type = (flags & PCI_EXP_FLAGS_TYPE) >> PCI_EXP_FLAGS_TYPE_SHIFT;
> +    if (type == PCI_EXP_TYPE_DOWNSTREAM) {
> +        return;
> +    }
> +
> +    parent = dev->bus->parent_dev;
> +
> +    assert(pci_is_express(dev) && dev->exp.exp_cap &&
> +           pci_is_express(parent) && parent->exp.exp_cap);
> +
> +    /* Devices support all widths below max, so use the MIN max */
> +    width = MIN(pcie_link_max_width(dev), pcie_link_max_width(parent));


So I see this in spec:
7.8.19.Link Control 2 Register (Offset 30h)


    9
    Hardware Autonomous Width Disable – When Set, this bit
    disables hardware from changing the Link width for reasons
    other than attempting to correct unreliable Link operation by
    reducing Link width.
    For a Multi-Function device associated with an Upstream Port,
    the bit in Function 0 is of type RW, and only Function 0 controls
    the component’s Link behavior. In all other Functions of that
    device, this bit is of type RsvdP.
    Components that do not implement the ability autonomously to
    change Link width are permitted to hardwire this bit to 0b.
    Default value of this bit is 0b.
    RW/RsvdP
    (see
    description)

So far we did not implement this according to rules:
we always used the 1x width so not ability to change
width.

Now that we do, we need to support the override?

Did not look but something like this could exist for speed too?

> +    /* Choose the highest speed supported by both devices */
> +    speed = ffs(pow2floor(pcie_link_speed_mask(dev) &
> +                          pcie_link_speed_mask(parent)));

What's all this trickery about? Not same as fls by chance?

> +
> +    pci_word_test_and_clear_mask(dev->config + dev->exp.exp_cap +
> +                                 PCI_EXP_LNKSTA,
> +                                 PCI_EXP_LNKCAP_MLW | PCI_EXP_LNKCAP_SLS);
> +    pci_word_test_and_set_mask(dev->config + dev->exp.exp_cap + 
> PCI_EXP_LNKSTA,
> +                               width | speed);

This looks strange. Should not be pci_set_word_by_mask,
since speed is not a mask itself?

> +
> +    pci_word_test_and_clear_mask(parent->config + parent->exp.exp_cap +
> +                                 PCI_EXP_LNKSTA,
> +                                 PCI_EXP_LNKCAP_MLW | PCI_EXP_LNKCAP_SLS);
> +    pci_word_test_and_set_mask(parent->config + parent->exp.exp_cap +
> +                               PCI_EXP_LNKSTA,
> +                               width | speed);
> +}
>  
>  /***************************************************************************
>   * pci express capability helper functions
>   */
> -int pcie_cap_init(PCIDevice *dev, uint8_t offset, uint8_t type, uint8_t port)
> +int pcie_cap_init(PCIDevice *dev, uint8_t offset, uint8_t type, uint8_t port,
> +                  uint16_t width, uint8_t speed)
>  {
>      int pos;
>      uint8_t *exp_cap;
> @@ -71,14 +158,56 @@ int pcie_cap_init(PCIDevice *dev, uint8_t offset, 
> uint8_t type, uint8_t port)
>       */
>      pci_set_long(exp_cap + PCI_EXP_DEVCAP, PCI_EXP_DEVCAP_RBER);
>  
> -    pci_set_long(exp_cap + PCI_EXP_LNKCAP,
> -                 (port << PCI_EXP_LNKCAP_PN_SHIFT) |
> -                 PCI_EXP_LNKCAP_ASPMS_0S |
> -                 PCI_EXP_LNK_MLW_1 |
> -                 PCI_EXP_LNK_LS_25);
> +    /* Root Complex devices don't report link fields */
> +    if (type != PCI_EXP_TYPE_RC_END) {
> +        /* User specifies the link port # */
> +        pci_set_long(exp_cap + PCI_EXP_LNKCAP, port << 
> PCI_EXP_LNKCAP_PN_SHIFT);
> +
> +        pci_long_test_and_set_mask(exp_cap + PCI_EXP_LNKCAP, width | speed);
> +
> +        /* Advertise L0s ASPM support */
> +        pci_long_test_and_set_mask(exp_cap + PCI_EXP_LNKCAP,
> +                                   PCI_EXP_LNKCAP_ASPMS_L0S);
>  
> -    pci_set_word(exp_cap + PCI_EXP_LNKSTA,
> -                 PCI_EXP_LNK_MLW_1 | PCI_EXP_LNK_LS_25);
> +        /*
> +         * Link bandwidth notification is required for all root ports and
> +         * downstream ports supporting links wider than x1.
> +         */
> +        if (width > PCI_EXP_LNK_MLW_1 && (type == PCI_EXP_TYPE_ROOT_PORT ||
> +            type == PCI_EXP_TYPE_DOWNSTREAM)) {
> +            pci_long_test_and_set_mask(exp_cap + PCI_EXP_LNKCAP,
> +                                       PCI_EXP_LNKCAP_LBNC);
> +        }
> +
> +        /* 8.0GT/s adds some requirements */
> +        if (speed >= PCI_EXP_LNK_LS_80) {
> +            /*
> +             * 2.5 & 5.0GT/s can be fully described by LNKCAP, but 8.0GT/s
> +             * is actually a reference to the highest bit supported in this
> +             * register.  We assume the device supports all link speeds.
> +             */
> +            pci_long_test_and_set_mask(exp_cap + PCI_EXP_LNKCAP2,
> +                                       PCI_EXP_LNK2_LS_25);
> +            pci_long_test_and_set_mask(exp_cap + PCI_EXP_LNKCAP2,
> +                                       PCI_EXP_LNK2_LS_50);
> +            pci_long_test_and_set_mask(exp_cap + PCI_EXP_LNKCAP2,
> +                                       PCI_EXP_LNK2_LS_80);

Let's just do a | here.

> +
> +            /*
> +             * Supporting 8.0GT/s requires that we advertise Data Link Layer
> +             * Active on all downstream ports supporting hotplug or speeds
> +             * great than 5GT/s

typo

> +             */
> +            if (type == PCI_EXP_TYPE_DOWNSTREAM) {
> +                pci_long_test_and_set_mask(exp_cap + PCI_EXP_LNKCAP,
> +                                           PCI_EXP_LNKCAP_DLLLARC);
> +                pci_word_test_and_set_mask(exp_cap + PCI_EXP_LNKSTA,
> +                                           PCI_EXP_LNKSTA_DLLLA);
> +            }
> +        }
> +
> +        pcie_negotiate_link(dev);
> +    }
>  
>      pci_set_long(exp_cap + PCI_EXP_DEVCAP2,
>                   PCI_EXP_DEVCAP2_EFF | PCI_EXP_DEVCAP2_EETLPP);
> @@ -87,7 +216,8 @@ int pcie_cap_init(PCIDevice *dev, uint8_t offset, uint8_t 
> type, uint8_t port)
>      return pos;
>  }
>  
> -int pcie_endpoint_cap_init(PCIDevice *dev, uint8_t offset)
> +int pcie_endpoint_cap_init(PCIDevice *dev, uint8_t offset, uint16_t width,
> +                           uint8_t speed)
>  {
>      uint8_t type = PCI_EXP_TYPE_ENDPOINT;
>  
> @@ -100,7 +230,7 @@ int pcie_endpoint_cap_init(PCIDevice *dev, uint8_t offset)
>          type = PCI_EXP_TYPE_RC_END;
>      }
>  
> -    return pcie_cap_init(dev, offset, type, 0);
> +    return pcie_cap_init(dev, offset, type, 0, width, speed);
>  }
>  
>  void pcie_cap_exit(PCIDevice *dev)
> diff --git a/hw/pci/pcie.h b/hw/pci/pcie.h
> index c010007..051dd76 100644
> --- a/hw/pci/pcie.h
> +++ b/hw/pci/pcie.h
> @@ -94,9 +94,12 @@ struct PCIExpressDevice {
>  };
>  
>  /* PCI express capability helper functions */
> -int pcie_cap_init(PCIDevice *dev, uint8_t offset, uint8_t type, uint8_t 
> port);
> -int pcie_endpoint_cap_init(PCIDevice *dev, uint8_t offset);
> +int pcie_cap_init(PCIDevice *dev, uint8_t offset, uint8_t type, uint8_t port,
> +                  uint16_t width, uint8_t speed);
> +int pcie_endpoint_cap_init(PCIDevice *dev, uint8_t offset, uint16_t width,
> +                           uint8_t speed);
>  void pcie_cap_exit(PCIDevice *dev);
> +void pcie_negotiate_link(PCIDevice *dev);
>  uint8_t pcie_cap_get_type(const PCIDevice *dev);
>  void pcie_cap_flags_set_vector(PCIDevice *dev, uint8_t vector);
>  uint8_t pcie_cap_flags_get_vector(PCIDevice *dev);
> diff --git a/hw/pci/pcie_regs.h b/hw/pci/pcie_regs.h
> index 4d123d9..4078451 100644
> --- a/hw/pci/pcie_regs.h
> +++ b/hw/pci/pcie_regs.h
> @@ -34,13 +34,23 @@
>  /* PCI_EXP_LINK{CAP, STA} */
>  /* link speed */
>  #define PCI_EXP_LNK_LS_25               1
> +#define PCI_EXP_LNK_LS_50               2
> +#define PCI_EXP_LNK_LS_80               3
>  
>  #define PCI_EXP_LNK_MLW_SHIFT           (ffs(PCI_EXP_LNKCAP_MLW) - 1)
>  #define PCI_EXP_LNK_MLW_1               (1 << PCI_EXP_LNK_MLW_SHIFT)
> +#define PCI_EXP_LNK_MLW_2               (2 << PCI_EXP_LNK_MLW_SHIFT)
> +#define PCI_EXP_LNK_MLW_4               (4 << PCI_EXP_LNK_MLW_SHIFT)
> +#define PCI_EXP_LNK_MLW_8               (8 << PCI_EXP_LNK_MLW_SHIFT)
> +#define PCI_EXP_LNK_MLW_12              (12 << PCI_EXP_LNK_MLW_SHIFT)
> +#define PCI_EXP_LNK_MLW_16              (16 << PCI_EXP_LNK_MLW_SHIFT)
> +#define PCI_EXP_LNK_MLW_32              (32 << PCI_EXP_LNK_MLW_SHIFT)
>  
>  /* PCI_EXP_LINKCAP */
>  #define PCI_EXP_LNKCAP_ASPMS_SHIFT      (ffs(PCI_EXP_LNKCAP_ASPMS) - 1)
> -#define PCI_EXP_LNKCAP_ASPMS_0S         (1 << PCI_EXP_LNKCAP_ASPMS_SHIFT)
> +#define PCI_EXP_LNKCAP_ASPMS_L0S        (1 << PCI_EXP_LNKCAP_ASPMS_SHIFT)
> +#define PCI_EXP_LNKCAP_ASPMS_L1         (2 << PCI_EXP_LNKCAP_ASPMS_SHIFT)
> +#define PCI_EXP_LNKCAP_ASPMS_L0SL1      (3 << PCI_EXP_LNKCAP_ASPMS_SHIFT)
>  
>  #define PCI_EXP_LNKCAP_PN_SHIFT         (ffs(PCI_EXP_LNKCAP_PN) - 1)
>  
> @@ -72,6 +82,11 @@
>  
>  #define PCI_EXP_DEVCTL2_EETLPPB         0x80
>  
> +#define PCI_EXP_LNKCAP2         44      /* Link Capabilities 2 */
> +#define PCI_EXP_LNK2_LS_25              (1 << 1)
> +#define PCI_EXP_LNK2_LS_50              (1 << 2)
> +#define PCI_EXP_LNK2_LS_80              (1 << 3)
> +
>  /* ARI */
>  #define PCI_ARI_VER                     1
>  #define PCI_ARI_SIZEOF                  8
> diff --git a/hw/usb/hcd-xhci.c b/hw/usb/hcd-xhci.c
> index 5aa342b..5f57807 100644
> --- a/hw/usb/hcd-xhci.c
> +++ b/hw/usb/hcd-xhci.c
> @@ -3332,7 +3332,8 @@ static int usb_xhci_initfn(struct PCIDevice *dev)
>                       
> PCI_BASE_ADDRESS_SPACE_MEMORY|PCI_BASE_ADDRESS_MEM_TYPE_64,
>                       &xhci->mem);
>  
> -    ret = pcie_endpoint_cap_init(&xhci->pci_dev, 0xa0);
> +    ret = pcie_endpoint_cap_init(&xhci->pci_dev, 0xa0, PCI_EXP_LNK_MLW_1,
> +                                 PCI_EXP_LNK_LS_25);
>      assert(ret >= 0);
>  
>      if (xhci->flags & (1 << XHCI_FLAG_USE_MSI)) {
> diff --git a/hw/xio3130_downstream.c b/hw/xio3130_downstream.c
> index b868f56..086ada9 100644
> --- a/hw/xio3130_downstream.c
> +++ b/hw/xio3130_downstream.c
> @@ -79,8 +79,10 @@ static int xio3130_downstream_initfn(PCIDevice *d)
>      if (rc < 0) {
>          goto err_bridge;
>      }
> +
> +    /* Real hardware only supports x1, 2.5GT/s, but we're not real hw */

add "So set it to 8.0 GT/s because ... "

>      rc = pcie_cap_init(d, XIO3130_EXP_OFFSET, PCI_EXP_TYPE_DOWNSTREAM,
> -                       p->port);
> +                       p->port, PCI_EXP_LNK_MLW_16, PCI_EXP_LNK_LS_80);
>      if (rc < 0) {
>          goto err_msi;
>      }
> diff --git a/hw/xio3130_upstream.c b/hw/xio3130_upstream.c
> index cd5d97d..4b6820b 100644
> --- a/hw/xio3130_upstream.c
> +++ b/hw/xio3130_upstream.c
> @@ -75,8 +75,10 @@ static int xio3130_upstream_initfn(PCIDevice *d)
>      if (rc < 0) {
>          goto err_bridge;
>      }
> +
> +    /* Real hardware only supports x1, 2.5GT/s, but we're not real hw */
>      rc = pcie_cap_init(d, XIO3130_EXP_OFFSET, PCI_EXP_TYPE_UPSTREAM,
> -                       p->port);
> +                       p->port, PCI_EXP_LNK_MLW_16, PCI_EXP_LNK_LS_80);
>      if (rc < 0) {
>          goto err_msi;
>      }



reply via email to

[Prev in Thread] Current Thread [Next in Thread]