Re: [Qemu-devel] [PATCH] pcie: Enhance PCIe links

qemu-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH] pcie: Enhance PCIe links

From:	Alex Williamson
Subject:	Re: [Qemu-devel] [PATCH] pcie: Enhance PCIe links
Date:	Mon, 25 Mar 2013 11:59:45 -0600

On Sun, 2013-03-24 at 11:14 +0200, Michael S. Tsirkin wrote:
> On Fri, Mar 22, 2013 at 09:25:13AM -0600, Alex Williamson wrote:
> > On Thu, 2013-03-21 at 18:56 +0200, Michael S. Tsirkin wrote:
> > > On Tue, Mar 19, 2013 at 04:24:47PM -0600, Alex Williamson wrote:
> > > > Enable PCIe devices to negotiate links.  This upgrades our root ports
> > > > and switches to advertising x16, 8.0GT/s and negotiates the current
> > > > link status to the best available width and speed.  Note that we also
> > > > skip setting link fields for Root Complex Integrated Endpoints as
> > > > indicated by the PCIe spec.
> > > > 
> > > > Signed-off-by: Alex Williamson <address@hidden>
> > > > ---
> > > > 
> > > > This depends on the previous pcie_endpoint_cap_init patch.
> > > > 
> > > >  hw/ioh3420.c            |    5 +-
> > > >  hw/pci/pcie.c           |  150 
> > > > ++++++++++++++++++++++++++++++++++++++++++++---
> > > >  hw/pci/pcie.h           |    7 ++
> > > >  hw/pci/pcie_regs.h      |   17 +++++
> > > >  hw/usb/hcd-xhci.c       |    3 +
> > > >  hw/xio3130_downstream.c |    4 +
> > > >  hw/xio3130_upstream.c   |    4 +
> > > >  7 files changed, 173 insertions(+), 17 deletions(-)
> > > > 
> > > > diff --git a/hw/ioh3420.c b/hw/ioh3420.c
> > > > index 5cff61e..0aaec5b 100644
> > > > --- a/hw/ioh3420.c
> > > > +++ b/hw/ioh3420.c
> > > > @@ -115,7 +115,10 @@ static int ioh3420_initfn(PCIDevice *d)
> > > >      if (rc < 0) {
> > > >          goto err_bridge;
> > > >      }
> > > > -    rc = pcie_cap_init(d, IOH_EP_EXP_OFFSET, PCI_EXP_TYPE_ROOT_PORT, 
> > > > p->port);
> > > > +
> > > > +    /* Real hardware only supports up to x4, 2.5GT/s, but we're not 
> > > > real hw */
> > > > +    rc = pcie_cap_init(d, IOH_EP_EXP_OFFSET, PCI_EXP_TYPE_ROOT_PORT, 
> > > > p->port,
> > > > +                       PCI_EXP_LNK_MLW_16, PCI_EXP_LNK_LS_80);
> > > >      if (rc < 0) {
> > > >          goto err_msi;
> > > >
> > > I'd like to see some documentation about why this is needed/a good idea.
> > > You could argue that some guest might be surprised if the card width
> > > does not match reality but it could work in reverse too ...
> > > Do you see some drivers complaining? Linux prints warnings sometimes -
> > > is this what worries you?
> > > Let's document the motivation here not only what's going on.
> > 
> > Ok, I can add motivation.  This is where I really wish we had a generic
> > switch that wouldn't risk having existing real world expectations.  The
> > base motivation though is to not create artificial bottlenecks.  If I
> > assign a graphics card to a guest, I want the link to negotiate to the
> > same bandwidth it has on the host.  I'm not entirely sure how to do that
> > yet, whether I should cap the device capabilities to it's current status
> > or whether I should force a negotiation at the current speed.  The
> > former may confuse drivers if they expect certain device capabilities,
> > the latter may cause drivers to attempt to renegotiate higher speeds.
> > The goal though is to have a virtual platform that advertises sufficient
> > speed on all ports to attach any real or virtual device.
> > 
> > Perhaps I should stick to hardware limitations for xio3130 & io3420 and
> > distill these drivers down to generic ones with x32 ports and 8GT/s.
> 
> Hmm this doesn't actually answer the question ...  Why does it matter
> what is the width negotiated by the guest?

Why does the Windows virtio-net driver report running at 10Gbps?  We
want speeds that make sense and don't impose artificial bottlenecks.
Additionally, we don't know what kinds of problems could result from an
inconsistent topology.  For example, if an assigned device reports a
link status of x16, 5GT/s and the root port reports status and
capability of x1, 2.5GT/s, what is the guest supposed to believe?  By
allowing root ports and switches to report the superset of all possible
devices we can virtually downshift to correctly report any device.
Should we wait to see if these problems materialize, or fix them
proactively?

I do concede that doing this on devices emulating physical hardware can
possibly lead to other incompatibilities as a chip specific driver for
the root port or switch might know these capabilities aren't physically
possible.  So I can limit the default to match physical hardware.

[snip]
> > > So I see this in spec:
> > > 7.8.19.Link Control 2 Register (Offset 30h)
> > > 
> > > 
> > >     9
> > >     Hardware Autonomous Width Disable – When Set, this bit
> > >     disables hardware from changing the Link width for reasons
> > >     other than attempting to correct unreliable Link operation by
> > >     reducing Link width.
> > >     For a Multi-Function device associated with an Upstream Port,
> > >     the bit in Function 0 is of type RW, and only Function 0 controls
> > >     the component’s Link behavior. In all other Functions of that
> > >     device, this bit is of type RsvdP.
> > >     Components that do not implement the ability autonomously to
> > >     change Link width are permitted to hardwire this bit to 0b.
> > >     Default value of this bit is 0b.
> > >     RW/RsvdP
> > >     (see
> > >     description)
> > > 
> > > So far we did not implement this according to rules:
> > > we always used the 1x width so not ability to change
> > > width.
> > > 
> > > Now that we do, we need to support the override?
> > > 
> > > Did not look but something like this could exist for speed too?
> > 
> > We should definitely only have function 0 perform the link negotiation
> > with the upstream port, subsequent functions can just copy the result
> > from function 0.  Am I still missing something or is that sufficient?
> 
> We would also need to implement the bit above.

Why?  Components are allowed to not implement it, which is what we do by
leaving it set it 0b.  For device assignment maybe we'll want to let a
write of that bit pass to hardware.

> If we ever implement error reporting, we might
> see devices reduce the width automatically
> and need to implement that.

And yes, assigned devices may change link speed and I expect as we
increase AER support we'll be able to notify for things like that.
Emulated devices have no reason to change link speed, so what I'm trying
for here is to get a proper initial link negotiation.

> All of this is bypassed if we simply report x1 to guest ...

So the argument is that x1 is easy?  I've never known that argument to
be very powerful ;)

> > > > +    /* Choose the highest speed supported by both devices */
> > > > +    speed = ffs(pow2floor(pcie_link_speed_mask(dev) &
> > > > +                          pcie_link_speed_mask(parent)));
> > > 
> > > What's all this trickery about? Not same as fls by chance?
> > 
> > Yes, I couldn't find fls, I'll see if the compiler can.  This rounds
> > down to the highest power of 2 then finds the first (only) set bit.
> 
> It's in include/qemu-common.h

Thanks, found it: qemu_fls.  Thanks,

Alex

[Prev in Thread]

Current Thread

[Next in Thread]

[Qemu-devel] [PATCH] pcie: Enhance PCIe links, Alex Williamson, 2013/03/19
- Re: [Qemu-devel] [PATCH] pcie: Enhance PCIe links, Michael S. Tsirkin, 2013/03/21
  - Re: [Qemu-devel] [PATCH] pcie: Enhance PCIe links, Alex Williamson, 2013/03/22
    - Re: [Qemu-devel] [PATCH] pcie: Enhance PCIe links, Michael S. Tsirkin, 2013/03/24
    - Re: [Qemu-devel] [PATCH] pcie: Enhance PCIe links, Alex Williamson <=

Prev by Date: [Qemu-devel] [PATCH 19/19] qcow2: Gather clusters in a looping loop
Next by Date: Re: [Qemu-devel] [Qemu-ppc] [RFC ppc-next PATCH 3/6] memory: add memory_region_to_address()
Previous by thread: Re: [Qemu-devel] [PATCH] pcie: Enhance PCIe links
Next by thread: [Qemu-devel] Requesting a wiki account.
Index(es):
- Date
- Thread