qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v4 1/2] pcie: Add a simple PCIe ACS (Access Cont


From: Knut Omang
Subject: Re: [Qemu-devel] [PATCH v4 1/2] pcie: Add a simple PCIe ACS (Access Control Services) helper function
Date: Thu, 14 Feb 2019 17:54:12 +0100
User-agent: Evolution 3.30.4 (3.30.4-1.fc29)

On Thu, 2019-02-14 at 08:51 -0700, Alex Williamson wrote:
> On Thu, 14 Feb 2019 08:07:33 +0100
> Knut Omang <address@hidden> wrote:
> 
> > On Wed, 2019-02-13 at 12:13 -0700, Alex Williamson wrote:
> > > On Wed, 13 Feb 2019 10:29:58 +0100
> > > Knut Omang <address@hidden> wrote:
> > >   
> > > > Add a helper function to add PCIe capability for Access Control Services
> > > > (ACS)  
> > > 
> > > This is redundant to the commit title.
> > >   
> > > > ACS support in the associated root port is needed to pass
> > > > through individual functions of a device to different VMs with VFIO
> > > > without Alex Williamson's pcie_acs_override kernel patch or similar
> > > > in the guest.  
> > > 
> > > This is overly subtle, to work backwards that individual functions
> > > (plural!) of a device (singular!) must imply a multifunction endpoint
> > > in the same hierarchy split to different L2 VMs.  Perhaps I only
> > > finally realized this subtly on v4.
> > >   
> > > > Single function devices, or multifunction devices
> > > > that all goes to the same VM works fine even without ACS, as VFIO
> > > > will avoid putting the root port itself into the IOMMU group
> > > > even without ACS support in the port.  
> > > 
> > > Also confusing and incorrectly states that a) VFIO is responsible for
> > > IOMMU grouping, it's not, and b) that the root port would not be
> > > included in such a group, it would.  The latter was I thought the
> > > impetus for this series.  
> > 
> > that wasn't the intention but I can see that it looks that way now
> > 
> > > > Multifunction endpoints may also implement an ACS capability,
> > > > only on function 0, and with more limited features.  
> > > 
> > > "only on function 0" is incorrect, each function of a multifunction
> > > device should (not must) implement an ACS capability if any of them do.
> > > 
> > > Can't we just say something like:
> > > 
> > > "Implementing an ACS capability on downstream ports and multifuction
> > > endpoints indicates isolation and IOMMU visibility to a finer
> > > granularity thereby creating smaller IOMMU groups in the guest and thus
> > > more flexibility in assigning endpoints to guest userspace or an L2
> > > guest."  
> > 
> > sure - will use this - and remove my confusing attempt to 
> > credit to your override patch and VFIO :)
> > 
> > > (I avoided including SR-IOV with multifunction since that's not
> > > implemented here)  
> > 
> > I agree
> > 
> > > > Signed-off-by: Knut Omang <address@hidden>
> > > > ---
> > > >  hw/pci/pcie.c              | 39
> > > > +++++++++++++++++++++++++++++++++++++++-
> > > >  include/hw/pci/pcie.h      |  6 ++++++-
> > > >  include/hw/pci/pcie_regs.h |  4 ++++-
> > > >  3 files changed, 49 insertions(+)
> > > > 
> > > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > > > index 230478f..6e87994 100644
> > > > --- a/hw/pci/pcie.c
> > > > +++ b/hw/pci/pcie.c
> > > > @@ -906,3 +906,42 @@ void pcie_ats_init(PCIDevice *dev, uint16_t offset)
> > > >  
> > > >      pci_set_word(dev->wmask + dev->exp.ats_cap + PCI_ATS_CTRL, 0x800f);
> > > >  }
> > > > +
> > > > +/* ACS (Access Control Services) */
> > > > +void pcie_acs_init(PCIDevice *dev, uint16_t offset)
> > > > +{
> > > > +    bool is_pcie_slot = !!object_dynamic_cast(OBJECT(dev),
> > > > TYPE_PCIE_SLOT);  
> > > 
> > > Perhaps we should be using pci_is_express_downstream_port().  
> > 
> > oh - yes - I forgot that we need to look in pci.h for those kind of 
> > helpers..
> > 
> > > > +    uint16_t cap_bits = 0;
> > > > +
> > > > +    /*
> > > > +     * For endpoints, only multifunction devices may have an
> > > > +     * ACS capability, and only on function 0:  
> > > 
> > > Incorrect
> > >   
> > > > +     */
> > > > +    assert(is_pcie_slot ||
> > > > +           ((dev->cap_present & QEMU_PCI_CAP_MULTIFUNCTION) &&
> > > > +            PCI_FUNC(dev->devfn)));  
> > > 
> > > The second test should be:
> > > 
> > > ((dev->cap_present & QEMU_PCI_CAP_MULTIFUNCTION) ||
> > >  PCI_FUNC(dev->devfn))
> > > 
> > > If the function number is non-zero, then it's clearly a multifunction
> > > device, the multifunction capability is only required on function
> > > zero.  Just as in my previous example, an ACS capability can only
> > > describe/control the DMA flow of the function implementing it, nothing
> > > in the spec that I can see imposes function zero's DMA flow on the
> > > other functions.  
> > 
> > Ah - of course - that makes sense - 
> > was thinking too complicated here, and also my comment didn't match
> > the code at all..
> > 
> > > There's also a gap here that function zero can set the multifunction
> > > capability, but there may be no secondary devices defined.  Not that
> > > we necessarily need to resolve this, but it's a nuance of allowing
> > > arbitrary multifunction configurations as QEMU does.  
> > 
> > Yes, in the SR/IOV case, at least as I have implemented it in QEMU, 
> > with one PF that would be the default - as no VFs are defined at reset, 
> > there's only one function, but it still need to be multifunction 
> > for QEMU to accept more functions appearing later.
> > 
> > > > +
> > > > +    pcie_add_capability(dev, PCI_EXT_CAP_ID_ACS, PCI_ACS_VER, offset,
> > > > +                        PCI_ACS_SIZEOF);
> > > > +    dev->exp.acs_cap = offset;
> > > > +
> > > > +    if (is_pcie_slot) {
> > > > +        /*
> > > > +         * Endpoints may also implement ACS, and optionally RR and CR,
> > > > +         * if they want to support p2p, but only slots may
> > > > +         * implement SV, TB or UF:  
> > > 
> > > Careful using "may" with spec references.  
> > 
> > :-D
> > 
> > > "Downstream ports must implement SV, TB, RR, CR, and UF (with caveats
> > > on the latter three that we ignore for simplicity).  Endpoints may also
> > > implement a subset of ACS capabilities, but these are optional if the
> > > endpoint does not support peer-to-peer between functions and thus
> > > omitted here."  
> > 
> > Thanks - I'll put that in instead
> > 
> > > > +         */
> > > > +        cap_bits =
> > > > +            PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR | PCI_ACS_CR |
> > > > PCI_ACS_UF;  
> > > 
> > > PCI_ACS_DT has similar "must be implemented" requirements to RR, RC,
> > > and UF, why is it not included?  For all of these "caveat" ones there
> > > are conditions which can make it optional for root ports, but required
> > > for switch downstream ports, so it seems reasonable that we include
> > > both since that's what our is_pci_slot() test covers.  Thanks,  
> > 
> > That was because my "reference" root ports don't not implement it, taking
> > the
> > conservative approach:
> > 
> > 00:07.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root
> > Port 7 (rev 22) (prog-if 00 [Normal decode])
> >        ...
> >        Capabilities: [150 v1] Access Control Services
> >                 ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+
> > UpstreamFwd+ EgressCtrl- DirectTrans-
> >                 ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+
> > UpstreamFwd+ EgressCtrl- DirectTrans-
> > 
> > In fact, all gens of servers I have access to supports just the same cap
> > bits in
> > their root ports, in order of release date. The latest gen even have some
> > root
> > ports without an ACS capability.
> > 
> > 00:02.0 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express Root
> > Port 2a (rev 07)
> > 00:01.0 PCI bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon
> > D PCI Express Root Port 1 (rev 01) (prog-if 00 [Normal decode])
> > 00:03.0 PCI bridge: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI
> > Express Root Port 3 (rev 02) (prog-if 00 [Normal decode])
> > 00:02.0 PCI bridge: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 PCI
> > Express Root Port 2a (rev 04) (prog-if 00 [Normal decode])
> > 17:00.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port A
> > (rev 04) (prog-if 00 [Normal decode])
> > 
> > All of these have DirectTrans- in their ACSCap.
> > 
> > [For reference, the one without ACS cap, in the same server as 17:00.0
> > above:
> > 
> > 00:1c.0 PCI bridge: Intel Corporation Lewisburg PCI Express Root Port #1
> > (rev f9) (prog-if 00 [Normal decode])
> > ]
> > 
> > Do you prefer that we set DT as default anyway, or should we stay within
> > "known
> > territory" for the OSes, at least for now?
> 
> Per the spec:
> 
>   ACS Direct Translated P2P: must be implemented by Root Ports that
>   support Address Translation Services (ATS) and also support
>   peer-to-peer traffic with other Root Ports; must be implemented by
>   Switch Downstream Ports.
> 
> The caveats for root ports here are why your hardware is potentially
> spec compliant without supporting DT.  There are no caveats for switch
> downstream ports, therefore it would not be spec compliant to exclude
> it.  I think your options are either to exclude switch downstream ports
> from the function or to set DT.  Thanks,

Better to set DT then - a future generic switch downstream port would want to
have ACS too.

Knut

> Alex




reply via email to

[Prev in Thread] Current Thread [Next in Thread]