qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH] vfio/pci: Improve extended capability comments,


From: Jintack Lim
Subject: Re: [Qemu-devel] [PATCH] vfio/pci: Improve extended capability comments, skip masked caps
Date: Wed, 22 Feb 2017 06:48:51 -0500

On Tue, Feb 21, 2017 at 10:08 PM, Peter Xu <address@hidden> wrote:

> [cc Jintack]
>
> On Tue, Feb 21, 2017 at 02:43:03PM -0700, Alex Williamson wrote:
> > Since commit 4bb571d857d9 ("pci/pcie: don't assume cap id 0 is
> > reserved") removes the internal use of extended capability ID 0, the
> > comment here becomes invalid.  However, peeling back the onion, the
> > code is still correct and we still can't seed the capability chain
> > with ID 0, unless we want to muck with using the version number to
> > force the header to be non-zero, which is much uglier to deal with.
> > The comment also now covers some of the subtleties of using cap ID 0,
> > such as transparently indicating absence of capabilities if none are
> > added.  This doesn't detract from the correctness of the referenced
> > commit as vfio in the kernel also uses capability ID zero to mask
> > capabilties.  In fact, we should skip zero capabilities precisely
> > because the kernel might also expose such a capability at the head
> > position and re-introduce the problem.
> >
> > Signed-off-by: Alex Williamson <address@hidden>
> > Cc: Peter Xu <address@hidden>
> > Cc: Michael S. Tsirkin <address@hidden>
> > ---
> >  hw/vfio/pci.c |   31 +++++++++++++++++++++----------
> >  1 file changed, 21 insertions(+), 10 deletions(-)
> >
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > index f2ba9b6cfafc..03a3d0154976 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -1880,16 +1880,26 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
> >      /*
> >       * Extended capabilities are chained with each pointing to the
> next, so we
> >       * can drop anything other than the head of the chain simply by
> modifying
> > -     * the previous next pointer.  For the head of the chain, we can
> modify the
> > -     * capability ID to something that cannot match a valid
> capability.  ID
> > -     * 0 is reserved for this since absence of capabilities is
> indicated by
> > -     * 0 for the ID, version, AND next pointer.  However,
> pcie_add_capability()
> > -     * uses ID 0 as reserved for list management and will incorrectly
> match and
> > -     * assert if we attempt to pre-load the head of the chain with this
> ID.
> > -     * Use ID 0xFFFF temporarily since it is also seems to be reserved
> in
> > -     * part for identifying absence of capabilities in a root complex
> register
> > -     * block.  If the ID still exists after adding capabilities, switch
> back to
> > -     * zero.  We'll mark this entire first dword as emulated for this
> purpose.
> > +     * the previous next pointer.  Seed the head of the chain here such
> that
> > +     * we can simply skip any capabilities we want to drop below,
> regardless
> > +     * of their position in the chain.  If this stub capability still
> exists
> > +     * after we add the capabilities we want to expose, update the
> capability
> > +     * ID to zero.  Note that we cannot seed with the capability header
> being
> > +     * zero as this conflicts with definition of an absent capability
> chain
> > +     * and prevents capabilities beyond the head of the list from being
> added.
> > +     * By replacing the dummy capability ID with zero after walking the
> device
> > +     * chain, we also transparently mark extended capabilities as
> absent if
> > +     * no capabilities were added.  Note that the PCIe spec defines an
> absence
> > +     * of extended capabilities to be determined by a value of zero for
> the
> > +     * capability ID, version, AND next pointer.  A non-zero next
> pointer
> > +     * should be sufficient to indicate additional capabilities are
> present,
> > +     * which will occur if we call pcie_add_capability() below.  The
> entire
> > +     * first dword is emulated to support this.
> > +     *
> > +     * NB. The kernel side does similar masking, so be prepared that our
> > +     * view of the device may also contain a capability ID zero in the
> head
> > +     * of the chain.  Skip it for the same reason that we cannot seed
> the
> > +     * chain with a zero capability.
> >       */
> >      pci_set_long(pdev->config + PCI_CONFIG_SPACE_SIZE,
> >                   PCI_EXT_CAP(0xFFFF, 0, 0));
> > @@ -1915,6 +1925,7 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
> >                                     PCI_EXT_CAP_NEXT_MASK);
> >
> >          switch (cap_id) {
> > +        case 0: /* kernel masked capability */
> >          case PCI_EXT_CAP_ID_SRIOV: /* Read-only VF BARs confuse OVMF */
> >          case PCI_EXT_CAP_ID_ARI: /* XXX Needs next function
> virtualization */
> >              trace_vfio_add_ext_cap_dropped(vdev->vbasedev.name,
> cap_id, next);
> >
>
> Reviewed-by: Peter Xu <address@hidden>
>
> Since this bug is originally reported by Jintack, maybe we can also
> add:
>
> Reported-by: Jintack Lim <address@hidden>
>
> Jintack, if you want to test it and provide your tested-by, it would
> be nice as well. ;)
>

I believe this patch is to fix the error I got before.
qemu-system-x86_64: hw/pci/pcie.c:686: pcie_add_capability: Assertion
`prev >= 0x100' failed.

If so,
Tested-by: Jintack Lim <address@hidden>


> Actually I just found that the bug still exist after Michael's fix (I
> thought it was fixed). So we definitely need this patch or equivalent.
> However, I would still slightly prefer removing the wrapping hack
> since after all we need to touch it (and I do feel like that's error
> prone...). So, Alex, do you like below one instead?
>
> --------8<----------
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 332f41d..6942c1d 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -1877,25 +1877,6 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
>       */
>      config = g_memdup(pdev->config, vdev->config_size);
>
> -    /*
> -     * Extended capabilities are chained with each pointing to the next,
> so we
> -     * can drop anything other than the head of the chain simply by
> modifying
> -     * the previous next pointer.  For the head of the chain, we can
> modify the
> -     * capability ID to something that cannot match a valid capability.
> ID
> -     * 0 is reserved for this since absence of capabilities is indicated
> by
> -     * 0 for the ID, version, AND next pointer.  However,
> pcie_add_capability()
> -     * uses ID 0 as reserved for list management and will incorrectly
> match and
> -     * assert if we attempt to pre-load the head of the chain with this
> ID.
> -     * Use ID 0xFFFF temporarily since it is also seems to be reserved in
> -     * part for identifying absence of capabilities in a root complex
> register
> -     * block.  If the ID still exists after adding capabilities, switch
> back to
> -     * zero.  We'll mark this entire first dword as emulated for this
> purpose.
> -     */
> -    pci_set_long(pdev->config + PCI_CONFIG_SPACE_SIZE,
> -                 PCI_EXT_CAP(0xFFFF, 0, 0));
> -    pci_set_long(pdev->wmask + PCI_CONFIG_SPACE_SIZE, 0);
> -    pci_set_long(vdev->emulated_config_bits + PCI_CONFIG_SPACE_SIZE, ~0);
> -
>      for (next = PCI_CONFIG_SPACE_SIZE; next;
>           next = PCI_EXT_CAP_NEXT(pci_get_long(config + next))) {
>          header = pci_get_long(config + next);
> @@ -1910,24 +1891,33 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
>           */
>          size = vfio_ext_cap_max_size(config, next);
>
> -        /* Use emulated next pointer to allow dropping extended caps */
> -        pci_long_test_and_set_mask(vdev->emulated_config_bits + next,
> -                                   PCI_EXT_CAP_NEXT_MASK);
> +        /* Use emulated header to allow dropping extended caps */
> +        pci_set_long(vdev->emulated_config_bits + next, 0xffffffffULL);
>
>          switch (cap_id) {
>          case PCI_EXT_CAP_ID_SRIOV: /* Read-only VF BARs confuse OVMF */
>          case PCI_EXT_CAP_ID_ARI: /* XXX Needs next function
> virtualization */
> +        case PCI_EXT_CAP_ID_VC:
> +            /*
> +             * For dropped capabilities, we keep their slot but
> +             * replace them with a header containing cap_id=0 &&
> +             * cap_ver=1. We do this reservation mostly to make sure
> +             * the head ecap (at offset 0x100) will always be there.
> +             * Anyway it won't hurt if we keep the rest of the dropped
> +             * ones as well.
> +             *
> +             * Here we use non-zero cap_ver because we want to "mark"
> +             * this ecap as "available" - from PCIe spec (chap 7.9.1),
> +             * it marked out that cap_id=cap_ver=next=0 means empty
> +             * ecap, and we really don't want it to be an "empty" slot
> +             * especially for the head ecap (we need a head, always!).
> +             */
> +            pcie_add_capability(pdev, 0, 1, next, size);
>              trace_vfio_add_ext_cap_dropped(vdev->vbasedev.name, cap_id,
> next);
>              break;
>          default:
>              pcie_add_capability(pdev, cap_id, cap_ver, next, size);
>          }
> -
> -    }
> -
> -    /* Cleanup chain head ID if necessary */
> -    if (pci_get_word(pdev->config + PCI_CONFIG_SPACE_SIZE) == 0xFFFF) {
> -        pci_set_word(pdev->config + PCI_CONFIG_SPACE_SIZE, 0);
>      }
>
>      g_free(config);
> ------->8--------
>
> The new patch will keep all the dropped ecaps (so we may see more than
> one cap_id=0x0000 field), which I don't know whether would be a
> drawback. OTOH, it provides benefits like:
>
> - we can remove the wrapping hack, so the code is much readable and
>   less error prone imho
>
> - we can avoid using the assumption that 0xffff cap_id is reserved
>
> I can live with both patches though. :-)
>
> Thanks!
>
> -- peterx
>
>


reply via email to

[Prev in Thread] Current Thread [Next in Thread]