Re: [Qemu-devel] [questions] about using vfio to assign sr-iov vf to vm


From: Alex Williamson
Subject: Re: [Qemu-devel] [questions] about using vfio to assign sr-iov vf to vm
Date: Mon, 18 Aug 2014 06:53:58 -0600

On Mon, 2014-08-18 at 17:49 +0800, Zhang Haoyu wrote:
> >>> >> >> Hi, all
> >>> >> >> I'm using VFIO to assign Intel 82599 VFs to VMs, and I've run into a
> >>> >> >> problem: the 82599 PF and its VFs belong to the same iommu_group, but
> >>> >> >> I want to assign some VFs to one VM and other VFs to another VM.
> >>> >> >> How can I unbind only some of the VFs, and not the PF?
> >>> >> >> I read the kernel doc vfio.txt, but I'm not sure whether I have to
> >>> >> >> unbind all of the devices that belong to one iommu_group.
> >>> >> >> If so, since the PF and its VFs belong to the same iommu_group,
> >>> >> >> unbinding the PF makes its VFs disappear as well.
> >>> >> >> I think I've misunderstood something.
> >>> >> >> Any advice?
> >>> >> >
> >>> >> >This occurs when the PF is installed behind components in the system
> >>> >> >that do not support PCIe Access Control Services (ACS).  The IOMMU
> >>> >> >group contains both the PF and the VFs because upstream transactions
> >>> >> >can be re-routed downstream by these non-ACS components before being
> >>> >> >translated by the IOMMU.  Please provide 'sudo lspci -vvv', 'lspci -n',
> >>> >> >and your kernel version and we might be able to give you some advice
> >>> >> >on how to work around the problem.  Thanks,
> >>> >> >
> >>> >> The Intel 82599 (02:00.0 or 02:00.1) is behind the PCI bridge
> >>> >> (00:01.1).  Does the 00:01.1 PCI bridge support ACS?
> >>> >
> >>> >It does not and that's exactly the problem.  We must assume that the
> >>> >root port can redirect a transaction from a subordinate device back to
> >>> >another subordinate device without IOMMU translation when ACS support is
> >>> >not present.  If you had a device plugged in below 00:01.0, we'd also
> >>> >need to assume that non-IOMMU translated peer-to-peer between devices
> >>> >behind either function, 00:01.0 or 00:01.1, is possible.
> >>> >
> >>> >Intel has indicated that processor root ports for all Xeon class
> >>> >processors should support ACS and has verified isolation for PCH based
> >>> >root ports allowing us to support quirks in place of ACS support.  I'm
> >>> >not aware of any efforts at Intel to verify isolation capabilities of
> >>> >root ports on client processors.  They are however aware that lack of
> >>> >ACS is a limiting factor for usability of VT-d, and I hope that we'll
> >>> >see future products with ACS support.
> >>> >
> >>> >Chances are good that the PCH root port at 00:1c.0 is supported by an
> >>> >ACS quirk, but it seems that your system has a PCIe switch below the
> >>> >root port.  If the PCIe switch downstream ports support ACS, then you
> >>> >may be able to move the 82599 to the empty slot at bus 07 to separate
> >>> >the VFs into different IOMMU groups.  Thanks,
> >>> >
> >>> Thanks, Alex,
> >>> How can I tell whether a PCI bridge/device supports the ACS capability?
> >>> 
> >>> I ran "lspci -vvv -s | grep -i ACS", but nothing matched.
> >>> # lspci -vvv -s 00:1c.0
> >>> 00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family 
> >>> PCI Express Root Port 1 (rev b5) (prog-if 00 [Normal decode])
> >>
> >>
> >>Ideally there would be capabilities for it, something like:
> >>
> >>Capabilities [xxx] Access Control Services...
> >>
> >>But, Intel failed to provide this, so we enable "effective" ACS
> >>capabilities via a quirk:
> >>
> >>drivers/pci/quirks.c:
> >>/*
> >> * Many Intel PCH root ports do provide ACS-like features to disable peer
> >> * transactions and validate bus numbers in requests, but do not provide an
> >> * actual PCIe ACS capability.  This is the list of device IDs known to fall
> >> * into that category as provided by Intel in Red Hat bugzilla 1037684.
> >> */
> >>static const u16 pci_quirk_intel_pch_acs_ids[] = {
> >>        /* Ibexpeak PCH */
> >>        0x3b42, 0x3b43, 0x3b44, 0x3b45, 0x3b46, 0x3b47, 0x3b48, 0x3b49,
> >>        0x3b4a, 0x3b4b, 0x3b4c, 0x3b4d, 0x3b4e, 0x3b4f, 0x3b50, 0x3b51,
> >>        /* Cougarpoint PCH */
> >>        0x1c10, 0x1c11, 0x1c12, 0x1c13, 0x1c14, 0x1c15, 0x1c16, 0x1c17,
> >>        0x1c18, 0x1c19, 0x1c1a, 0x1c1b, 0x1c1c, 0x1c1d, 0x1c1e, 0x1c1f,
> >>        /* Pantherpoint PCH */
> >>        0x1e10, 0x1e11, 0x1e12, 0x1e13, 0x1e14, 0x1e15, 0x1e16, 0x1e17,
> >>        0x1e18, 0x1e19, 0x1e1a, 0x1e1b, 0x1e1c, 0x1e1d, 0x1e1e, 0x1e1f,
> >>        /* Lynxpoint-H PCH */
> >>        0x8c10, 0x8c11, 0x8c12, 0x8c13, 0x8c14, 0x8c15, 0x8c16, 0x8c17,
> >>        0x8c18, 0x8c19, 0x8c1a, 0x8c1b, 0x8c1c, 0x8c1d, 0x8c1e, 0x8c1f,
> >>        /* Lynxpoint-LP PCH */
> >>        0x9c10, 0x9c11, 0x9c12, 0x9c13, 0x9c14, 0x9c15, 0x9c16, 0x9c17,
> >>        0x9c18, 0x9c19, 0x9c1a, 0x9c1b,
> >>        /* Wildcat PCH */
> >>        0x9c90, 0x9c91, 0x9c92, 0x9c93, 0x9c94, 0x9c95, 0x9c96, 0x9c97,
> >>        0x9c98, 0x9c99, 0x9c9a, 0x9c9b,
> >>        /* Patsburg (X79) PCH */
> >>        0x1d10, 0x1d12, 0x1d14, 0x1d16, 0x1d18, 0x1d1a, 0x1d1c, 0x1d1e,
> >>};
> >>
> >>Hopefully if you run 'lspci -n', you'll see your device ID listed among
> >>these.  We don't currently have any quirks for PCIe switches, so if your
> >>IOMMU group is still bigger than it should be, that may be the reason.
> >>Thanks,
> >>
> >Using device-specific mechanisms to enable and verify ACS-like
> >capabilities is okay, but what should we do about devices that don't
> >support ACS-like capabilities at all?  How about applying "[PATCH] pci:
> >Enable overrides for missing ACS capabilities"?  And how can we reduce
> >the risk of data corruption and information leakage between VMs?
> >
> Is there any update since
> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/110726/focus=111515 ?

You're welcome to use that patch, but you do so at your own risk.
Upstream is not willing to accept that patch because incorrectly
assuming isolation can lead to subtle issues which are difficult or
impossible to support.  This is why we only support ACS reported
isolation or vendor verified quirks.  If you're able to work with the
component vendor to verify isolation, we'd welcome more quirks in this
area.  Thanks,

Alex
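
For anyone following the thread who wants to see how the kernel has
actually grouped their devices (for example, to check whether the VFs
still share a group with the PF after moving the card to another slot),
here is a minimal sketch that walks the sysfs IOMMU group directory.
It assumes the usual layout of /sys/kernel/iommu_groups/<N>/devices/;
the function names iommu_groups and group_of are hypothetical helpers
made up for this illustration, not part of any existing tool.

```python
import os


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted device addresses in it.

    Assumes the layout <root>/<group number>/devices/<device address>.
    Returns an empty dict if the directory does not exist, which is
    typically the case when the IOMMU is disabled.
    """
    groups = {}
    if not os.path.isdir(root):
        return groups
    for name in os.listdir(root):
        dev_dir = os.path.join(root, name, "devices")
        if name.isdigit() and os.path.isdir(dev_dir):
            groups[int(name)] = sorted(os.listdir(dev_dir))
    return groups


def group_of(address, groups):
    """Return the group number containing `address`, or None if absent."""
    for number, devices in groups.items():
        if address in devices:
            return number
    return None


if __name__ == "__main__":
    # Print one line per group: the group number, then its devices.
    for number, devices in sorted(iommu_groups().items()):
        print(number, " ".join(devices))
```

If the PF and a VF report the same group number from group_of(), they
cannot be split between the host and a VM, or between two VMs, without
the isolation discussed above.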



