qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] PCI iommu issues


From: Jan Kiszka
Subject: Re: [Qemu-devel] PCI iommu issues
Date: Fri, 30 Jan 2015 08:40:51 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); de; rv:1.8.1.12) Gecko/20080226 SUSE/2.0.0.12-1.1 Thunderbird/2.0.0.12 Mnenhy/0.7.5.666

Adding Knut to CC as he particularly looked into and fixed the bridging
issues or the vtd emulation. I will have to refresh my memories first.

Jan

On 2015-01-30 06:45, Benjamin Herrenschmidt wrote:
> Hi folks !
> 
> 
> I've looked at the intel iommu code to try to figure out how to properly
> implement a Power8 "native" iommu model and encountered some issues.
> 
> Today "pseries" ppc machine is paravirtualized and so uses a pretty
> simplistic iommu model that essentially has one address space per host
> bridge.
> 
> However, the real HW model I'm working on is closer to Intel in that we
> have various tables walked by HW that match an originator RID to what
> we called a "PE" (Partitionable Endpoint) to which corresponds an
> address space.
> 
> So on a given domain, individual functions can have different iommu
> address spaces & translation structures, or group of devices etc...
> which can all be configured dynamically by the guest OS. This is similar
> as far as I understand things to the Intel model though the details of
> the implementation are very different.
> 
> So I implemented something along the lines of what you guys did for q35
> and intel_iommu, and quickly discovered that it doesn't work, which
> makes me wonder whether the intel stuff in qemu actually works, or
> rather, does it work when adding bridges & switches into the picture.
> 
> I basically have two problems but they are somewhat related. Firstly
> the way the intel code works is that it creates lazily context
> structures that contain the address space, and get associated with
> devices when pci_device_iommu_address_space() is called which in
> turns calls the bridge iommu_fn which performs the association.
> 
> The first problem is that the association is done based on bus/dev/fn
> of the device... at a time where bus numbers have not been assigned yet.
> 
> In fact, the bus numbers are assigned dynamically by SW, the BIOS
> typically, but the OS can renumber things and it's bogus to assume thus
> that the RID (bus/dev/fn) of a PCI device/function is fixed. However
> that's exactly what the code does as it calls
> pci_device_iommu_address_space() once at device instanciation time in
> qemu, even before SW had a chance to assign anything.
> 
> So as far as I can tell, things will work as long as you are on bus 0
> and there is no bridge, otherwise, it's broken by design, unless I'm
> missing something...
> 
> I've hacked that locally in my code by using the PCIBus * pointer
> instead of the bus number to match the device to the iommu context.
> 
> The second problem is that pci_device_iommu_address_space(), as it walks
> up the hierarchy to find the iommu_fn, drops the original device
> information. That means that if a device is below a switch or a p2p
> bridge of some sort, once you reach the host bridge top level bus, all
> we know is the bus & devfn of the last p2p entity along the path, we
> lose the original bus & devfn information.
> 
> This is incorrect for that sort of iommu, at least while in the PCIe
> domain, as the original RID is carried along with DMA transactions and i
> thus needed to properly associate the device/function with a context.
> 
> One fix could be to populate the iommu_fn of every bus down the food
> chain but that's fairly cumbersome... unless we make the PCI bridges by
> default "inherit" from their parent iommu_fn.
> 
> Here, I've done a hack locally to keep the original device information
> in pci_device_iommu_address_space() but it's not a proper way to do it
> either, ultimately, each bridge need to be able to tell whether it
> properly forwards the RID information or not, so the bridge itself need
> to have some attribute to control that. Typically a PCIe switch or root
> complex will always forward the full RID... while most PCI-E -> PCI-X
> bridges are busted in that regard. Worse, some bridges forward *some*
> bits (partial RID) which is even more broken but I don't know if we can
> or even care about simulating it. Thankfully most PCI-X or PCI bridges
> will behave properly and make it look like all DMAs are coming from the
> bridge itself.
> 
> What do you guys reckon is the right approach for both problems ?
> 
> Cheers,
> Ben.
> 
> 

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux



reply via email to

[Prev in Thread] Current Thread [Next in Thread]