qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v3 0/5] intel-iommu: introduce Intel IOMMU (VT-d


From: Knut Omang
Subject: Re: [Qemu-devel] [PATCH v3 0/5] intel-iommu: introduce Intel IOMMU (VT-d) emulation to q35 chipset
Date: Tue, 19 Aug 2014 06:18:24 +0200

On Tue, 2014-08-19 at 06:08 +0200, Knut Omang wrote:
> On Mon, 2014-08-18 at 20:50 +0200, Jan Kiszka wrote:
> > On 2014-08-18 18:34, Knut Omang wrote:
> > > On Sat, 2014-08-16 at 10:47 +0200, Jan Kiszka wrote:
> > >> On 2014-08-16 10:45, Jan Kiszka wrote:
> > >>> On 2014-08-16 09:54, Knut Omang wrote:
> > >>>> On Fri, 2014-08-15 at 19:37 +0800, Le Tan wrote:
> > >>>>> Hi Knut,
> > >>>>>
> > >>>>> 2014-08-15 19:15 GMT+08:00 Knut Omang <address@hidden>:
> > >>>>>> On Fri, 2014-08-15 at 06:42 +0200, Knut Omang wrote:
> > >>>>>>> On Thu, 2014-08-14 at 14:10 +0200, Jan Kiszka wrote:
> > >>>>>>>> On 2014-08-14 13:15, Michael S. Tsirkin wrote:
> > >>>>>>>>> On Mon, Aug 11, 2014 at 03:04:57PM +0800, Le Tan wrote:
> > >>>>>>>>>> Hi,
> > >>>>>>>>>>
> > >>>>>>>>>> These patches are intended to introduce Intel IOMMU (VT-d) 
> > >>>>>>>>>> emulation to q35
> > >>>>>>>>>> chipset. The major job in these patches is to add support for 
> > >>>>>>>>>> emulating Intel
> > >>>>>>>>>> IOMMU according to the VT-d specification, including basic 
> > >>>>>>>>>> responses to CSRs
> > >>>>>>>>>> accesses, the logics of DMAR (DMA remapping) and DMA memory 
> > >>>>>>>>>> address
> > >>>>>>>>>> translations.
> > >>>>>>>>>
> > >>>>>>>>> Thanks!
> > >>>>>>>>> Looks very good overall, I noted some coding style issues - I 
> > >>>>>>>>> didn't
> > >>>>>>>>> bother reporting each issue in every place where it appears - 
> > >>>>>>>>> reported
> > >>>>>>>>> each issue once only, so please find and fix all instances of each
> > >>>>>>>>> issue.
> > >>>>>>>>
> > >>>>>>>> BTW, because I was in urgent need for virtual test environment for
> > >>>>>>>> Jailhouse, I hacked interrupt remapping on top of Le's patches:
> > >>>>>>>>
> > >>>>>>>> http://git.kiszka.org/?p=qemu.git;a=shortlog;h=refs/heads/queues/vtd-intremap
> > >>>>>>>>
> > >>>>>>>> The approach likely needs further discussions and refinements but 
> > >>>>>>>> it
> > >>>>>>>> already allows me to work on top with our hypervisor, and also 
> > >>>>>>>> Linux.
> > >>>>>>>> You can see from the last commit that Le's work made it pretty 
> > >>>>>>>> easy to
> > >>>>>>>> build this on top.
> > >>>>>>>
> > >>>>>>> Le,
> > >>>>>>>
> > >>>>>>> I have tried Jan's branch with my device setup which consists of a
> > >>>>>>> minimal q35 setup, an ioh3420 root port (specified as -device
> > >>>>>>> ioh3420,slot=0 ) and a pcie device plugged into that root port, 
> > >>>>>>> which
> > >>>>>>> gives the following lscpi -t:
> > >>>>>>>
> > >>>>>>> -[0000:00]-+-00.0
> > >>>>>>>            +-01.0
> > >>>>>>>            +-02.0
> > >>>>>>>            +-03.0-[01]----00.0
> > >>>>>>>            +-04.0
> > >>>>>>>            +-1f.0
> > >>>>>>>            +-1f.2
> > >>>>>>>            \-1f.3
> > >>>>>>>
> > >>>>>>> All seems to work beautifully (I see the ISA bridge happily receive
> > >>>>>>> translations) until the first DMA from my device model (at 1:00.0)
> > >>>>>>> arrives, at which point I get:
> > >>>>>>>
> > >>>>>>> [ 1663.732413] dmar: DMAR:[DMA Write] Request device [00:03.0] 
> > >>>>>>> fault addr fffa0000
> > >>>>>>> [ 1663.732413] DMAR:[fault reason 02] Present bit in context entry 
> > >>>>>>> is clear
> > >>>>>>>
> > >>>>>>> I would have expected request device 01:00.0 for this.
> > >>>>>>> It is not clear to me yet if this is a weakness of the 
> > >>>>>>> implementation of
> > >>>>>>> ioh3420 or the iommu. Just wanted to let you know right away in 
> > >>>>>>> case you
> > >>>>>>> can shed some light to it or it is an easy fix,
> > >>>>>>>
> > >>>>>>> The device uses pci_dma_rw with itself as device pointer.
> > >>>>>>
> > >>>>>> To verify my hypothesis: with this rude hack my device now works much
> > >>>>>> better:
> > >>>>>>
> > >>>>>> @@ -774,6 +780,8 @@ static void iommu_translate(VTDAddressSpace 
> > >>>>>> *vtd_as,
> > >>>>>> int bus_num, int devfn,
> > >>>>>>          is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD;
> > >>>>>>      } else {
> > >>>>>>          ret_fr = dev_to_context_entry(s, bus_num, devfn, &ce);
> > >>>>>> +        if (ret_fr)
> > >>>>>> +            ret_fr = dev_to_context_entry(s, 1, 0, &ce);
> > >>>>>>          is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD;
> > >>>>>>          if (ret_fr) {
> > >>>>>>              ret_fr = -ret_fr;
> > >>>>>>
> > >>>>>> Looking at how things look on hardware, multiple devices often 
> > >>>>>> receive
> > >>>>>> overlapping DMA address ranges for different physical addresses.
> > >>>>>>
> > >>>>>> So if I understand the way this works, every requester ID would also
> > >>>>>> need to have it's own unique VTDAddressSpace, as each pci
> > >>>>>> device/function sees a unique DMA address space..
> > >>>>>
> > >>>>> ioh3420 is a pcie-to-pcie bridge, right? 
> > >>>>
> > >>>> Yes.
> > >>>>
> > >>>>> In my opinion, each pci-e
> > >>>>> device behind the pcie-to-pcie bridge can be assigned individually.
> > >>>>> For now I added the VT-d to q35 by just adding it to the root pci bus.
> > >>>>> You can see here in q35.c:
> > >>>>> pci_setup_iommu(pci_bus, q35_host_dma_iommu, mch->iommu);
> > >>>>> So if we add a pcie-to-pcie bridge, we may have to call the
> > >>>>> pci_setup_iommu() for that new bus. I don't know where to hook into
> > >>>>> this now. :) If you know the mechanism behind that, you can try to add
> > >>>>> that for the new bus. (I will dive into this after the clean up.)
> > >>>>> What do you think?
> > >>>>
> > >>>> Thanks for the quick answer, that helped a lot!
> > >>>>
> > >>>> Looking into the details here I realize it is slightly more 
> > >>>> complicated:
> > >>>> secondary buses are enumerated after device instantiation, as part of
> > >>>> the host PCI enumeration, so if I add a similar setup call in the 
> > >>>> bridge
> > >>>> setup, it will be called for a new device long before it has received
> > >>>> it's bus number from the OS (via config[PCI_SECONDARY_BUS] )
> > >>>>
> > >>>> I agree that the lookup function for contexts needs to be as efficient
> > >>>> as possible so the simple <busno,defvn> lookup key may be the best
> > >>>> solution but then the address_spaces table cannot be populated with the
> > >>>> secondary bus entries before it receives a nonzero != 255 bus number,
> > >>>> eg. along the lines of this: 
> > >>>>
> > >>>> diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
> > >>>> index 4becdc1..d9a8c23 100644
> > >>>> --- a/hw/pci/pci_bridge.c
> > >>>> +++ b/hw/pci/pci_bridge.c
> > >>>> @@ -265,6 +265,12 @@ void pci_bridge_write_config(PCIDevice *d,
> > >>>>          pci_bridge_update_mappings(s);
> > >>>>      }
> > >>>>  
> > >>>> +    if (ranges_overlap(address, len, PCI_SECONDARY_BUS, 1)) {
> > >>>> +        int bus_num = pci_bus_num(&s->sec_bus);
> > >>>> +        if (bus_num != 0xff && bus_num != 0x00)
> > >>>> +            <handle bus number change>
> > >>>> +    }
> > >>>> +
> > >>>>      newctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL);
> > >>>>      if (~oldctl & newctl & PCI_BRIDGE_CTL_BUS_RESET) {
> > >>>>          /* Trigger hot reset on 0->1 transition. */
> > >>>>
> > >>>> but it is getting complicated...
> > >>>> Thoughts?
> > >>>
> > >>> Point to the PCI bus from VTDAddressSpace instead of storing the bus_num
> > >>> there?
> > >>
> > >> Also, each PCIe bus should hold an array of VTDAddressSpaces, instead of
> > >> the IntelIOMMUState.
> > > 
> > > Thanks - that got me going - after some playing around with the data
> > > structures I ended up with these patches (based on top of Jan's
> > > vtd-intremap branch):
> > 
> > Are you depending on interrupt remapping? If not, my patches are a bit
> > hacky and may cause their own issues if you are unlucky.
>  
> It does not depend directly but interprets a NULL PciDevice pointer as
> the special bus number (0xff) for non-pci devices, so I have tried to
> take heights for that - I can rebase if so desired, but I would like to
> see the interrupt remapping emerge as well ;-)
> 
> Knut
> 
> > > https://github.com/knuto/qemu/tree/vtd_patches
> > > 
> > > In essence I ended up replacing the address_spaces[] array in
> > > IntelIOMMUState with a pointer dma_as in PCIDevice and a QLIST linking
> > > the VTDAddressSpace objects to serve vtd_context_device_invalidate, then
> > > use pci_bus_num() whenever a bus number is requires, except for the 
> > > "special" bus needed by the interrupt remapping code.
> > > 
> > > To achieve this, I had to change the signature of the pci_setup_iommu
> > > function (the first commit).
> > > 
> > > With this I am now seeing translations for the device in the virtual
> > > root port, but I think this should work equally well with other PCI
> > > bridges.
> > 
> > Good to know that we are on the right path! Maybe this can be integrated
> > in the next posting round so that we do not need any bridge exceptions
> > due to incompatibility.
> >
> > Jan

Lets aim at that - as far as I understand the Linux iommu driver, the
kernel will try to setup mappings for everything (using identity
mappings for some devices), so having exceptions on the Qemu side would
probably have to be a hack.

Knut





reply via email to

[Prev in Thread] Current Thread [Next in Thread]