qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v3 0/5] intel-iommu: introduce Intel IOMMU (VT-d


From: Knut Omang
Subject: Re: [Qemu-devel] [PATCH v3 0/5] intel-iommu: introduce Intel IOMMU (VT-d) emulation to q35 chipset
Date: Tue, 19 Aug 2014 06:08:55 +0200

On Mon, 2014-08-18 at 20:50 +0200, Jan Kiszka wrote:
> On 2014-08-18 18:34, Knut Omang wrote:
> > On Sat, 2014-08-16 at 10:47 +0200, Jan Kiszka wrote:
> >> On 2014-08-16 10:45, Jan Kiszka wrote:
> >>> On 2014-08-16 09:54, Knut Omang wrote:
> >>>> On Fri, 2014-08-15 at 19:37 +0800, Le Tan wrote:
> >>>>> Hi Knut,
> >>>>>
> >>>>> 2014-08-15 19:15 GMT+08:00 Knut Omang <address@hidden>:
> >>>>>> On Fri, 2014-08-15 at 06:42 +0200, Knut Omang wrote:
> >>>>>>> On Thu, 2014-08-14 at 14:10 +0200, Jan Kiszka wrote:
> >>>>>>>> On 2014-08-14 13:15, Michael S. Tsirkin wrote:
> >>>>>>>>> On Mon, Aug 11, 2014 at 03:04:57PM +0800, Le Tan wrote:
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> These patches are intended to introduce Intel IOMMU (VT-d) 
> >>>>>>>>>> emulation to q35
> >>>>>>>>>> chipset. The major job in these patches is to add support for 
> >>>>>>>>>> emulating Intel
> >>>>>>>>>> IOMMU according to the VT-d specification, including basic 
> >>>>>>>>>> responses to CSRs
> >>>>>>>>>> accesses, the logics of DMAR (DMA remapping) and DMA memory address
> >>>>>>>>>> translations.
> >>>>>>>>>
> >>>>>>>>> Thanks!
> >>>>>>>>> Looks very good overall, I noted some coding style issues - I didn't
> >>>>>>>>> bother reporting each issue in every place where it appears - 
> >>>>>>>>> reported
> >>>>>>>>> each issue once only, so please find and fix all instances of each
> >>>>>>>>> issue.
> >>>>>>>>
> >>>>>>>> BTW, because I was in urgent need for virtual test environment for
> >>>>>>>> Jailhouse, I hacked interrupt remapping on top of Le's patches:
> >>>>>>>>
> >>>>>>>> http://git.kiszka.org/?p=qemu.git;a=shortlog;h=refs/heads/queues/vtd-intremap
> >>>>>>>>
> >>>>>>>> The approach likely needs further discussions and refinements but it
> >>>>>>>> already allows me to work on top with our hypervisor, and also Linux.
> >>>>>>>> You can see from the last commit that Le's work made it pretty easy 
> >>>>>>>> to
> >>>>>>>> build this on top.
> >>>>>>>
> >>>>>>> Le,
> >>>>>>>
> >>>>>>> I have tried Jan's branch with my device setup which consists of a
> >>>>>>> minimal q35 setup, an ioh3420 root port (specified as -device
> >>>>>>> ioh3420,slot=0 ) and a pcie device plugged into that root port, which
> >>>>>>> gives the following lscpi -t:
> >>>>>>>
> >>>>>>> -[0000:00]-+-00.0
> >>>>>>>            +-01.0
> >>>>>>>            +-02.0
> >>>>>>>            +-03.0-[01]----00.0
> >>>>>>>            +-04.0
> >>>>>>>            +-1f.0
> >>>>>>>            +-1f.2
> >>>>>>>            \-1f.3
> >>>>>>>
> >>>>>>> All seems to work beautifully (I see the ISA bridge happily receive
> >>>>>>> translations) until the first DMA from my device model (at 1:00.0)
> >>>>>>> arrives, at which point I get:
> >>>>>>>
> >>>>>>> [ 1663.732413] dmar: DMAR:[DMA Write] Request device [00:03.0] fault 
> >>>>>>> addr fffa0000
> >>>>>>> [ 1663.732413] DMAR:[fault reason 02] Present bit in context entry is 
> >>>>>>> clear
> >>>>>>>
> >>>>>>> I would have expected request device 01:00.0 for this.
> >>>>>>> It is not clear to me yet if this is a weakness of the implementation 
> >>>>>>> of
> >>>>>>> ioh3420 or the iommu. Just wanted to let you know right away in case 
> >>>>>>> you
> >>>>>>> can shed some light to it or it is an easy fix,
> >>>>>>>
> >>>>>>> The device uses pci_dma_rw with itself as device pointer.
> >>>>>>
> >>>>>> To verify my hypothesis: with this rude hack my device now works much
> >>>>>> better:
> >>>>>>
> >>>>>> @@ -774,6 +780,8 @@ static void iommu_translate(VTDAddressSpace 
> >>>>>> *vtd_as,
> >>>>>> int bus_num, int devfn,
> >>>>>>          is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD;
> >>>>>>      } else {
> >>>>>>          ret_fr = dev_to_context_entry(s, bus_num, devfn, &ce);
> >>>>>> +        if (ret_fr)
> >>>>>> +            ret_fr = dev_to_context_entry(s, 1, 0, &ce);
> >>>>>>          is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD;
> >>>>>>          if (ret_fr) {
> >>>>>>              ret_fr = -ret_fr;
> >>>>>>
> >>>>>> Looking at how things look on hardware, multiple devices often receive
> >>>>>> overlapping DMA address ranges for different physical addresses.
> >>>>>>
> >>>>>> So if I understand the way this works, every requester ID would also
> >>>>>> need to have it's own unique VTDAddressSpace, as each pci
> >>>>>> device/function sees a unique DMA address space..
> >>>>>
> >>>>> ioh3420 is a pcie-to-pcie bridge, right? 
> >>>>
> >>>> Yes.
> >>>>
> >>>>> In my opinion, each pci-e
> >>>>> device behind the pcie-to-pcie bridge can be assigned individually.
> >>>>> For now I added the VT-d to q35 by just adding it to the root pci bus.
> >>>>> You can see here in q35.c:
> >>>>> pci_setup_iommu(pci_bus, q35_host_dma_iommu, mch->iommu);
> >>>>> So if we add a pcie-to-pcie bridge, we may have to call the
> >>>>> pci_setup_iommu() for that new bus. I don't know where to hook into
> >>>>> this now. :) If you know the mechanism behind that, you can try to add
> >>>>> that for the new bus. (I will dive into this after the clean up.)
> >>>>> What do you think?
> >>>>
> >>>> Thanks for the quick answer, that helped a lot!
> >>>>
> >>>> Looking into the details here I realize it is slightly more complicated:
> >>>> secondary buses are enumerated after device instantiation, as part of
> >>>> the host PCI enumeration, so if I add a similar setup call in the bridge
> >>>> setup, it will be called for a new device long before it has received
> >>>> it's bus number from the OS (via config[PCI_SECONDARY_BUS] )
> >>>>
> >>>> I agree that the lookup function for contexts needs to be as efficient
> >>>> as possible so the simple <busno,defvn> lookup key may be the best
> >>>> solution but then the address_spaces table cannot be populated with the
> >>>> secondary bus entries before it receives a nonzero != 255 bus number,
> >>>> eg. along the lines of this: 
> >>>>
> >>>> diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
> >>>> index 4becdc1..d9a8c23 100644
> >>>> --- a/hw/pci/pci_bridge.c
> >>>> +++ b/hw/pci/pci_bridge.c
> >>>> @@ -265,6 +265,12 @@ void pci_bridge_write_config(PCIDevice *d,
> >>>>          pci_bridge_update_mappings(s);
> >>>>      }
> >>>>  
> >>>> +    if (ranges_overlap(address, len, PCI_SECONDARY_BUS, 1)) {
> >>>> +        int bus_num = pci_bus_num(&s->sec_bus);
> >>>> +        if (bus_num != 0xff && bus_num != 0x00)
> >>>> +            <handle bus number change>
> >>>> +    }
> >>>> +
> >>>>      newctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL);
> >>>>      if (~oldctl & newctl & PCI_BRIDGE_CTL_BUS_RESET) {
> >>>>          /* Trigger hot reset on 0->1 transition. */
> >>>>
> >>>> but it is getting complicated...
> >>>> Thoughts?
> >>>
> >>> Point to the PCI bus from VTDAddressSpace instead of storing the bus_num
> >>> there?
> >>
> >> Also, each PCIe bus should hold an array of VTDAddressSpaces, instead of
> >> the IntelIOMMUState.
> > 
> > Thanks - that got me going - after some playing around with the data
> > structures I ended up with these patches (based on top of Jan's
> > vtd-intremap branch):
> 
> Are you depending on interrupt remapping? If not, my patches are a bit
> hacky and may cause their own issues if you are unlucky.
 
It does not depend directly but interprets a NULL PciDevice pointer as
the special bus number (0xff) for non-pci devices, so I have tried to
take heights for that - I can rebase if so desired, but I would like to
see the interrupt remapping emerge as well ;-)

Knut

> > https://github.com/knuto/qemu/tree/vtd_patches
> > 
> > In essence I ended up replacing the address_spaces[] array in
> > IntelIOMMUState with a pointer dma_as in PCIDevice and a QLIST linking
> > the VTDAddressSpace objects to serve vtd_context_device_invalidate, then
> > use pci_bus_num() whenever a bus number is requires, except for the 
> > "special" bus needed by the interrupt remapping code.
> > 
> > To achieve this, I had to change the signature of the pci_setup_iommu
> > function (the first commit).
> > 
> > With this I am now seeing translations for the device in the virtual
> > root port, but I think this should work equally well with other PCI
> > bridges.
> 
> Good to know that we are on the right path! Maybe this can be integrated
> in the next posting round so that we do not need any bridge exceptions
> due to incompatibility.
> 
> Jan
> 
> 





reply via email to

[Prev in Thread] Current Thread [Next in Thread]