From: Shameerali Kolothum Thodi
Subject: RE: [RFC PATCH 4/5] hw/arm/virt-acpi-build: Build IORT with multiple SMMU nodes
Date: Wed, 11 Dec 2024 15:21:37 +0000
Hi Nicolin,
> -----Original Message-----
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Tuesday, December 10, 2024 8:48 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: eric.auger@redhat.com; qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> peter.maydell@linaro.org; jgg@nvidia.com; ddutile@redhat.com; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH 4/5] hw/arm/virt-acpi-build: Build IORT with
> multiple SMMU nodes
>
> > And now I think I know (hopefully) why it is not working in the
> > smmuv3-nested case. I think the root cause is this commit from the
> > series "cover-letter: Add HW accelerated nesting support for arm SMMUv3".
> >
> > It changes how the address space is returned for devices:
> >
> > static AddressSpace *smmu_find_add_as(PCIBus *bus, void *opaque, int devfn)
> > {
> >     SMMUState *s = opaque;
> >     SMMUPciBus *sbus = smmu_get_sbus(s, bus);
> >     SMMUDevice *sdev = smmu_get_sdev(s, sbus, bus, devfn);
> >
> >     /* Return the system as if the device uses stage-2 only */
> >     if (s->nested && !sdev->s1_hwpt) {
> >         return &sdev->as_sysmem;
> >     } else {
> >         return &sdev->as;
> >     }
> > }
> >
> > If we have entries in the SMMUv3 idmap for bus:devfn, then I think we
> > should return the IOMMU address space here. But the logic above returns
> > the sysmem address space for anything other than vfio/iommufd devices.
> >
> > Hot-add works when I hack the logic to return the IOMMU address space
> > for PCIe root port devices.
> That is to bypass the "if (memory_region_is_iommu(section->mr))"
> in vfio_listener_region_add(), when the device gets initially
> attached to the default container.
Right.
> Once a device reaches the pci_device_set_iommu_device() call,
> it should be attached to an IDENTITY/bypass proxy s1_hwpt, so
> smmu_find_add_as() will return the IOMMU AS.
Agreed. The situation you describe above is perfectly fine for a vfio-pci device.
> So, the fact that your hack works means the hotplug routine is
> likely missing a pci_device_set_iommu_device() call, IMHO, or
> perhaps it should call pci_device_iommu_address_space() only after
> the device finishes pci_device_set_iommu_device().
The problem is not with the hot-added vfio-pci device but with the
pcie-root-port device. When we hot add a vfio-pci device to a root port,
QEMU injects an interrupt into the guest for the root port device, and
that kick-starts the vfio-pci device add process. This involves writing
to the MSI address that the guest kernel configured for the root port.
With the current logic, the root port gets the sysmem address space,
yet the IORT lists the root port's device ID in the SMMU idmap. This
does not work: since the root port sits behind the SMMU in the IORT, the
guest kernel programs an IOVA (which needs SMMU translation) as the MSI
address, but through the sysmem address space QEMU writes to that IOVA
untranslated.
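A toy model of why the root port's MSI gets lost may help. It is not QEMU code: ToyMapping and toy_msi_target() are hypothetical, the addresses are purely illustrative, and a single stage-1 mapping stands in for the whole SMMU translation. The guest maps an MSI IOVA page onto the ITS doorbell; a write through the IOMMU address space is translated, while a write through the sysmem address space treats the IOVA as if it were a guest physical address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single-entry stage-1 mapping set up by the guest SMMU
 * driver: an MSI IOVA page mapped onto the ITS doorbell page. */
typedef struct {
    uint64_t iova;  /* IOVA the guest wrote into the root port's MSI address */
    uint64_t gpa;   /* guest physical address it maps to (doorbell page) */
    uint64_t size;  /* size of the mapped window in bytes */
} ToyMapping;

/* Returns the address an MSI write actually lands on.
 * use_iommu_as == false models the sysmem address space: no translation. */
static uint64_t toy_msi_target(uint64_t msi_addr, bool use_iommu_as,
                               const ToyMapping *map)
{
    assert(map != NULL);
    if (use_iommu_as && msi_addr >= map->iova &&
        msi_addr - map->iova < map->size) {
        return map->gpa + (msi_addr - map->iova);
    }
    return msi_addr; /* untranslated: the IOVA is used as a GPA */
}
```

With a mapping of IOVA 0x8000000 onto 0x8090000 (size 0x1000), an MSI write to 0x8000040 lands on 0x8090040 through the IOMMU address space, but on 0x8000040, which is not the doorbell, through the sysmem address space.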
I think we discussed this issue of returning different address spaces
before [0], though in a different context. The hack mentioned in [0],
which adds an extra check for whether the device is vfio-pci, actually
works for this case as well. But I am not sure that is the best way to
handle this.
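The two address-space policies can be compared in a small standalone sketch. It assumes the hack in [0] amounts to "fall back to sysmem only for vfio-pci devices that have no s1_hwpt yet"; ToyDev, its flags, and both functions are hypothetical stand-ins for SMMUState/SMMUDevice, not the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { TOY_AS_SYSMEM, TOY_AS_IOMMU } ToyAS;

/* Hypothetical per-device state standing in for SMMUDevice. */
typedef struct {
    bool is_vfio_pci;  /* device is a vfio-pci passthrough device */
    bool has_s1_hwpt;  /* mirrors sdev->s1_hwpt != NULL */
} ToyDev;

/* Mirrors the quoted smmu_find_add_as(): in nested mode, any device
 * without an s1_hwpt (including an emulated root port) gets sysmem. */
static ToyAS toy_as_current(bool nested, const ToyDev *d)
{
    assert(d != NULL);
    if (nested && !d->has_s1_hwpt) {
        return TOY_AS_SYSMEM;
    }
    return TOY_AS_IOMMU;
}

/* Variant with the extra vfio-pci check: only vfio-pci devices not yet
 * attached to an s1_hwpt fall back to sysmem, so emulated devices such
 * as root ports get the IOMMU address space. */
static ToyAS toy_as_with_vfio_check(bool nested, const ToyDev *d)
{
    assert(d != NULL);
    if (nested && d->is_vfio_pci && !d->has_s1_hwpt) {
        return TOY_AS_SYSMEM;
    }
    return TOY_AS_IOMMU;
}
```

In this model a root port (is_vfio_pci and has_s1_hwpt both false) gets TOY_AS_SYSMEM from the current policy but TOY_AS_IOMMU from the variant, while a vfio-pci device still starts on sysmem until its s1_hwpt is attached.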
Another option is to exclude all root port devices from the IORT idmap.
But that does not look ideal to me, as the root port actually sits
behind an SMMUv3 in this case.
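For comparison, excluding root ports from the idmap would mean splitting an SMMU's RID range around each root port RID. A minimal sketch, assuming the excluded RIDs are sorted and using illustrative RID values; ToyIdRange and toy_build_idmap() are hypothetical, and count here is the plain number of IDs (the IORT ID mapping itself encodes the number of IDs minus one).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ID mapping entry; count is the plain number of IDs. */
typedef struct {
    uint16_t base;
    uint32_t count;
} ToyIdRange;

/* Cover RIDs [first, last] while skipping every RID listed in excl
 * (sorted ascending, e.g. root port RIDs). Writes at most max_out
 * ranges to out and returns how many were written. */
static size_t toy_build_idmap(uint16_t first, uint16_t last,
                              const uint16_t *excl, size_t n_excl,
                              ToyIdRange *out, size_t max_out)
{
    assert(first <= last);
    assert(out != NULL || max_out == 0);
    size_t n = 0;
    uint32_t start = first; /* lowest RID not yet covered or excluded */

    for (size_t i = 0; i < n_excl; i++) {
        uint32_t rid = excl[i];
        if (rid < start || rid > last) {
            continue; /* outside the remaining range */
        }
        if (rid > start && n < max_out) {
            out[n].base = (uint16_t)start;
            out[n].count = rid - start;
            n++;
        }
        start = rid + 1; /* skip the excluded RID itself */
    }
    if (start <= last && n < max_out) {
        out[n].base = (uint16_t)start;
        out[n].count = (uint32_t)last - start + 1;
        n++;
    }
    return n;
}
```

For the range 0x2000..0x20ff with root ports at 0x2000 and 0x2008, this yields two mappings: base 0x2001 with 7 IDs and base 0x2009 with 247 IDs, so the root ports simply fall outside every SMMU mapping.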
Please let me know if you have any ideas.
Thanks,
Shameer
[0] https://lore.kernel.org/linux-iommu/02f3fbc5145d4449b3313eb802ecfa2c@huawei.com/