From: Jag Raman
Subject: Re: [PATCH v5 03/18] pci: isolated address space for PCI bus
Date: Fri, 11 Feb 2022 00:54:04 +0000

> On Feb 10, 2022, at 7:26 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, Feb 10, 2022 at 04:49:33PM -0700, Alex Williamson wrote:
>> On Thu, 10 Feb 2022 18:28:56 -0500
>> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>>
>>> On Thu, Feb 10, 2022 at 04:17:34PM -0700, Alex Williamson wrote:
>>>> On Thu, 10 Feb 2022 22:23:01 +0000
>>>> Jag Raman <jag.raman@oracle.com> wrote:
>>>>
>>>>>> On Feb 10, 2022, at 3:02 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>>
>>>>>> On Thu, Feb 10, 2022 at 12:08:27AM +0000, Jag Raman wrote:
>>>>>>>
>>>>>>> Thanks for the explanation, Alex. Thanks to everyone else in the thread
>>>>>>> who helped to clarify this problem.
>>>>>>>
>>>>>>> We have implemented the memory isolation based on the discussion in the
>>>>>>> thread. We will send the patches out shortly.
>>>>>>>
>>>>>>> Devices such as “name” and “e1000” worked fine. But I’d like to note that
>>>>>>> the LSI device (TYPE_LSI53C895A) had some problems - it doesn’t seem
>>>>>>> to be IOMMU aware. In LSI’s case, the kernel driver is asking the device to
>>>>>>> read instructions from the CPU VA (lsi_execute_script() -> read_dword()),
>>>>>>> which is forbidden when IOMMU is enabled. Specifically, the driver is asking
>>>>>>> the device to access other BAR regions by using the BAR address programmed
>>>>>>> in the PCI config space. This happens even without vfio-user patches.
>>>>>>> For example, we could enable IOMMU using “-device intel-iommu” QEMU option
>>>>>>> and also adding the following to the kernel command-line: “intel_iommu=on
>>>>>>> iommu=nopt”. In this case, we could see an IOMMU fault.
>>>>>>
>>>>>> So, device accessing its own BAR is different. Basically, these
>>>>>> transactions never go on the bus at all, never mind get to the IOMMU.
>>>>>
>>>>> Hi Michael,
>>>>>
>>>>> In LSI case, I did notice that it went to the IOMMU. The device is
>>>>> reading the BAR address as if it was a DMA address.
>>>>>
>>>>>> I think it's just used as a handle to address internal device memory.
>>>>>> This kind of trick is not universal, but not terribly unusual.
>>>>>>
>>>>>>
>>>>>>> Unfortunately, we started off our project with the LSI device. So that
>>>>>>> led to all the confusion about what is expected at the server end in terms
>>>>>>> of vectoring/address-translation. It gave an impression as if the request
>>>>>>> was still on the CPU side of the PCI root complex, but the actual problem
>>>>>>> was with the device driver itself.
>>>>>>>
>>>>>>> I’m wondering how to deal with this problem. Would it be OK if we mapped
>>>>>>> the device’s BAR into the IOVA, at the same CPU VA programmed in the BAR
>>>>>>> registers? This would help devices such as LSI to circumvent this problem.
>>>>>>> One problem with this approach is that it has the potential to collide
>>>>>>> with another legitimate IOVA address. Kindly share your thoughts on this.
>>>>>>>
>>>>>>> Thank you!
>>>>>>
>>>>>> I am not 100% sure what do you plan to do but it sounds fine since even
>>>>>> if it collides, with traditional PCI device must never initiate cycles
>>>>>>
>>>>>
>>>>> OK sounds good, I’ll create a mapping of the device BARs in the IOVA.
>>>>
>>>> I don't think this is correct. Look for instance at ACPI _TRA support
>>>> where a system can specify a translation offset such that, for example,
>>>> a CPU access to a device is required to add the provided offset to the
>>>> bus address of the device. A system using this could have multiple
>>>> root bridges, where each is given the same, overlapping MMIO aperture.
>>>> From the processor perspective, each MMIO range is unique and possibly
>>>> none of those devices have a zero _TRA, there could be system memory at
>>>> the equivalent flat memory address.
>>>
>>> I am guessing there are reasons to have these in acpi besides firmware
>>> vendors wanting to find corner cases in device implementations though
>>> :). E.g. it's possible something else is tweaking DMA in similar ways. I
>>> can't say for sure and I wonder why do we care as long as QEMU does not
>>> have _TRA.
>>
>> How many complaints do we get about running out of I/O port space on
>> q35 because we allow an arbitrary number of root ports? What if we
>> used _TRA to provide the full I/O port range per root port? 32-bit
>> MMIO could be duplicated as well.
>
> It's an interesting idea. To clarify what I said, I suspect some devices
> are broken in presence of translating bridges unless DMA
> is also translated to match.
>
> I agree it's a mess though, in that some devices when given their own
> BAR to DMA to will probably just satisfy the access from internal
> memory, while others will ignore that and send it up as DMA
> and both types are probably out there in the field.
>
>
>>>> So if the transaction actually hits this bus, which I think is what
>>>> making use of the device AddressSpace implies, I don't think it can
>>>> assume that it's simply reflected back at itself. Conventional PCI and
>>>> PCI Express may be software compatible, but there's a reason we don't
>>>> see IOMMUs that provide both translation and isolation in conventional
>>>> topologies.
>>>>
>>>> Is this more a bug in the LSI device emulation model? For instance in
>>>> vfio-pci, if I want to access an offset into a BAR from within QEMU, I
>>>> don't care what address is programmed into that BAR, I perform an
>>>> access relative to the vfio file descriptor region representing that
>>>> BAR space. I'd expect that any viable device emulation model does the
>>>> same, an access to device memory uses an offset from an internal
>>>> resource, irrespective of the BAR address.
>>>
>>> However, using BAR seems like a reasonable shortcut allowing
>>> device to use the same 64 bit address to refer to system
>>> and device RAM interchangeably.
>>
>> A shortcut that breaks when an IOMMU is involved.
>
> Maybe. But if that's how hardware behaves, we have little choice but
> emulate it.

I was wondering if we could map the BARs into the IOVA for a limited set of
devices: the ones that were designed before IOMMUs, such as lsi53c895a.
Would this let us follow the spec as closely as possible without breaking
existing devices?
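
To make the idea a bit more concrete, here is a rough toy model in Python. It
is only a sketch and not QEMU code: IovaSpace, LEGACY_SELF_DMA_DEVICES and
map_bars_for_legacy_device() are hypothetical names made up for illustration,
and the addresses are arbitrary. It shows BARs being mapped into a device's
IOVA space at their programmed addresses only for an allowlisted legacy
device, with a check for the collision case mentioned earlier in the thread.

```python
from dataclasses import dataclass, field

# Hypothetical allowlist: device models designed before IOMMUs that may
# issue DMA to their own BAR addresses.
LEGACY_SELF_DMA_DEVICES = {"lsi53c895a"}


@dataclass
class IovaSpace:
    """Toy per-device IOVA space holding (start, size, label) mappings."""
    mappings: list = field(default_factory=list)

    def overlaps(self, start, size):
        # True if [start, start + size) intersects any existing mapping.
        end = start + size
        return any(start < s + n and s < end for s, n, _ in self.mappings)

    def map(self, start, size, label):
        # Refuse to shadow an existing (legitimate) IOVA mapping.
        if self.overlaps(start, size):
            raise ValueError(f"IOVA collision at {start:#x} for {label}")
        self.mappings.append((start, size, label))

    def translate(self, iova):
        # Return (label, offset) for a mapped IOVA, or None to model a fault.
        for s, n, label in self.mappings:
            if s <= iova < s + n:
                return label, iova - s
        return None


def map_bars_for_legacy_device(space, device_type, bars):
    """Map each (address, size) BAR at its programmed address, but only
    for allowlisted device types. Returns the number of BARs mapped."""
    if device_type not in LEGACY_SELF_DMA_DEVICES:
        return 0
    for index, (addr, size) in enumerate(bars):
        space.map(addr, size, f"bar{index}")
    return len(bars)
```

With this model, an allowlisted device that issues a DMA read to its own BAR
address gets a (label, offset) pair back instead of a fault, while a device
that is not on the list keeps the strict behavior. If a BAR overlaps an
existing mapping, map() raises rather than silently shadowing it; what the
real policy should be in that case is still an open question.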
--
Jag
>
>>>> It would seem strange if the driver is actually programming the device
>>>> to DMA to itself and if that's actually happening, I'd wonder if this
>>>> driver is actually compatible with an IOMMU on bare metal.
>>>
>>> You know, it's hardware after all. As long as things work vendors
>>> happily ship the device. They don't really worry about theoretical issues
>>> ;).
>>
>> We're talking about a 32-bit conventional PCI device from the previous
>> century. IOMMUs are no longer theoretical, but not all drivers have
>> kept up. It's maybe not the best choice as the de facto standard
>> device, imo. Thanks,
>>
>> Alex
>
> More importantly lots of devices most likely don't support arbitrary
> configurations and break if you try to create something that matches the
> spec but never or even rarely occurs on bare metal.
>
> --
> MST
>