[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [RFC/RFT PATCH 5/5] vfio/pci: Allow relocating MSI-X MM
From: |
Alexey Kardashevskiy |
Subject: |
Re: [Qemu-devel] [RFC/RFT PATCH 5/5] vfio/pci: Allow relocating MSI-X MMIO |
Date: |
Tue, 19 Dec 2017 19:28:40 +1100 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0 |
On 19/12/17 17:56, Alex Williamson wrote:
> On Tue, 19 Dec 2017 17:02:59 +1100
> Alexey Kardashevskiy <address@hidden> wrote:
>
>> On 19/12/17 14:40, Alex Williamson wrote:
>>> On Tue, 19 Dec 2017 14:07:13 +1100
>>> Alexey Kardashevskiy <address@hidden> wrote:
>>>
>>>> On 18/12/17 16:02, Alex Williamson wrote:
>>>>> With recently proposed kernel side vfio-pci changes, the MSI-X vector
>>>>> table area can be mmap'd from userspace, allowing direct access to
>>>>> non-MSI-X registers within the host page size of this area. However,
>>>>> we only get that direct access if QEMU isn't also emulating MSI-X
>>>>> within that same page. For x86/64 host, the system page size is 4K
>>>>> and the PCI spec recommends a minimum of 4K to 8K alignment to
>>>>> separate MSI-X from non-MSI-X registers, therefore only devices which
>>>>> don't honor this recommendation would see any improvement from this
>>>>> option. The real targets for this feature are hosts where the page
>>>>> size exceeds the PCI spec recommended alignment, such as ARM64 systems
>>>>> with 64K pages.
>>>>>
>>>>> This new x-msix-relocation option accepts the following options:
>>>>>
>>>>> off: Disable MSI-X relocation, use native device config (default)
>>>>> auto: Automaically relocate MSI-X MMIO to another BAR or offset
>>>>> based on minimum additional MMIO requirement
>>>>> bar0..bar5: Specify the target BAR, which will either be extended
>>>>> if the BAR exists or added if the BAR slot is available.
>>>>>
>>>>> Signed-off-by: Alex Williamson <address@hidden>
>>>>> ---
>>>>> hw/vfio/pci.c | 102
>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>> hw/vfio/pci.h | 1
>>>>> hw/vfio/trace-events | 2 +
>>>>> 3 files changed, 104 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>>>> index c383b842da20..b4426abf297a 100644
>>>>> --- a/hw/vfio/pci.c
>>>>> +++ b/hw/vfio/pci.c
>>>>> @@ -1352,6 +1352,101 @@ static void
>>>>> vfio_pci_fixup_msix_region(VFIOPCIDevice *vdev)
>>>>> }
>>>>> }
>>>>>
>>>>> +static void vfio_pci_relocate_msix(VFIOPCIDevice *vdev)
>>>>> +{
>>>>> + int target_bar = -1;
>>>>> + size_t msix_sz;
>>>>> +
>>>>> + if (!vdev->msix || vdev->msix_relo == OFF_AUTOPCIBAR_OFF) {
>>>>> + return;
>>>>> + }
>>>>> +
>>>>> + /* The actual minimum size of MSI-X structures */
>>>>> + msix_sz = (vdev->msix->entries * PCI_MSIX_ENTRY_SIZE) +
>>>>> + (QEMU_ALIGN_UP(vdev->msix->entries, 64) / 8);
>>>>> + /* Round up to host pages, we don't want to share a page */
>>>>> + msix_sz = REAL_HOST_PAGE_ALIGN(msix_sz);
>>>>> + /* PCI BARs must be a power of 2 */
>>>>> + msix_sz = pow2ceil(msix_sz);
>>>>> +
>>>>> + /* Auto: pick the BAR that incurs the least additional MMIO space */
>>>>> + if (vdev->msix_relo == OFF_AUTOPCIBAR_AUTO) {
>>>>> + int i;
>>>>> + size_t best = UINT64_MAX;
>>>>> +
>>>>> + for (i = 0; i < PCI_ROM_SLOT; i++) {
>>>>
>>>>
>>>> I belieive that going from the other end is safer approach for "auto",
>>>> especially after discovering how mpt3sas works. Or you could add
>>>> "autoreverse" switch...
>>>
>>> Or is extending the smallest BAR really a safer option? I wonder how
>>> many drivers go through and fill fixed sized arrays with BAR info,
>>> expecting only the device implemented number of BARs. Maybe they
>>> wouldn't notice if the BAR was simply bigger than expected. On the
>>> other hand there are probably drivers dumb enough to index registers
>>> from the end for the BAR as well. I don't think there exists an
>>> auto algorithm that will fit every device, but a higher hit rate than
>>> we have so far would be nice.
>>
>> Everything is possible :(
>>
>> I do not know if there are many users for this relocation though. So far
>> only one device has the problem (in 5 years or so) and it is fixed by
>> moving msix to bar5, I'd suggest start with this for now.
>
> Interesting, I would have thought it to be more common.
Just to clarify - one device with performance issue because of msix
emulation, non-64k-aligned msix data is not that unusual.
>
>> In general, I think we still need a way to simply disable that msix_table
>> region anyway if we find a device driver which uses all BARs, does not
>> tolerate changes to the default set of BARs, etc.
>
> Only SPAPR can do that. In fact, I'm somewhat surprised by your
> interest in this series as I positioned it as a way for other
> platforms, which require interaction with MSI-X MMIO space for
> programming interrupts.
Well, it moves the guest-visible msix section away from the BAR causing
performance issues so I figured it might work for SPAPR eventually :)
>>> We could also implement MemoryRegionOps
>>> for the base BAR with some error reporting if it gets called. That
>>> might make the problem more obvious than unassigned_mem_ops silently
>>> eating those accesses.
>>
>> Makes sense.
>>
>>
>>>
>>>>> + size_t size;
>>>>> +
>>>>> + if (vdev->bars[i].ioport) {
>>>>> + continue;
>>>>> + }
>>>>> +
>>>>> + /* MSI-X MMIO must reside within first 32bit offset of BAR */
>>>>> + if (vdev->bars[i].size > (UINT32_MAX / 2))
>>>
>>> Adding a '|| !vdev->bars[i].size' here would make auto only extend BARs.
>>>
>>> NB, the existing test here needs a bit of work too, 32bit BARs max out
>>> at 2G not 4G, so maybe we need separate tests here. >1G for 32bit
>>> BARs, >2G for 64bit BARs. Hmm, do we have the option of promoting
>>> 32bit BARs to 64bit? It's all virtual addresses anyway, right. We're
>>> in real trouble if were extending BARs where this is an issue though.
>>
>> until you get a driver like this :)
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/rapidio/devices/tsi721.c?h=v4.15-rc4#n2782
>
> Right, a diametric opposite of the SAS driver, verifying all the
> attributes it can of specific BARs rather than assuming the first BAR
> it finds must be the one to use. Is it even worthwhile to try to have
> any automatic selection? I suppose this driver is another point
> towards a reverse search rather than extended BAR. Thanks,
Well, guessing like this may fail occasionally and simply allowing MSIX
mapping won't fail on SPAPR, I do not really know if it is going to be very
useful anywhere else than just SPAPR.
And I guess if we go the automatic selection path, than extending a BAR
does not have much benefit over using the last BAR because it seems quite
unlikely that a device 1) does not have any BARs unused and 2) none of BARs
is MSIX-only but if this is a case, I am not sure what guess would be safer.
I looked nearby, for example:
001e:80:00.2 Ethernet controller: Broadcom Corporation NetXtreme BCM5719
Gigabit Ethernet PCIe (rev 01)
Region 0: Memory at 3fc2c0250000 (64-bit, prefetchable) [size=64K]
Region 2: Memory at 3fc2c0240000 (64-bit, prefetchable) [size=64K]
Region 4: Memory at 3fc2c0230000 (64-bit, prefetchable) [size=64K]
Capabilities: [a0] MSI-X: Enable- Count=17 Masked-
Vector table: BAR=4 offset=00000000
PBA: BAR=4 offset=00000120
It is fully packed and it *seems* that BAR4 is MSIX only but who knows why
it is 64K - can be anything...
This one looks more convincing but still no guarantee:
0001:09:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0
xHCI Host Controller (rev 02)
Region 0: Memory at 3fe080800000 (64-bit, non-prefetchable) [size=64K]
Region 2: Memory at 3fe080810000 (64-bit, non-prefetchable) [size=8K]
Capabilities: [c0] MSI-X: Enable+ Count=8 Masked-
Vector table: BAR=2 offset=00000000
PBA: BAR=2 offset=00001000
A funny thing - my thinkpad x1 does not have a single msix-capable device,
many are MSI and "Express (v2) Endpoint, MSI 00". Hmmm. Xeon and POWER8
boxes do have MSIX.
--
Alexey
- [Qemu-devel] [RFC/RFT PATCH 5/5] vfio/pci: Allow relocating MSI-X MMIO, (continued)
Re: [Qemu-devel] [RFC/RFT PATCH 5/5] vfio/pci: Allow relocating MSI-X MMIO, Alexey Kardashevskiy, 2017/12/18
Re: [Qemu-devel] [RFC/RFT PATCH 0/5] vfio/pci: MSI-X MMIO relocation, no-reply, 2017/12/18