qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v2 8/8] spapr_pci: Use XICS interrupt allocator


From: Alexander Graf
Subject: Re: [Qemu-devel] [PATCH v2 8/8] spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB
Date: Thu, 22 May 2014 09:16:17 +0200


> Am 22.05.2014 um 08:53 schrieb Alexey Kardashevskiy <address@hidden>:
> 
>> On 05/21/2014 10:42 PM, Alexey Kardashevskiy wrote:
>>> On 05/21/2014 08:35 PM, Alexander Graf wrote:
>>> 
>>>> On 21.05.14 12:13, Alexey Kardashevskiy wrote:
>>>>> On 05/21/2014 07:50 PM, Alexander Graf wrote:
>>>>>> On 21.05.14 11:33, Alexey Kardashevskiy wrote:
>>>>>>> On 05/21/2014 07:13 PM, Alexander Graf wrote:
>>>>>>>> On 21.05.14 11:11, Michael S. Tsirkin wrote:
>>>>>>>>> On Wed, May 21, 2014 at 11:06:09AM +0200, Alexander Graf wrote:
>>>>>>>>>> On 21.05.14 10:52, Alexey Kardashevskiy wrote:
>>>>>>>>>>> On 05/21/2014 06:40 PM, Alexander Graf wrote:
>>>>>>>>>>>> On 15.05.14 11:59, Alexey Kardashevskiy wrote:
>>>>>>>>>>>> Currently SPAPR PHB keeps track of all allocated MSI/MISX
>>>>>>>>>>>> interrupt as
>>>>>>>>>>>> XICS used to be unable to reuse interrupts which becomes a
>>>>>>>>>>>> problem for
>>>>>>>>>>>> dynamic MSI reconfiguration which is happening on guest driver
>>>>>>>>>>>> reload or
>>>>>>>>>>>> PCI hot (un)plug. Another problem is that PHB has a limit of 
>>>>>>>>>>>> devices
>>>>>>>>>>>> supporting MSI/MSIX (SPAPR_MSIX_MAX_DEVS=32) and there is no good
>>>>>>>>>>>> reason
>>>>>>>>>>>> for that.
>>>>>>>>>>>> 
>>>>>>>>>>>> This makes use of new XICS ability to reuse interrupts.
>>>>>>>>>>>> 
>>>>>>>>>>>> This removes cached MSI configuration from SPAPR PHB so the first
>>>>>>>>>>>> IRQ
>>>>>>>>>>>> number
>>>>>>>>>>>> of a device is stored in MSI/MSIX config space so there is no
>>>>>>>>>>>> need to
>>>>>>>>>>>> store
>>>>>>>>>>>> this anywhere else. From now on, SPAPR PHB only keeps flags telling
>>>>>>>>>>>> what
>>>>>>>>>>>> type
>>>>>>>>>>>> of interrupt for which device it has configured in order to return
>>>>>>>>>>>> error if
>>>>>>>>>>>> (for example) MSIX was enabled and the guest is trying to disable
>>>>>>>>>>>> MSI
>>>>>>>>>>>> which
>>>>>>>>>>>> it has not enabled.
>>>>>>>>>>>> 
>>>>>>>>>>>> This removes a limit for the maximum number of MSIX-enabled devices
>>>>>>>>>>>> per PHB,
>>>>>>>>>>>> now XICS and PCI bus capacity are the only limitation.
>>>>>>>>>>>> 
>>>>>>>>>>>> This changes migration stream as it fixes
>>>>>>>>>>>> vmstate_spapr_pci_msi::name
>>>>>>>>>>>> which was
>>>>>>>>>>>> wrong since the beginning.
>>>>>>>>>>>> 
>>>>>>>>>>>> This fixed traces to be more informative.
>>>>>>>>>>>> 
>>>>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <address@hidden>
>>>>>>>>>>>> ---
>>>>>>>>>>>> 
>>>>>>>>>>>> In reality either MSIX or MSI is enabled, never both. So I could
>>>>>>>>>>>> remove
>>>>>>>>>>>> msi/msix
>>>>>>>>>>>> bitmaps from this patch, would it make sense?
>>>>>>>>>>> Is this a hard requirement? Does a device have to choose between
>>>>>>>>>>> MSIX and
>>>>>>>>>>> MSI or could it theoretically have both enabled? Is this a PCI
>>>>>>>>>>> limitation,
>>>>>>>>>>> a PAPR/XICS limitation or just a limitation of your implementation?
>>>>>>>>>> My implementation does not have this limitation, I asked if I can
>>>>>>>>>> simplify
>>>>>>>>>> code by introducing one :)
>>>>>>>>>> 
>>>>>>>>>> I cannot see any reason why PCI cannot have both MSI and MSIX enabled
>>>>>>>>>> but
>>>>>>>>>> it does not seem to be used by anyone => cannot debug and confirm.
>>>>>>>>>> 
>>>>>>>>>> PAPR spec assumes that if the guest tries enabling MSIX when MSI is
>>>>>>>>>> already
>>>>>>>>>> enabled, this is a "change", not enabling both types. But it also
>>>>>>>>>> says MSI
>>>>>>>>>> and MSIX vector numbers are not shared. Hm.
>>>>>>>>> Yeah, I'm not aware of any limitation on hardware here and I'd
>>>>>>>>> rather not impose one.
>>>>>>>>> 
>>>>>>>>> Michael, do you know of any hardware that uses MSI *and* MSI-X at
>>>>>>>>> the same time?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Alex
>>>>>>>> No, and the PCI spec says:
>>>>>>>>      A function is permitted to implement both MSI and MSI-X, but
>>>>>>>> system
>>>>>>>>      software is
>>>>>>>>      prohibited from enabling both at the same time. If system software
>>>>>>>>      enables both at the same time, the result is undefined.
>>>>>>> Ah, cool. So yes Alexey, feel free to impose it :).
>>>>>> Heh. This solves just half of the problem - I still have to keep track of
>>>>>> what device got MSI/MSIX configured via that ibm,change-msi interface. I
>>>>>> was hoping I can store such flag somewhere in a device PCI config space
>>>>>> but
>>>>>> MSI/MSIX enable bit is not good as it is not set when those calls are
>>>>>> made.
>>>>>> And I cannot rely on address/data fields much as the guest can change 
>>>>>> them
>>>>>> (I already use them to store IRQ numbers and btw it is missing checks 
>>>>>> when
>>>>>> I read them back for disposal, I'll fix in next round).
>>>>>> 
>>>>>> Or on "enable" event I could put IRQ numbers to .data of MSI config space
>>>>>> and on "disable" check if it is not zero, then configuration took place,
>>>>>> then I can remove my msi[]/msix[] flag arrays. If the guest did any 
>>>>>> change
>>>>>> to MSI/MSIX config space (it does not on SPAPR except weird selftest
>>>>>> cases), I compare .data with what ICS can possibly have and either reject
>>>>>> "disable" or handle it and if it breaks XICS - that's too bad for the
>>>>>> stupid guest. Would that be acceptable?
>>>>> Can't you prohibit the guest from writing to the MSI configuration
>>>>> registers itself? Then you don't need to do sanity checks.
>>>> 
>>>> I could for emulated devices but VFIO uses the same code. For example,
>>>> there is an IBM SCSI IPR card which does a "self test". For that, it saves
>>>> MSIX BAR content, does reboot via some backdoor interface and restores MSIX
>>>> BAR. It has been solved for VFIO in the host kernel by restoring MSIX data
>>>> from cached values when guest is trying to restore it with what it thinks
>>>> is actual MSIX data (it is virtualized because of x86). But there is cache
>>> 
>>> We already have a cache because we don't access the real PCI registers with
>>> msi_set_message(), no?
>> 
>> 
>> For emulated devices there is no cache. And in any case the guest is
>> allowed to write to it... Who knows what AIX does? I do not.
> 
> 
> Tried GHashTable for keeping bus:dev.fn <-> (irq, num), more or less ok but
> how to migrate such thing? Temporary cache somewhere and then unpack it? Or
> use old style migration callbacks?

Could you try to introduce a new vmstate type that just serializes and 
deserializes hash tables? Maybe there is already a serialization function for 
it in glib?


Alex




reply via email to

[Prev in Thread] Current Thread [Next in Thread]