qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v2 8/8] spapr_pci: Use XICS interrupt allocator


From: Alexey Kardashevskiy
Subject: Re: [Qemu-devel] [PATCH v2 8/8] spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB
Date: Fri, 23 May 2014 00:25:50 +1000
User-agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0

On 05/22/2014 08:57 PM, Alexander Graf wrote:
> 
> On 22.05.14 12:53, Alexey Kardashevskiy wrote:
>> On 05/22/2014 05:16 PM, Alexander Graf wrote:>
>>>> Am 22.05.2014 um 08:53 schrieb Alexey Kardashevskiy <address@hidden>:
>>>>
>>>>> On 05/21/2014 10:42 PM, Alexey Kardashevskiy wrote:
>>>>>> On 05/21/2014 08:35 PM, Alexander Graf wrote:
>>>>>>
>>>>>>> On 21.05.14 12:13, Alexey Kardashevskiy wrote:
>>>>>>>> On 05/21/2014 07:50 PM, Alexander Graf wrote:
>>>>>>>>> On 21.05.14 11:33, Alexey Kardashevskiy wrote:
>>>>>>>>>> On 05/21/2014 07:13 PM, Alexander Graf wrote:
>>>>>>>>>>> On 21.05.14 11:11, Michael S. Tsirkin wrote:
>>>>>>>>>>>> On Wed, May 21, 2014 at 11:06:09AM +0200, Alexander Graf wrote:
>>>>>>>>>>>>> On 21.05.14 10:52, Alexey Kardashevskiy wrote:
>>>>>>>>>>>>>> On 05/21/2014 06:40 PM, Alexander Graf wrote:
>>>>>>>>>>>>>>> On 15.05.14 11:59, Alexey Kardashevskiy wrote:
>>>>>>>>>>>>>>> Currently SPAPR PHB keeps track of all allocated MSI/MISX
>>>>>>>>>>>>>>> interrupt as
>>>>>>>>>>>>>>> XICS used to be unable to reuse interrupts which becomes a
>>>>>>>>>>>>>>> problem for
>>>>>>>>>>>>>>> dynamic MSI reconfiguration which is happening on guest driver
>>>>>>>>>>>>>>> reload or
>>>>>>>>>>>>>>> PCI hot (un)plug. Another problem is that PHB has a limit of
>>>>>>>>>>>>>>> devices
>>>>>>>>>>>>>>> supporting MSI/MSIX (SPAPR_MSIX_MAX_DEVS=32) and there is no
>>>>>>>>>>>>>>> good
>>>>>>>>>>>>>>> reason
>>>>>>>>>>>>>>> for that.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This makes use of new XICS ability to reuse interrupts.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This removes cached MSI configuration from SPAPR PHB so the
>>>>>>>>>>>>>>> first
>>>>>>>>>>>>>>> IRQ
>>>>>>>>>>>>>>> number
>>>>>>>>>>>>>>> of a device is stored in MSI/MSIX config space so there is no
>>>>>>>>>>>>>>> need to
>>>>>>>>>>>>>>> store
>>>>>>>>>>>>>>> this anywhere else. From now on, SPAPR PHB only keeps flags
>>>>>>>>>>>>>>> telling
>>>>>>>>>>>>>>> what
>>>>>>>>>>>>>>> type
>>>>>>>>>>>>>>> of interrupt for which device it has configured in order to
>>>>>>>>>>>>>>> return
>>>>>>>>>>>>>>> error if
>>>>>>>>>>>>>>> (for example) MSIX was enabled and the guest is trying to
>>>>>>>>>>>>>>> disable
>>>>>>>>>>>>>>> MSI
>>>>>>>>>>>>>>> which
>>>>>>>>>>>>>>> it has not enabled.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This removes a limit for the maximum number of MSIX-enabled
>>>>>>>>>>>>>>> devices
>>>>>>>>>>>>>>> per PHB,
>>>>>>>>>>>>>>> now XICS and PCI bus capacity are the only limitation.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This changes migration stream as it fixes
>>>>>>>>>>>>>>> vmstate_spapr_pci_msi::name
>>>>>>>>>>>>>>> which was
>>>>>>>>>>>>>>> wrong since the beginning.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This fixed traces to be more informative.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <address@hidden>
>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In reality either MSIX or MSI is enabled, never both. So I
>>>>>>>>>>>>>>> could
>>>>>>>>>>>>>>> remove
>>>>>>>>>>>>>>> msi/msix
>>>>>>>>>>>>>>> bitmaps from this patch, would it make sense?
>>>>>>>>>>>>>> Is this a hard requirement? Does a device have to choose between
>>>>>>>>>>>>>> MSIX and
>>>>>>>>>>>>>> MSI or could it theoretically have both enabled? Is this a PCI
>>>>>>>>>>>>>> limitation,
>>>>>>>>>>>>>> a PAPR/XICS limitation or just a limitation of your
>>>>>>>>>>>>>> implementation?
>>>>>>>>>>>>> My implementation does not have this limitation, I asked if I can
>>>>>>>>>>>>> simplify
>>>>>>>>>>>>> code by introducing one :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> I cannot see any reason why PCI cannot have both MSI and MSIX
>>>>>>>>>>>>> enabled
>>>>>>>>>>>>> but
>>>>>>>>>>>>> it does not seem to be used by anyone => cannot debug and
>>>>>>>>>>>>> confirm.
>>>>>>>>>>>>>
>>>>>>>>>>>>> PAPR spec assumes that if the guest tries enabling MSIX when
>>>>>>>>>>>>> MSI is
>>>>>>>>>>>>> already
>>>>>>>>>>>>> enabled, this is a "change", not enabling both types. But it also
>>>>>>>>>>>>> says MSI
>>>>>>>>>>>>> and MSIX vector numbers are not shared. Hm.
>>>>>>>>>>>> Yeah, I'm not aware of any limitation on hardware here and I'd
>>>>>>>>>>>> rather not impose one.
>>>>>>>>>>>>
>>>>>>>>>>>> Michael, do you know of any hardware that uses MSI *and* MSI-X at
>>>>>>>>>>>> the same time?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Alex
>>>>>>>>>>> No, and the PCI spec says:
>>>>>>>>>>>       A function is permitted to implement both MSI and MSI-X, but
>>>>>>>>>>> system
>>>>>>>>>>>       software is
>>>>>>>>>>>       prohibited from enabling both at the same time. If system
>>>>>>>>>>> software
>>>>>>>>>>>       enables both at the same time, the result is undefined.
>>>>>>>>>> Ah, cool. So yes Alexey, feel free to impose it :).
>>>>>>>>> Heh. This solves just half of the problem - I still have to keep
>>>>>>>>> track of
>>>>>>>>> what device got MSI/MSIX configured via that ibm,change-msi
>>>>>>>>> interface. I
>>>>>>>>> was hoping I can store such flag somewhere in a device PCI config
>>>>>>>>> space
>>>>>>>>> but
>>>>>>>>> MSI/MSIX enable bit is not good as it is not set when those calls are
>>>>>>>>> made.
>>>>>>>>> And I cannot rely on address/data fields much as the guest can
>>>>>>>>> change them
>>>>>>>>> (I already use them to store IRQ numbers and btw it is missing
>>>>>>>>> checks when
>>>>>>>>> I read them back for disposal, I'll fix in next round).
>>>>>>>>>
>>>>>>>>> Or on "enable" event I could put IRQ numbers to .data of MSI
>>>>>>>>> config space
>>>>>>>>> and on "disable" check if it is not zero, then configuration took
>>>>>>>>> place,
>>>>>>>>> then I can remove my msi[]/msix[] flag arrays. If the guest did
>>>>>>>>> any change
>>>>>>>>> to MSI/MSIX config space (it does not on SPAPR except weird selftest
>>>>>>>>> cases), I compare .data with what ICS can possibly have and either
>>>>>>>>> reject
>>>>>>>>> "disable" or handle it and if it breaks XICS - that's too bad for the
>>>>>>>>> stupid guest. Would that be acceptable?
>>>>>>>> Can't you prohibit the guest from writing to the MSI configuration
>>>>>>>> registers itself? Then you don't need to do sanity checks.
>>>>>>> I could for emulated devices but VFIO uses the same code. For example,
>>>>>>> there is an IBM SCSI IPR card which does a "self test". For that, it
>>>>>>> saves
>>>>>>> MSIX BAR content, does reboot via some backdoor interface and
>>>>>>> restores MSIX
>>>>>>> BAR. It has been solved for VFIO in the host kernel by restoring
>>>>>>> MSIX data
>>>>>>> from cached values when guest is trying to restore it with what it
>>>>>>> thinks
>>>>>>> is actual MSIX data (it is virtualized because of x86). But there is
>>>>>>> cache
>>>>>> We already have a cache because we don't access the real PCI
>>>>>> registers with
>>>>>> msi_set_message(), no?
>>>>>
>>>>> For emulated devices there is no cache. And in any case the guest is
>>>>> allowed to write to it... Who knows what AIX does? I do not.
>>>>
>>>> Tried GHashTable for keeping bus:dev.fn <-> (irq, num), more or less ok
>>>> but
>>>> how to migrate such thing? Temporary cache somewhere and then unpack
>>>> it? Or
>>>> use old style migration callbacks?
>>> Could you try to introduce a new vmstate type that just serializes and
>>> deserializes hash tables? Maybe there is already a serialization
>>> function for it in glib?
>> I have not found any (most likely I do not know how to search there),
>> I added mine, here are VMSTATE_HASH + its use for SPAPR.
>>
>>
>> Is this a movement to right direction? I need to put key/value sizes
>> into VMSTATE definition somehow but do not really want to touch
>> VMStateField.
> 
> I'm not sure. Juan?


I also tried to simplify this particular thing more by assuming that the
key is always "int" and put size of value to VMStateField::size. But there
is no way to get the size in VMStateInfo::get(). Or I could do a
"subsection" and try implementing has as an array (and have get/put defined
for items, should work) but in this case I'll need to cache number of
elements of the hash table somewhere so I'll need a wrapper around GHashTable.

Making generalized version is not obvious for me :(


-- 
Alexey



reply via email to

[Prev in Thread] Current Thread [Next in Thread]