From: Jan Kiszka
Subject: Re: [Qemu-devel] [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors
Date: Wed, 19 Oct 2011 00:13:49 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); de; rv:1.8.1.12) Gecko/20080226 SUSE/2.0.0.12-1.1 Thunderbird/2.0.0.12 Mnenhy/0.7.5.666

On 2011-10-18 23:40, Michael S. Tsirkin wrote:
> On Tue, Oct 18, 2011 at 09:37:14PM +0200, Jan Kiszka wrote:
>> On 2011-10-18 20:40, Michael S. Tsirkin wrote:
>>> On Tue, Oct 18, 2011 at 08:24:39PM +0200, Jan Kiszka wrote:
>>>> On 2011-10-18 19:06, Michael S. Tsirkin wrote:
>>>>> On Tue, Oct 18, 2011 at 05:55:54PM +0200, Jan Kiszka wrote:
>>>>>> On 2011-10-18 17:22, Jan Kiszka wrote:
>>>>>>> What KVM has to do is just map an arbitrary MSI message
>>>>>>> (theoretically 64+32 bits, in practice of course much less) to
>>>>>>
>>>>>> ( There are 24 distinguishing bits in an MSI message on x86, but that's
>>>>>> only a current interpretation of one specific arch. )
>>>>>
>>>>> Confused. The vector field is 8 bits; the rest is destination ID etc.
>>>>
>>>> Right, but those additional bits, like the destination, result in
>>>> different messages. We have to encode those 24 bits into a unique GSI
>>>> number and restore them (by table lookup) on APIC injection inside the
>>>> kernel. If we only had to encode 256 different vectors, we would be
>>>> done already.
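
For reference, those bits break down roughly like this on x86 (decoder
macros for illustration only, the names are mine):

#define MSI_ADDR_DEST_ID(addr)       (((addr) >> 12) & 0xff)  /* 8 bits */
#define MSI_ADDR_REDIR_HINT(addr)    (((addr) >> 3) & 0x1)    /* 1 bit  */
#define MSI_ADDR_DEST_MODE(addr)     (((addr) >> 2) & 0x1)    /* 1 bit  */

#define MSI_DATA_VECTOR(data)        ((data) & 0xff)          /* 8 bits */
#define MSI_DATA_DELIVERY_MODE(data) (((data) >> 8) & 0x7)    /* 3 bits */
#define MSI_DATA_LEVEL(data)         (((data) >> 14) & 0x1)   /* 1 bit  */
#define MSI_DATA_TRIGGER_MODE(data)  (((data) >> 15) & 0x1)   /* 1 bit  */

/* That adds up to the ~24 bits mentioned above, all of which the kernel
 * has to reproduce verbatim when it finally injects into the APIC. */
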
>>>
>>> Right. But in practice guests always use distinct vectors (from the
>>> 256 available) for distinct messages. This is because
>>> the vector seems to be the only thing that gets communicated by the APIC
>>> to the software.
>>>
>>> So e.g. a table with 256 entries, with the extra 1024-256 entries
>>> used as spill-over for guests that do something unexpected,
>>> would work really well.
>>
>> Linux already manages vectors on a per-CPU basis. For efficiency
>> reasons, it does not exploit the full range of 256 vectors but actually
>> allocates them in - IIRC - steps of 16. So I would not be surprised to
>> find lots of vector number "collisions" when looking over a full set of
>> CPUs in a system.
>>
>> Really, these considerations do not help us. We must store all 96 bits,
>> if only for the sake of other KVM architectures that want MSI routing.
>>>
>>>
>>>>>
>>>>>>> a single GSI and vice versa. As there are fewer GSIs than possible MSI
>>>>>>> messages, we could run out of them when creating routes, statically or
>>>>>>> lazily.
>>>>>>>
>>>>>>> What would probably help us long-term, given your concerns regarding
>>>>>>> lazy routing, is to bypass that redundant GSI translation for dynamic
>>>>>>> messages, i.e. those that are not associated with an irqfd number or
>>>>>>> an assigned device IRQ. Something like a KVM_DELIVER_MSI ioctl that
>>>>>>> accepts address and data directly.
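
To make that concrete - just a sketch, the struct name and layout are
invented here, nothing like this exists in the kernel yet:

struct kvm_msi_message {
    __u64 address;  /* full 64-bit MSI address as seen from the device */
    __u32 data;     /* 32-bit MSI data payload */
    __u32 flags;    /* reserved, must be 0 for now */
};

/*
 * Device model side - note that no GSI route has to be allocated:
 *
 *     struct kvm_msi_message msg = {
 *         .address = msg_addr,
 *         .data    = msg_data,
 *     };
 *     ioctl(vm_fd, KVM_DELIVER_MSI, &msg);
 */
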
>>>>>>
>>>>>> This would be a trivial extension in fact. Given its beneficial impact
>>>>>> on our GSI limitation issue, I think I will hack up something like that.
>>>>>>
>>>>>> And maybe this makes a transparent cache more reasonable. Then only old
>>>>>> host kernels would force us to do searches for already cached messages.
>>>>>>
>>>>>> Jan
>>>>>
>>>>> Hmm, I'm not all that sure. The existing design really allows
>>>>> caching the route in various smart ways. We currently do
>>>>> this for irqfd, but it can be extended to ioctls.
>>>>> If we just let the guest inject arbitrary messages,
>>>>> that becomes much more complex.
>>>>
>>>> irqfd and kvm device assignment do not allow us to inject arbitrary
>>>> messages at arbitrary points. The new API offers kvm_msi_irqfd_set and
>>>> kvm_device_msix_set_vector (etc.) for those scenarios to set static
>>>> routes from an MSI message to a GSI number (+they configure the related
>>>> backends).
>>>
>>> Yes, it's a very flexible API but it would be very hard to optimize.
>>> GSIs let us do the slow path setup, but they make it easy
>>> to optimize target lookup in kernel.
>>
>> Users of the API above have no need to know anything about GSIs. They
>> are an artifact of the KVM-internal interface between user space and
>> kernel now - thanks to the MSIRoutingCache encapsulation.
> 
> Yes, but I am saying that the API above can't be implemented
> more efficiently than now: you will have to scan all APICs on each MSI.
> The GSI implementation can be optimized: decode the vector once,
> if it matches a single vcpu, store that vcpu and use it when sending
> interrupts.

Sorry, I missed that you had switched to the kernel side.

What information do you want to cache there that cannot be easily
obtained by looking at a concrete message? I do not see any. Once you
have checked that the delivery mode targets a specific CPU, you could address
it directly. Or are you thinking about some cluster mode?
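
Just to make sure we are talking about the same thing, this is how I read
your proposal - all names below are invented, it is not actual kvm code:

struct msi_route_cache_entry {
    bool single_target;        /* message resolves to exactly one vcpu */
    struct kvm_vcpu *vcpu;     /* resolved once, at route setup time */
    u8 vector;                 /* extracted from the MSI data word */
};

static int msi_cached_deliver(struct kvm *kvm,
                              struct msi_route_cache_entry *e,
                              u64 address, u32 data)
{
    if (e->single_target)
        /* fast path: no scan over all local APICs */
        return apic_inject_vector(e->vcpu, e->vector);

    /* slow path: parse destination id/mode from the message and walk
     * the vcpus - but note that everything cached above can also be
     * derived from the message itself, which is my point */
    return apic_deliver_by_destination(kvm, address, data);
}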

> 
> 
>>>
>>> An analogy would be if read/write operated on file paths.
>>> An fd makes it easy to do permission checks and slow lookups
>>> in one place. GSI happens to work like this (maybe, by accident).
>>
>> Think of an opaque file handle as an MSIRoutingCache object. It
>> encodes not only the routing handle but also other useful associated
>> information we need from time to time - internally, not in the device
>> models.
> 
> Forget the qemu abstractions, I am talking about data path
> optimizations in the kernel in kvm. From that POV, the point of an fd is
> not that it is opaque. It is that it is an index into an array that
> can be used for fast lookups.
> 
>>>>>
>>>>> Another concern is mask bit emulation. We currently
>>>>> handle the mask bit in userspace, but patches
>>>>> to do this in the kernel for assigned devices were seen,
>>>>> and IMO we might want to do that for virtio as well.
>>>>>
>>>>> For that to work the mask bit needs to be tied to
>>>>> a specific GSI or a specific device, which does not
>>>>> work if we just inject arbitrary writes.
>>>>
>>>> Yes, but I do not see those valuable plans being negatively affected.
>>>>
>>>> Jan
>>>>
>>>
>>> I do.
>>> How would we maintain a mask/pending bit in the kernel if we are not
>>> even supplied info on all available vectors?
>>
>> It's tricky to discuss an undefined interface (there only exists an
>> outdated proposal for kvm device assignment). But I suppose that user
>> space will have to define the maximum number of vectors when creating an
>> in-kernel MSI-X MMIO area. The device already has to tell this to msix_init.
>>
>> The number of used vectors will correlate with the number of registered
>> irqfds (in the case of vhost or vfio, device assignment still has
>> SET_MSIX_NR). As kernel space would then be responsible for mask
>> processing, user space would keep vectors registered with irqfds, even
>> if they are masked. It could just continue to play the trick of dropping
>> data=0 vectors.
> 
> Which trick?  We don't play any tricks except for device assignment.
> 
>> The point here is: All those steps have _nothing_ to do with the generic
>> MSI-X core. They are KVM-specific "side channels" for which KVM provides
>> an API. In contrast, msix_vector_use/unuse were generic services that
>> were actually created to satisfy KVM's requirements. But if we split that
>> up, we can address the generic MSI-X requirements in a way that makes
>> more sense for emulated devices (and particularly msix_vector_use makes
>> no sense for emulation).
>>
>> Jan
>>
> 
> We need at least msix_vector_unuse

Not at all. What we rather need is some qemu_irq_set(level) equivalent for
MSI. The spec requires that the device clears the pending bit when the
reason for the interrupt is removed, and any removal that originates in the
device model should simply be signaled like an IRQ de-assert. Vector
"unusage" is just one reason here.
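
Roughly what I have in mind - a sketch only; the function name is invented,
and msix_clr_pending() stands for whatever internal helper ends up clearing
the PBA bit:

void msix_set_level(PCIDevice *dev, unsigned vector, bool level)
{
    if (level) {
        /* msix_notify() already latches the pending bit if the vector
         * is currently masked */
        msix_notify(dev, vector);
    } else {
        /* the reason for the interrupt went away: clear a pending bit
         * that may still be latched from the masked period */
        msix_clr_pending(dev, vector);
    }
}

Vector "unuse" would then just be one caller of the level=0 path.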

> - IMO it makes more sense than "clear
> pending vector". msix_vector_use is good to keep around for symmetry:
> who knows whether we'll need to allocate resources per vector
> in the future.

For MSI[-X], the spec is already there, and we know that there is no need
for further resources when emulating it. Only KVM has special needs.

Jan
