[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default
From: |
Paolo Bonzini |
Subject: |
Re: [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default |
Date: |
Sat, 27 Apr 2019 01:29:10 -0400 (EDT) |
> > In my testing it looks like KVM advertises supporting the KVM_IRQFD
> > resample feature, but vfio never gets the unmask notification, so the
> > device remains with DisINTx set and no further interrupts are
> > generated. Do we expect KVM's IRQFD with resampler to work in the
> > split IRQ mode? We can certainly hope that "high performance" devices
> > use MSI or MSI/X, but this would be quite a performance regression with
> > split mode if our userspace bypass for INTx goes away. Thanks,
>
> arch/x86/kvm/lapic.c:kvm_ioapic_send_eoi() dumps to userspace before
> kvm_ioapic_update_eoi() can handle the irq_ack_notifier_list via
> kvm_notify_acked_gsi(),
That wouldn't help because kvm_ioapic_update_eoi would not even be
able to access vcpu->kvm->arch.vioapic (it's NULL).
The following untested patch would signal the resamplefd in kvm_ioapic_send_eoi,
before requesting the exit to userspace. However I am not sure how QEMU
sets up the VFIO eventfds: if I understand correctly, when VFIO writes again to
the irq eventfd, the interrupt request would not reach the userspace IOAPIC, but
only the in-kernel LAPIC. That would be incorrect, and if my understanding is
correct we need to trigger resampling from hw/intc/ioapic.c.
Something like:
1) VFIO uses kvm_irqchip_add_irqfd_notifier_gsi instead of KVM_IRQFD directly
2) add a NotifierList in kvm_irqchip_assign_irqfd
3) if kvm_irqchip_is_split(), register a corresponding Notifier in ioapic.c
which
stores the resamplefd as an EventNotifier*
4) ioapic_eoi_broadcast writes to the resamplefd
BTW, how do non-x86 platforms handle intx resampling?
Paolo
---- 8< -----
From: Paolo Bonzini <address@hidden>
Subject: [PATCH WIP] kvm: lapic: run ack notifiers for userspace IOAPIC routes
diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
index ea1a4e0297da..be337e06e3fd 100644
--- a/arch/x86/kvm/ioapic.h
+++ b/arch/x86/kvm/ioapic.h
@@ -135,4 +135,6 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu,
ulong *ioapic_handled_vectors);
void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,
ulong *ioapic_handled_vectors);
+void kvm_notify_userspace_ioapic_routes(struct kvm *kvm, int vector);
+
#endif
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 3cc3b2d130a0..46ea79a0dd8a 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -435,6 +435,34 @@ void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,
srcu_read_unlock(&kvm->irq_srcu, idx);
}
+void kvm_notify_userspace_ioapic_routes(struct kvm *kvm, int vector)
+{
+ struct kvm_kernel_irq_routing_entry *entry;
+ struct kvm_irq_routing_table *table;
+ u32 i, nr_ioapic_pins;
+ int idx;
+
+ idx = srcu_read_lock(&kvm->irq_srcu);
+ table = srcu_dereference(kvm->irq_routing, &kvm->irq_srcu);
+ nr_ioapic_pins = min_t(u32, table->nr_rt_entries,
+ kvm->arch.nr_reserved_ioapic_pins);
+
+ for (i = 0; i < nr_ioapic_pins; i++) {
+ hlist_for_each_entry(entry, &table->map[i], link) {
+ struct kvm_lapic_irq irq;
+
+ if (entry->type != KVM_IRQ_ROUTING_MSI)
+ continue;
+
+ kvm_set_msi_irq(kvm, entry, &irq);
+ if (irq.level && irq.vector == vector)
+ kvm_notify_acked_gsi(kvm, i);
+ }
+ }
+
+ srcu_read_unlock(&kvm->irq_srcu, idx);
+}
+
void kvm_arch_irq_routing_update(struct kvm *kvm)
{
kvm_hv_irq_routing_update(kvm);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 9f089e2e09d0..8f8e76d77925 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1142,6 +1142,7 @@ static bool kvm_ioapic_handles_vector(struct kvm_lapic
*apic, int vector)
static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
{
+ struct kvm_vcpu *vcpu = apic->vcpu;
int trigger_mode;
/* Eoi the ioapic only if the ioapic doesn't own the vector. */
@@ -1149,9 +1150,10 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic,
int vector)
return;
/* Request a KVM exit to inform the userspace IOAPIC. */
- if (irqchip_split(apic->vcpu->kvm)) {
- apic->vcpu->arch.pending_ioapic_eoi = vector;
- kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, apic->vcpu);
+ if (irqchip_split(vcpu->kvm)) {
+ kvm_notify_userspace_ioapic_routes(vcpu->kvm, vector);
+ vcpu->arch.pending_ioapic_eoi = vector;
+ kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, vcpu);
return;
}
@@ -1160,7 +1162,7 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic,
int vector)
else
trigger_mode = IOAPIC_EDGE_TRIG;
- kvm_ioapic_update_eoi(apic->vcpu, vector, trigger_mode);
+ kvm_ioapic_update_eoi(vcpu, vector, trigger_mode);
}
static int apic_set_eoi(struct kvm_lapic *apic)
> so it looks like KVM really ought to return an
> error when trying to register a resample IRQFD when irqchip_split(),
> but do we have better options? EOI handling in QEMU is pretty much
> non-existent and even if it was in place, it's a big performance
> regression for vfio INTx handling. Is a split irqchip that retains
> performant resampling (EOI) support possible? Thanks,