qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default


From: Paolo Bonzini
Subject: Re: [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default
Date: Sat, 27 Apr 2019 10:09:51 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1

On 27/04/19 07:29, Paolo Bonzini wrote:
> 
>>> In my testing it looks like KVM advertises supporting the KVM_IRQFD
>>> resample feature, but vfio never gets the unmask notification, so the
>>> device remains with DisINTx set and no further interrupts are
>>> generated.  Do we expect KVM's IRQFD with resampler to work in the
>>> split IRQ mode?  We can certainly hope that "high performance" devices
>>> use MSI or MSI/X, but this would be quite a performance regression with
>>> split mode if our userspace bypass for INTx goes away.  Thanks,
>>
>> arch/x86/kvm/lapic.c:kvm_ioapic_send_eoi() dumps to userspace before
>> kvm_ioapic_update_eoi() can handle the irq_ack_notifier_list via
>> kvm_notify_acked_gsi(),
> 
> That wouldn't help because kvm_ioapic_update_eoi would not even be
> able to access vcpu->kvm->arch.vioapic (it's NULL).
> 
> The following untested patch would signal the resamplefd in 
> kvm_ioapic_send_eoi,
> before requesting the exit to userspace.  However I am not sure how QEMU
> sets up the VFIO eventfds: if I understand correctly, when VFIO writes again 
> to
> the irq eventfd, the interrupt request would not reach the userspace IOAPIC, 
> but
> only the in-kernel LAPIC.  That would be incorrect, and if my understanding is
> correct we need to trigger resampling from hw/intc/ioapic.c.

Actually it's worse: because you're bypassing IOAPIC when raising the
irq, the IOAPIC's remote_irr for example will not be set.  So split
irqchip currently must disable the intx fast path completely.

I guess we could also reimplement irqfd and resamplefd in the userspace
IOAPIC, and run the listener in a separate thread (using "-object
iothread" on the command line and AioContext in the code).

Paolo

> 
> Something like:
> 
> 1) VFIO uses kvm_irqchip_add_irqfd_notifier_gsi instead of KVM_IRQFD directly
> 
> 2) add a NotifierList in kvm_irqchip_assign_irqfd
> 
> 3) if kvm_irqchip_is_split(), register a corresponding Notifier in ioapic.c 
> which
> stores the resamplefd as an EventNotifier*
> 
> 4) ioapic_eoi_broadcast writes to the resamplefd
> 
> BTW, how do non-x86 platforms handle intx resampling?
> 
> Paolo
> 
> ---- 8< -----
> From: Paolo Bonzini <address@hidden>
> Subject: [PATCH WIP] kvm: lapic: run ack notifiers for userspace IOAPIC routes
> 
> diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
> index ea1a4e0297da..be337e06e3fd 100644
> --- a/arch/x86/kvm/ioapic.h
> +++ b/arch/x86/kvm/ioapic.h
> @@ -135,4 +135,6 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu,
>                          ulong *ioapic_handled_vectors);
>  void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,
>                           ulong *ioapic_handled_vectors);
> +void kvm_notify_userspace_ioapic_routes(struct kvm *kvm, int vector);
> +
>  #endif
> diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
> index 3cc3b2d130a0..46ea79a0dd8a 100644
> --- a/arch/x86/kvm/irq_comm.c
> +++ b/arch/x86/kvm/irq_comm.c
> @@ -435,6 +435,34 @@ void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,
>       srcu_read_unlock(&kvm->irq_srcu, idx);
>  }
>  
> +void kvm_notify_userspace_ioapic_routes(struct kvm *kvm, int vector)
> +{
> +        struct kvm_kernel_irq_routing_entry *entry;
> +        struct kvm_irq_routing_table *table;
> +        u32 i, nr_ioapic_pins;
> +        int idx;
> +
> +        idx = srcu_read_lock(&kvm->irq_srcu);
> +        table = srcu_dereference(kvm->irq_routing, &kvm->irq_srcu);
> +        nr_ioapic_pins = min_t(u32, table->nr_rt_entries,
> +                               kvm->arch.nr_reserved_ioapic_pins);
> +
> +     for (i = 0; i < nr_ioapic_pins; i++) {
> +                hlist_for_each_entry(entry, &table->map[i], link) {
> +                        struct kvm_lapic_irq irq;
> +
> +                        if (entry->type != KVM_IRQ_ROUTING_MSI)
> +                                continue;
> +
> +                     kvm_set_msi_irq(kvm, entry, &irq);
> +                     if (irq.level && irq.vector == vector)
> +                             kvm_notify_acked_gsi(kvm, i);
> +             }
> +     }
> +
> +     srcu_read_unlock(&kvm->irq_srcu, idx);
> +}
> +
>  void kvm_arch_irq_routing_update(struct kvm *kvm)
>  {
>       kvm_hv_irq_routing_update(kvm);
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 9f089e2e09d0..8f8e76d77925 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1142,6 +1142,7 @@ static bool kvm_ioapic_handles_vector(struct kvm_lapic 
> *apic, int vector)
>  
>  static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
>  {
> +     struct kvm_vcpu *vcpu = apic->vcpu;
>       int trigger_mode;
>  
>       /* Eoi the ioapic only if the ioapic doesn't own the vector. */
> @@ -1149,9 +1150,10 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic 
> *apic, int vector)
>               return;
>  
>       /* Request a KVM exit to inform the userspace IOAPIC. */
> -     if (irqchip_split(apic->vcpu->kvm)) {
> -             apic->vcpu->arch.pending_ioapic_eoi = vector;
> -             kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, apic->vcpu);
> +     if (irqchip_split(vcpu->kvm)) {
> +             kvm_notify_userspace_ioapic_routes(vcpu->kvm, vector);
> +             vcpu->arch.pending_ioapic_eoi = vector;
> +             kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, vcpu);
>               return;
>       }
>  
> @@ -1160,7 +1162,7 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, 
> int vector)
>       else
>               trigger_mode = IOAPIC_EDGE_TRIG;
>  
> -     kvm_ioapic_update_eoi(apic->vcpu, vector, trigger_mode);
> +     kvm_ioapic_update_eoi(vcpu, vector, trigger_mode);
>  }
>  
>  static int apic_set_eoi(struct kvm_lapic *apic)
> 
> 
>> so it looks like KVM really ought to return an
>> error when trying to register a resample IRQFD when irqchip_split(),
>> but do we have better options?  EOI handling in QEMU is pretty much
>> non-existent and even if it was in place, it's a big performance
>> regression for vfio INTx handling.  Is a split irqchip that retains
>> performant resampling (EOI) support possible?  Thanks,




reply via email to

[Prev in Thread] Current Thread [Next in Thread]