qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 1/1 V6] qemu-kvm: fix improper nmi emulation


From: Jan Kiszka
Subject: Re: [Qemu-devel] [PATCH 1/1 V6] qemu-kvm: fix improper nmi emulation
Date: Tue, 18 Oct 2011 21:41:13 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); de; rv:1.8.1.12) Gecko/20080226 SUSE/2.0.0.12-1.1 Thunderbird/2.0.0.12 Mnenhy/0.7.5.666

On 2011-10-17 18:00, Lai Jiangshan wrote:
> On 10/17/2011 05:49 PM, Avi Kivity wrote:
>> On 10/17/2011 11:40 AM, Lai Jiangshan wrote:
>>>>>
>>>>
>>>> LINT1 may have been programmed as a level -triggered interrupt instead
>>>> of edge triggered (NMI or interrupt).  We can use the ioctl argument for
>>>> the level (and pressing the NMI button needs to pulse the level to 1 and
>>>> back to 0).
>>>>
>>>
>>> Hi, Avi, Jan,
>>>
>>> Which approach you prefer to?
>>> I need to know the result before wasting too much time to respin
>>> the approach.
>>
>> Yes, sorry about the slow and sometimes conflicting feedback.
>>
>>> 1) Fix KVM_NMI emulation approach  (which is v3 patchset)
>>>     - It directly fixes the problem and matches the
>>>       real hard ware more, but it changes KVM_NMI bahavior.
>>>     - Require both kernel-site and userspace-site fix.
>>>
>>> 2) Get the LAPIC state from kernel irqchip, and inject NMI if it is allowed
>>>    (which is v4 patchset)
>>>     - Simple, don't changes any kernel behavior.
>>>     - Only need the userspace-site fix
>>>
>>> 3) Add KVM_SET_LINT1 approach (which is v5 patchset)
>>>     - don't changes the kernel's KVM_NMI behavior.
>>>     - much complex
>>>     - Require both kernel-site and userspace-site fix.
>>>     - userspace-site should also handle the !KVM_SET_LINT1
>>>       condition, it uses all the 2) approach' code. it means
>>>       this approach equals the 2) approach + KVM_SET_LINT1 ioctl.
>>>
>>> This is an urgent bug of us, we need to settle it down soo
>>
>> While (1) is simple, it overloads a single ioctl with two meanings,
>> that's not so good.
>>
>> Whether we do (1) or (3), we need (2) as well, for older kernels.
>>
>> So I recommend first focusing on (2) and merging it, then doing (3).
>>
>> (note an additional issue with 3 is whether to make it a vm or vcpu
>> ioctl - we've been assuming vcpu ioctl but it's not necessarily the best
>> choice).
>>
> 
> It is the 2) approach.
> It only changes the user space site, the kernel site is not touched.
> It is changed from previous v4 patch, fixed problems found by Jan.
> ----------------------------
> 
> From: Lai Jiangshan <address@hidden>
> 
> Currently, NMI interrupt is blindly sent to all the vCPUs when NMI
> button event happens. This doesn't properly emulate real hardware on
> which NMI button event triggers LINT1. Because of this, NMI is sent to
> the processor even when LINT1 is maskied in LVT. For example, this
> causes the problem that kdump initiated by NMI sometimes doesn't work
> on KVM, because kdump assumes NMI is masked on CPUs other than CPU0.
> 
> With this patch, inject-nmi request is handled as follows.
> 
> - When in-kernel irqchip is disabled, deliver LINT1 instead of NMI
>   interrupt.
> - When in-kernel irqchip is enabled, get the in-kernel LAPIC states
>   and test the APIC_LVT_MASKED, if LINT1 is unmasked, and then
>   delivering the NMI directly. (Suggested by Jan Kiszka)
> 
> Changed from old version:
>   re-implement it by the Jan's suggestion.
>   fix the race found by Jan.
> 
> Signed-off-by: Lai Jiangshan <address@hidden>
> Reported-by: Kenji Kaneshige <address@hidden>
> ---
>  hw/apic.c |   33 +++++++++++++++++++++++++++++++++
>  hw/apic.h |    1 +
>  monitor.c |    6 +++++-
>  3 files changed, 39 insertions(+), 1 deletions(-)
> diff --git a/hw/apic.c b/hw/apic.c
> index 69d6ac5..922796a 100644
> --- a/hw/apic.c
> +++ b/hw/apic.c
> @@ -205,6 +205,39 @@ void apic_deliver_pic_intr(DeviceState *d, int level)
>      }
>  }
>  
> +static inline uint32_t kapic_reg(struct kvm_lapic_state *kapic, int reg_id);
> +
> +static void kvm_irqchip_deliver_nmi(void *p)
> +{
> +    APICState *s = p;
> +    struct kvm_lapic_state klapic;
> +    uint32_t lvt;
> +
> +    kvm_get_lapic(s->cpu_env, &klapic);
> +    lvt = kapic_reg(&klapic, 0x32 + APIC_LVT_LINT1);
> +
> +    if (lvt & APIC_LVT_MASKED) {
> +        return;
> +    }
> +
> +    if (((lvt >> 8) & 7) != APIC_DM_NMI) {
> +        return;
> +    }
> +
> +    kvm_vcpu_ioctl(s->cpu_env, KVM_NMI);
> +}
> +
> +void apic_deliver_nmi(DeviceState *d)
> +{
> +    APICState *s = DO_UPCAST(APICState, busdev.qdev, d);
> +
> +    if (kvm_irqchip_in_kernel()) {
> +        run_on_cpu(s->cpu_env, kvm_irqchip_deliver_nmi, s);
> +    } else {
> +        apic_local_deliver(s, APIC_LVT_LINT1);
> +    }
> +}
> +
>  #define foreach_apic(apic, deliver_bitmask, code) \
>  {\
>      int __i, __j, __mask;\
> diff --git a/hw/apic.h b/hw/apic.h
> index c857d52..3a4be0a 100644
> --- a/hw/apic.h
> +++ b/hw/apic.h
> @@ -10,6 +10,7 @@ void apic_deliver_irq(uint8_t dest, uint8_t dest_mode,
>                               uint8_t trigger_mode);
>  int apic_accept_pic_intr(DeviceState *s);
>  void apic_deliver_pic_intr(DeviceState *s, int level);
> +void apic_deliver_nmi(DeviceState *d);
>  int apic_get_interrupt(DeviceState *s);
>  void apic_reset_irq_delivered(void);
>  int apic_get_irq_delivered(void);
> diff --git a/monitor.c b/monitor.c
> index cb485bf..0b81f17 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -2616,7 +2616,11 @@ static int do_inject_nmi(Monitor *mon, const QDict 
> *qdict, QObject **ret_data)
>      CPUState *env;
>  
>      for (env = first_cpu; env != NULL; env = env->next_cpu) {
> -        cpu_interrupt(env, CPU_INTERRUPT_NMI);
> +        if (!env->apic_state) {
> +            cpu_interrupt(env, CPU_INTERRUPT_NMI);
> +        } else {
> +            apic_deliver_nmi(env->apic_state);
> +        }
>      }
>  
>      return 0;

Looks OK to me.

Please don't forget to bake a qemu-only patch for those bits that apply
to upstream as well (ie. the user space APIC path).

Jan

Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]