

From: Cédric Le Goater
Subject: Re: [Qemu-devel] [PATCH v2 2/4] spapr/rtas: disable the decrementer interrupt when a CPU is unplugged
Date: Tue, 10 Oct 2017 17:56:08 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.3.0

On 10/10/2017 10:08 AM, Benjamin Herrenschmidt wrote:
> On Mon, 2017-10-09 at 17:49 +0200, Cédric Le Goater wrote:
>> When a CPU is stopped with the 'stop-self' RTAS call, its state
>> 'halted' is switched to 1 and, in this case, the MSR is not taken into
>> account anymore in the cpu_has_work() routine. Only the pending
>> hardware interrupts are checked with their LPCR:PECE* enablement bit.
>>
>> If the DECR timer fires after 'stop-self' is called and before the CPU
>> 'stop' state is reached, the nearly-dead CPU will have some work to do
>> and the guest will crash. This case happens very frequently with the
>> not yet upstream P9 XIVE exploitation mode. In XICS mode, the DECR is
>> occasionally fired but after 'stop' state, so no work is to be done
>> and the guest survives.
>>
>> I suspect there is a race between the QEMU mainloop triggering the
>> timers and the TCG CPU thread but I could not quite identify the root
>> cause. To be safe, let's disable the decrementer interrupt in the LPCR
>> when the CPU is halted and reenable it when the CPU is restarted.
>>
>> Signed-off-by: Cédric Le Goater <address@hidden>
> 
> We should disable external interrupts and doorbells too, no? I.e., we
> could in fact clear all of the PECE bits.

And enable them all in 'start-cpu' for secondaries, then?
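
To make the suggestion concrete, here is a minimal, self-contained sketch of the "clear all wake-up enables on stop-self, set them all again on start-cpu" idea. It is a toy model, not QEMU code: DemoCPU, the DEMO_PECE_* masks and the demo_* helpers are hypothetical names, and the bit positions are illustrative only and do not match the real LPCR layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions only; NOT the real LPCR layout. */
#define DEMO_PECE_DOORBELL (1ULL << 0)
#define DEMO_PECE_EXTERNAL (1ULL << 1)
#define DEMO_PECE_DECR     (1ULL << 2)
#define DEMO_PECE_ALL      (DEMO_PECE_DOORBELL | DEMO_PECE_EXTERNAL | \
                            DEMO_PECE_DECR)

/* Hypothetical stand-in for the pieces of CPU state discussed here. */
typedef struct DemoCPU {
    uint64_t lpcr;    /* wake-up enable bits */
    uint64_t pending; /* pending interrupts, same bit layout for simplicity */
    bool halted;
} DemoCPU;

/*
 * Models the halted case described in the commit message: the MSR is
 * ignored and only pending interrupts whose enable bit is set count
 * as work.
 */
static bool demo_halted_cpu_has_work(const DemoCPU *cpu)
{
    return cpu->halted && (cpu->pending & cpu->lpcr & DEMO_PECE_ALL) != 0;
}

/* stop-self: drop every wake-up enable, then halt. */
static void demo_stop_self(DemoCPU *cpu)
{
    cpu->lpcr &= ~DEMO_PECE_ALL;
    cpu->halted = true;
}

/* start-cpu: restore every wake-up enable, then let the CPU run. */
static void demo_start_cpu(DemoCPU *cpu)
{
    cpu->lpcr |= DEMO_PECE_ALL;
    cpu->halted = false;
}
```

With all enables cleared, a decrementer that fires after demo_stop_self() no longer gives the halted CPU any work, whichever interrupt source it comes from. A real change would still need the per-CPU-model bit selection that the patch does with ppc_cpu_pvr_match().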

C. 


> 
>> ---
>>
>> Changes in v2:
>>
>>  - used a new routine ppc_cpu_pvr_match() to discriminate CPU versions
>>  - removed the LPCR:PECE* enablement bit when the CPU is initialized
>>    if it is a secondary
>>
>>  hw/ppc/spapr_rtas.c         | 20 ++++++++++++++++++++
>>  target/ppc/translate_init.c | 19 +++++++++++++++++--
>>  2 files changed, 37 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>> index cdf0b607a0a0..dfdbf1e2c6f8 100644
>> --- a/hw/ppc/spapr_rtas.c
>> +++ b/hw/ppc/spapr_rtas.c
>> @@ -46,6 +46,7 @@
>>  #include "qemu/cutils.h"
>>  #include "trace.h"
>>  #include "hw/ppc/fdt.h"
>> +#include "target/ppc/cpu-models.h"
>>  
>>  static void rtas_display_character(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>>                                     uint32_t token, uint32_t nargs,
>> @@ -174,6 +175,15 @@ static void rtas_start_cpu(PowerPCCPU *cpu_, sPAPRMachineState *spapr,
>>          kvm_cpu_synchronize_state(cs);
>>  
>>          env->msr = (1ULL << MSR_SF) | (1ULL << MSR_ME);
>> +
>> +        /* Enable DECR interrupt */
>> +        if (ppc_cpu_pvr_match(cpu, CPU_POWERPC_LOGICAL_3_00)) {
>> +            env->spr[SPR_LPCR] |= LPCR_DEE;
>> +        } else {
>> +            /* P7 and P8 both have same bit for DECR */
>> +            env->spr[SPR_LPCR] |= LPCR_P8_PECE3;
>> +        }
>> +
>>          env->nip = start;
>>          env->gpr[3] = r3;
>>          cs->halted = 0;
>> @@ -210,6 +220,16 @@ static void rtas_stop_self(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>>       * no need to bother with specific bits, we just clear it.
>>       */
>>      env->msr = 0;
>> +
>> +    /* Don't let the decrementer run on a CPU being stopped. This could
>> +     * deliver an interrupt on a dying CPU and crash the guest.
>> +     */
>> +    if (ppc_cpu_pvr_match(cpu, CPU_POWERPC_LOGICAL_3_00)) {
>> +        env->spr[SPR_LPCR] &= ~LPCR_DEE;
>> +    } else {
>> +        /* P7 and P8 both have same bit for DECR */
>> +        env->spr[SPR_LPCR] &= ~LPCR_P8_PECE3;
>> +    }
>>  }
>>  
>>  static inline int sysparm_st(target_ulong addr, target_ulong len,
>> diff --git a/target/ppc/translate_init.c b/target/ppc/translate_init.c
>> index 0d6379fcc5b4..1a62159843e7 100644
>> --- a/target/ppc/translate_init.c
>> +++ b/target/ppc/translate_init.c
>> @@ -8905,6 +8905,7 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp)
>>      CPUPPCState *env = &cpu->env;
>>      ppc_spr_t *lpcr = &env->spr_cb[SPR_LPCR];
>>      ppc_spr_t *amor = &env->spr_cb[SPR_AMOR];
>> +    CPUState *cs = CPU(cpu);
>>  
>>      cpu->vhyp = vhyp;
>>  
>> @@ -8946,8 +8947,15 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp)
>>          } else {
>>              lpcr->default_value &= ~(LPCR_UPRT | LPCR_GTSE);
>>          }
>> -        lpcr->default_value |= LPCR_PDEE | LPCR_HDEE | LPCR_EEE | LPCR_DEE |
>> +        lpcr->default_value |= LPCR_PDEE | LPCR_HDEE | LPCR_EEE |
>>                                 LPCR_OEE;
>> +
>> +        /* Only let the decrementer wake up the boot CPU. The RTAS
>> +         * command start-cpu will enable it on secondaries.
>> +         */
>> +        if (cs == first_cpu) {
>> +            lpcr->default_value |= LPCR_DEE;
>> +        }
>>          break;
>>      default:
>>          /* P7 and P8 has slightly different PECE bits, mostly because P8 adds
>> @@ -8955,7 +8963,14 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp)
>>           * will work as expected for both implementations
>>           */
>>          lpcr->default_value |= LPCR_P8_PECE0 | LPCR_P8_PECE1 | LPCR_P8_PECE2 |
>> -                               LPCR_P8_PECE3 | LPCR_P8_PECE4;
>> +                               LPCR_P8_PECE4;
>> +
>> +        /* Only let the decrementer wake up the boot CPU. The RTAS
>> +         * command start-cpu will enable it on secondaries.
>> +         */
>> +        if (cs == first_cpu) {
>> +            lpcr->default_value |= LPCR_P8_PECE3;
>> +        }
>>      }
>>  
>>      /* We should be followed by a CPU reset but update the active value



