

From: Cédric Le Goater
Subject: Re: [Qemu-devel] [PATCH v2 2/4] spapr/rtas: disable the decrementer interrupt when a CPU is unplugged
Date: Thu, 12 Oct 2017 11:25:19 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.3.0

On 10/12/2017 12:46 AM, David Gibson wrote:
> On Wed, Oct 11, 2017 at 01:55:20PM +0200, Cédric Le Goater wrote:
>> On 10/11/2017 08:45 AM, David Gibson wrote:
>>> On Mon, Oct 09, 2017 at 05:49:28PM +0200, Cédric Le Goater wrote:
>>>> When a CPU is stopped with the 'stop-self' RTAS call, its state
>>>> 'halted' is switched to 1 and, in this case, the MSR is not taken into
>>>> account anymore in the cpu_has_work() routine. Only the pending
>>>> hardware interrupts are checked with their LPCR:PECE* enablement bit.
>>>>
>>>> If the DECR timer fires after 'stop-self' is called and before the CPU
>>>> 'stop' state is reached, the nearly-dead CPU will have some work to do
>>>> and the guest will crash. This case happens very frequently with the
>>>> not yet upstream P9 XIVE exploitation mode. In XICS mode, the DECR is
>>>> occasionally fired but after 'stop' state, so no work is to be done
>>>> and the guest survives.
>>>>
>>>> I suspect there is a race between the QEMU mainloop triggering the
>>>> timers and the TCG CPU thread but I could not quite identify the root
>>>> cause. To be safe, let's disable the decrementer interrupt in the LPCR
>>>> when the CPU is halted and reenable it when the CPU is restarted.
>>>>
>>>> Signed-off-by: Cédric Le Goater <address@hidden>
>>>> ---
>>>>
>>>> Changes in v2:
>>>>
>>>>  - used a new routine ppc_cpu_pvr_match() to discriminate CPU versions
>>>>  - removed the LPCR:PECE* enablement bit when the CPU is initialized
>>>>    if it is a secondary
>>>>
>>>>  hw/ppc/spapr_rtas.c         | 20 ++++++++++++++++++++
>>>>  target/ppc/translate_init.c | 19 +++++++++++++++++--
>>>>  2 files changed, 37 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>>>> index cdf0b607a0a0..dfdbf1e2c6f8 100644
>>>> --- a/hw/ppc/spapr_rtas.c
>>>> +++ b/hw/ppc/spapr_rtas.c
>>>> @@ -46,6 +46,7 @@
>>>>  #include "qemu/cutils.h"
>>>>  #include "trace.h"
>>>>  #include "hw/ppc/fdt.h"
>>>> +#include "target/ppc/cpu-models.h"
>>>>  
>>>>  static void rtas_display_character(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>>>>                                     uint32_t token, uint32_t nargs,
>>>> @@ -174,6 +175,15 @@ static void rtas_start_cpu(PowerPCCPU *cpu_, sPAPRMachineState *spapr,
>>>>          kvm_cpu_synchronize_state(cs);
>>>>  
>>>>          env->msr = (1ULL << MSR_SF) | (1ULL << MSR_ME);
>>>> +
>>>> +        /* Enable DECR interrupt */
>>>> +        if (ppc_cpu_pvr_match(cpu, CPU_POWERPC_LOGICAL_3_00)) {
>>>
>>> Sorry, I didn't reply to your earlier mail in time.  Going via the PVR
>>> in this way seems bonkers to me - I like it even less than checking
>>> the mmu type.  After all, classifying a bunch of precise models (PVRs)
>>> together by behaviour is kind of exactly what the CPU classes are for,
>>> so using object_dynamic_cast() (==instance_of) is a better idea here.
>>
>> Hmm, which type should I use? I don't think we have any TYPE_POWER9*
>> that we could use for an object_dynamic_cast(). I could probably use
>> the name and strcmp("power9"), but that looks ugly.
> 
> Actually there is, but, yeah, it's a lot less obvious than I thought.
> It's constructed by the POWERPC_FAMILY macro and will be
> "POWER9-family-powerpc64-cpu"
> 
>> The only thing we have is "CPU_POWERPC_POWER9_BASE" and it only
>> applies to the PVR.
>>
>> Maybe I don't understand your idea.
> 
> Urgh, sorry.  This got much muckier than I thought it would be.  I
> think maybe it's best to go back to the mmu type test, and later on we
> can fix up both the previously existing test like that, and the new
> one to something better.

Given that the bits are the same on all processors, why not just use:

    env->spr[SPR_LPCR] |= LPCR_PECE_L_MASK;

in rtas_start_cpu(), and

    env->spr[SPR_LPCR] &= ~LPCR_PECE_L_MASK;

in rtas_stop_self()?

Thanks,

C. 


>>>> +            env->spr[SPR_LPCR] |= LPCR_DEE;
>>>> +        } else {
>>>> +            /* P7 and P8 both have same bit for DECR */
>>>> +            env->spr[SPR_LPCR] |= LPCR_P8_PECE3;
>>>> +        }
>>>> +
>>>>          env->nip = start;
>>>>          env->gpr[3] = r3;
>>>>          cs->halted = 0;
>>>
>>> The other option I'm wondering about here is to actually add a
>>> "shutdown" (or something) method to the cpu class, which does whatever
>>> is necessary to put the vcpu into a quiescent state that won't be
>>> woken up unless it's specifically requested.
>>
>> yes. That is a good idea. 
>>
>> Thanks,
>>
>> C. 
>>
>>
>>>> @@ -210,6 +220,16 @@ static void rtas_stop_self(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>>>>       * no need to bother with specific bits, we just clear it.
>>>>       */
>>>>      env->msr = 0;
>>>> +
>>>> +    /* Don't let the decrementer run on a CPU being stopped. This could
>>>> +     * deliver an interrupt on a dying CPU and crash the guest.
>>>> +     */
>>>> +    if (ppc_cpu_pvr_match(cpu, CPU_POWERPC_LOGICAL_3_00)) {
>>>> +        env->spr[SPR_LPCR] &= ~LPCR_DEE;
>>>> +    } else {
>>>> +        /* P7 and P8 both have same bit for DECR */
>>>> +        env->spr[SPR_LPCR] &= ~LPCR_P8_PECE3;
>>>> +    }
>>>>  }
>>>>  
>>>>  static inline int sysparm_st(target_ulong addr, target_ulong len,
>>>> diff --git a/target/ppc/translate_init.c b/target/ppc/translate_init.c
>>>> index 0d6379fcc5b4..1a62159843e7 100644
>>>> --- a/target/ppc/translate_init.c
>>>> +++ b/target/ppc/translate_init.c
>>>> @@ -8905,6 +8905,7 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp)
>>>>      CPUPPCState *env = &cpu->env;
>>>>      ppc_spr_t *lpcr = &env->spr_cb[SPR_LPCR];
>>>>      ppc_spr_t *amor = &env->spr_cb[SPR_AMOR];
>>>> +    CPUState *cs = CPU(cpu);
>>>>  
>>>>      cpu->vhyp = vhyp;
>>>>  
>>>> @@ -8946,8 +8947,15 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp)
>>>>          } else {
>>>>              lpcr->default_value &= ~(LPCR_UPRT | LPCR_GTSE);
>>>>          }
>>>> -        lpcr->default_value |= LPCR_PDEE | LPCR_HDEE | LPCR_EEE | LPCR_DEE |
>>>> +        lpcr->default_value |= LPCR_PDEE | LPCR_HDEE | LPCR_EEE |
>>>>                                 LPCR_OEE;
>>>
>>> But I guess we'd also need a "set_papr" method to go with that.
>>>
>>>> +
>>>> +        /* Only let the decrementer wake up the boot CPU. The RTAS
>>>> +         * command start-cpu will enable it on secondaries.
>>>> +         */
>>>> +        if (cs == first_cpu) {
>>>> +            lpcr->default_value |= LPCR_DEE;
>>>> +        }
>>>>          break;
>>>>      default:
>>>>          /* P7 and P8 has slightly different PECE bits, mostly because P8 adds
>>>> @@ -8955,7 +8963,14 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp)
>>>>           * will work as expected for both implementations
>>>>           */
>>>>          lpcr->default_value |= LPCR_P8_PECE0 | LPCR_P8_PECE1 | LPCR_P8_PECE2 |
>>>> -                               LPCR_P8_PECE3 | LPCR_P8_PECE4;
>>>> +                               LPCR_P8_PECE4;
>>>> +
>>>> +        /* Only let the decrementer wake up the boot CPU. The RTAS
>>>> +         * command start-cpu will enable it on secondaries.
>>>> +         */
>>>> +        if (cs == first_cpu) {
>>>> +            lpcr->default_value |= LPCR_P8_PECE3;
>>>> +        }
>>>>      }
>>>>  
>>>>      /* We should be followed by a CPU reset but update the active value
>>>
>>
> 



