[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH RFC] spapr: ignore interrupts during reset state
From: |
Nikunj A Dadhania |
Subject: |
Re: [Qemu-devel] [PATCH RFC] spapr: ignore interrupts during reset state |
Date: |
Fri, 09 Jun 2017 10:32:25 +0530 |
David Gibson <address@hidden> writes:
> On Thu, Jun 08, 2017 at 12:06:08PM +0530, Nikunj A Dadhania wrote:
>> Rebooting a SMP TCG guest is broken for both single/multi threaded TCG.
>
> Ouch. When exactly did this happen?
Broken since long
> I know that smp boot used to work under TCG, albeit very slowly.
SMP boot works, its the reboot issued from the guest doesn't boot and
crashes in SLOF.
>> When reset happens, all the CPUs are in halted state. First CPU is brought
>> out
>> of reset and secondary CPUs would be initialized by the guest kernel using a
>> rtas call start-cpu.
>>
>> However, in case of TCG, decrementer interrupts keep on coming and waking the
>> secondary CPUs up.
>
> Ok.. how is that happening given that the secondary CPUs should have
> MSR[EE] == 0?
Basically, the CPU is in halted condition and has_work() does not check
for MSR_EE in that case. But I am not sure if checking MSR_EE is
sufficient, as the CPU does go to halted state (idle) while running as
well.
static bool cpu_has_work_POWER8(CPUState *cs)
{
PowerPCCPU *cpu = POWERPC_CPU(cs);
CPUPPCState *env = &cpu->env;
if (cs->halted) {
[ SNIP ]
/* Does not check for msr_ee */
} else {
return msr_ee && (cs->interrupt_request & CPU_INTERRUPT_HARD);
}
}
>
>> These secondary CPUs would see the decrementer interrupt pending, which makes
>> cpu::has_work() to bring them out of wait loop and start executing
>> tcg_exec_cpu().
>>
>> The problem with this is all the CPUs wake up and start booting SLOF image,
>> causing the following exception(4 CPUs TCG VM):
>
> [snip]
>> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
>> index d10808d..eb88bcb 100644
>> --- a/target/ppc/cpu.h
>> +++ b/target/ppc/cpu.h
>> @@ -1013,6 +1013,13 @@ struct CPUPPCState {
>> int access_type; /* when a memory exception occurs, the access
>> type is stored here */
>>
>> + /* CPU in reset, shouldn't process any interrupts.
>> + *
>> + * Decrementer interrupts in TCG can still wake the CPU up. Make sure
>> that
>> + * when this variable is set, cpu_has_work_* should return false.
>> + */
>> + int in_reset;
>
> So I'd really rather not add another flag to the cpu structure,
> especially since we'd then need to migrate it as well.
I agree, Bharata and I did discuss about the migrate case. This patch
was to highlight the exact issue.
> I'm pretty sure there should be a way to inhibit the unwanted
> interrupts using existing mechanisms.
One of the thing that I had observed was msr had just MSR_SF bit set
during the reset case, we can test for that maybe.
The below works as well:
+ if ((env->msr & ~(1ull << MSR_SF)) == 0) {
+ return false;
+ }
>> +
>> CPU_COMMON
>>
>> /* MMU context - only relevant for full system emulation */
>> diff --git a/target/ppc/translate_init.c b/target/ppc/translate_init.c
>> index 56a0ab2..64f4348 100644
>> --- a/target/ppc/translate_init.c
>> +++ b/target/ppc/translate_init.c
>> @@ -8561,6 +8561,9 @@ static bool cpu_has_work_POWER7(CPUState *cs)
>> CPUPPCState *env = &cpu->env;
>>
>> if (cs->halted) {
>> + if (env->in_reset) {
>> + return false;
>> + }
>> if (!(cs->interrupt_request & CPU_INTERRUPT_HARD)) {
>> return false;
>> }
>> @@ -8718,6 +8721,9 @@ static bool cpu_has_work_POWER8(CPUState *cs)
>> CPUPPCState *env = &cpu->env;
>>
>> if (cs->halted) {
>> + if (env->in_reset) {
>> + return false;
>> + }
>> if (!(cs->interrupt_request & CPU_INTERRUPT_HARD)) {
>> return false;
>> }
>> @@ -8899,6 +8905,9 @@ static bool cpu_has_work_POWER9(CPUState *cs)
>> CPUPPCState *env = &cpu->env;
>>
>> if (cs->halted) {
>> + if (env->in_reset) {
>> + return false;
>> + }
>> if (!(cs->interrupt_request & CPU_INTERRUPT_HARD)) {
>> return false;
>> }
Regards
Nikunj