Re: [RFC PATCH v2 00/29] PowerPC interrupt rework


From: Cédric Le Goater
Subject: Re: [RFC PATCH v2 00/29] PowerPC interrupt rework
Date: Mon, 3 Oct 2022 22:58:35 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.3.1

(qemu) info pic
CPU[0000]:   QW   NSR CPPR IPB LSMFB ACK# INC AGE PIPR  W2
CPU[0000]: USER    00   00  00    00   00  00  00   00  00000000
CPU[0000]:   OS    00   00  00    ff   ff  00  ff   ff  00000000
CPU[0000]: POOL    00   00  00    ff   00  00  00   00  00000000
CPU[0000]: PHYS    00   ff  00    00   00  00  00   ff  80000000
CPU[0001]:   QW   NSR CPPR IPB LSMFB ACK# INC AGE PIPR  W2
CPU[0001]: USER    00   00  00    00   00  00  00   00  00000000
CPU[0001]:   OS    00   00  00    ff   ff  00  ff   ff  00000000
CPU[0001]: POOL    00   00  00    ff   00  00  00   00  00000001
CPU[0001]: PHYS    00   ff  00    00   00  00  00   ff  80000000
CPU[0002]:   QW   NSR CPPR IPB LSMFB ACK# INC AGE PIPR  W2
CPU[0002]: USER    00   00  00    00   00  00  00   00  00000000
CPU[0002]:   OS    00   00  00    ff   ff  00  ff   ff  00000000
CPU[0002]: POOL    00   00  00    ff   00  00  00   00  00000002
CPU[0002]: PHYS    00   ff  00    00   00  00  00   ff  80000000
CPU[0003]:   QW   NSR CPPR IPB LSMFB ACK# INC AGE PIPR  W2
CPU[0003]: USER    00   00  00    00   00  00  00   00  00000000
CPU[0003]:   OS    00   ff  00    00   ff  00  ff   ff  00000004

vCPU 4 was scheduled to run on this CPU at some point, but it no longer
is: the VALID bit is not set.
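
As a minimal sketch of how the W2 column of an OS ring can be read, the
snippet below splits the word into a valid bit and a CAM line. The bit
layout (top bit = valid, low 24 bits = CAM) is an assumption taken from
QEMU's XIVE model, and `decode_os_w2` is a hypothetical helper name, not
something from the monitor or the patch series.

```python
# Sketch: decode word 2 (W2) of an OS ring as printed by "info pic".
# Assumed layout: bit 31 is the valid bit, the low 24 bits are the CAM line.
W2_VALID = 0x80000000
W2_OS_CAM = 0x00FFFFFF


def decode_os_w2(w2):
    """Return (valid, cam) for a 32-bit OS ring W2 value."""
    return bool(w2 & W2_VALID), w2 & W2_OS_CAM


# OS ring of CPU 3 in the dump above: W2 = 00000004
valid, cam = decode_os_w2(0x00000004)
print(f"valid={valid} cam={cam:#x}")
```

With W2 = 00000004 the valid bit is clear while the CAM line still holds
4, which is consistent with a context that was pushed earlier and has
since been pulled.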

CPU[0003]: POOL    00   00  00    ff   00  00  00   00  00000003
CPU[0003]: PHYS    00   ff  00    00   00  00  00   ff  80000000
XIVE[0] #0 Source 00000000 .. 000fffff
   00000014 MSI --
   00000015 MSI --
   00000016 MSI --
   00000017 MSI --
   00000018 MSI --
   00000019 MSI --
   0000001a MSI --
   0000001b MSI --
   0000001e MSI P-

The 0x1E HW interrupt (virtual device) is pending and not queued.
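
The two flag characters after the source type can be read as the P and Q
bits of the source. A rough sketch of turning a source line into a state
name follows; the state names assume the usual XIVE ESB encoding
(00 reset, 01 off, 10 pending, 11 queued), and `describe_source` is a
hypothetical helper.

```python
# Sketch: interpret the "P"/"Q" flags of a source line from "info pic".
# State names assume the standard XIVE ESB encoding of the PQ bits.
ESB_STATES = {
    "--": "reset",    # P=0 Q=0
    "-Q": "off",      # P=0 Q=1
    "P-": "pending",  # P=1 Q=0
    "PQ": "queued",   # P=1 Q=1
}


def describe_source(line):
    """Parse e.g. '   0000001e MSI P-' into (irq, kind, state)."""
    fields = line.split()
    irq, kind, pq = fields[0], fields[1], fields[2]
    return int(irq, 16), kind, ESB_STATES[pq]


print(describe_source("   0000001e MSI P-"))
```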

   00000023 MSI --
   00000024 MSI --
   00000025 MSI --
   00000026 MSI --
XIVE[0] #0 EAT 00000000 .. 000fffff
   00000014   end:00/000f data:00000010
   00000015   end:00/0017 data:00000010
   00000016   end:00/001f data:00000010
   00000017   end:00/0027 data:00000010

-> 0x10 == CPU IPI

   00000018   end:00/004e data:00000010

-> This is the vCPU IPI

   00000019   end:00/004e data:00000012
   0000001a   end:00/004e data:0000001b
   0000001b   end:00/004e data:00000013
   0000001e   end:00/004e data:00000016

Notifications of the 0x1E HW interrupt will be pushed on vCPU 0 queue 0x4e,
with (Linux) effective interrupt number 0x16, maybe the console.

   00000023   end:00/004e data:00000017
   00000024   end:00/004e data:00000018
   00000025   end:00/004e data:00000019
   00000026   end:00/004e data:0000001a

The PHB interrupts follow, MSIs and LSIs.

   000fb000   end:00/001f data:00000030
   000fb001   end:00/0027 data:00000031
   000fb002   end:00/000f data:00000032
   000fb003   end:00/000f data:00000033
   000fb004   end:00/0017 data:00000034
   000fb005   end:00/001f data:00000035
   000fb006   end:00/0027 data:00000036
   000fb7fe   end:00/000f data:00000029
   000fb7ff   end:00/0017 data:0000002a
   000fbffe   end:00/001f data:00000027
   000fbfff   end:00/0027 data:00000028
   000fcffe   end:00/000f data:00000025
   000fcfff   end:00/0017 data:00000026
   000fd000   end:00/001f data:00000037
   000fd001   end:00/000f data:00000038
   000fd002   end:00/0017 data:00000039
   000fd003   end:00/001f data:0000003a
   000fd004   end:00/0027 data:0000003b
   000fd7fe   end:00/001f data:00000023
   000fd7ff   end:00/0027 data:00000024
   000fdffe   end:00/000f data:00000021
   000fdfff   end:00/0017 data:00000022
   000feffe   end:00/001f data:0000001f
   000fefff   end:00/0027 data:00000020

The OPAL events come after.

   000ffff0   end:00/000f data:00000011
   000ffff1   end:00/0017 data:00000012
   000ffff2   end:00/001f data:00000013
   000ffff3   end:00/0027 data:00000014 # opal-psi#0:lpchc
   000ffff4   end:00/000f data:00000015
   000ffff5   end:00/0017 data:00000016
   000ffff6   end:00/001f data:00000017
   000ffff7   end:00/0027 data:00000018
   000ffff8   end:00/000f data:00000019
   000ffff9   end:00/0017 data:0000001a
   000ffffa   end:00/001f data:0000001b
   000ffffb   end:00/0027 data:0000001c
   000ffffc   end:00/000f data:0000001d
   000ffffd   end:00/0017 data:0000001e # opal-psi#0:psu ?
XIVE[0] #0 ENDT
   0000000f -Q vqnb---f prio:7 nvt:00/0080 eq:@03400000   825/16384 ^1 [ 8000004f 8000004f 8000004f 8000004f 8000004f ^00000000 ]

The event queue of host CPU 0 is filling up with escalation interrupt
numbers, 0x4f.

Host CPU 0 (queue 0xf) is serving its own IPI, some MSIs, some EEH PCI
interrupts, and some OPAL events.
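
For reference, the queue entries shown between brackets can be split
into a generation bit and the event data. This is only a sketch: it
assumes the top bit of each 32-bit entry is the generation bit and the
remaining 31 bits are the data programmed in the EAT, and
`decode_eq_entry` is a hypothetical name.

```python
# Sketch: split a 32-bit event queue entry into (generation, data).
# Assumed layout: bit 31 = generation bit, bits 30..0 = event data.
def decode_eq_entry(word):
    generation = (word >> 31) & 1
    data = word & 0x7FFFFFFF
    return generation, data


# Last entries of queue 0x0f in the dump above
for entry in (0x8000004F, 0x8000004F, 0x00000000):
    gen, data = decode_eq_entry(entry)
    print(f"{entry:08x}: gen={gen} data={data:#x}")
```

Under that assumption, 8000004f reads as generation 1 with data 0x4f,
and the all-zero word at the ^ marker is a slot that has not been
written in the current pass.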

   00000017 -Q vqnb---f prio:7 nvt:00/0084 eq:@03750000  1048/16384 ^1 [ 8000001e 8000001e 8000001e 8000001e 8000001e ^00000000 ]

Hmm, host CPU 1 is serving 0xffffd = opal-psi#0:psu. Maybe too much.

   0000001f -Q vqnb---f prio:7 nvt:00/0088 eq:@037f0000   154/16384 ^1 [ 8000003a 8000003a 8000003a 8000003a 8000003a ^00000000 ]

0x3a is an MSI.

   00000027 -Q vqnb---f prio:7 nvt:00/008c eq:@038a0000   340/16384 ^1 [ 80000014 80000014 80000014 80000014 8000003b ^00000000 ]

0x14 is the console and 0x3b is an MSI.

   0000004e -Q vqnbeu-- prio:6 nvt:00/0004 eq:@1d170000  1104/16384 ^1 [ 80000016 80000016 80000016 80000016 80000016 ^00000000 ]

0x4e (0x48 + 6) is the event queue number of guest's vCPU 0 prio 6.
0x16 is the Linux interrupt number in the guest of HW interrupt 0x1e,
the one pending.
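
The "0x48 + 6" arithmetic above can be written out as a tiny helper. It
assumes each NVT owns eight consecutive ENDs, one per priority, starting
at the base index listed in the NVTT section of this dump; `end_index`
is a hypothetical name used only for illustration.

```python
# Sketch: END index for a given NVT END base and priority.
# Assumes 8 consecutive ENDs per NVT, one per priority 0..7.
def end_index(end_base, priority):
    if not 0 <= priority <= 7:
        raise ValueError("priority must be in 0..7")
    return end_base + priority


# vCPU 0 (NVT 0x4, END base 0x48), priority 6
print(hex(end_index(0x48, 6)))
# host CPU 0 (NVT 0x80, END base 0x08), priority 7
print(hex(end_index(0x08, 7)))
```

The same rule gives 0x0f, 0x17, 0x1f and 0x27 for the four host CPUs at
priority 7, which matches the queue numbers in the ENDT listing.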

   0000004f -Q v--be-s- prio:0 nvt:00/0000

0x4f is the escalation queue of vCPU 0 (used when the vCPU is not dispatched
on any HW thread). 0x4f is also a source interrupt number for escalations.

XIVE[0] #0 END Escalation EAT
   0000004e -Q    end:00/004f data:00000000
   0000004f P-    end:00/000f data:0000004f

Interrupt number 0x4f is pending. vCPU 0 should be dispatched but the
escalation interrupt has not been served by the hypervisor at this
point in time. Since it is not queued, we may have reached some deadlock ?

XIVE[0] #0 NVTT 00000000 .. 0007ffff
   00000000 end:00/0028 IPB:00
   00000001 end:00/0030 IPB:00
   00000002 end:00/0038 IPB:00
   00000003 end:00/0040 IPB:00
   00000004 end:00/0048 IPB:02

                            ^

0x4 is the notification virtual target (NVT) number of vCPU 0, and an
interrupt is pending on prio 6. vCPU 0 has not acknowledged it yet, because
vCPU 0 (NVT=4) has not been dispatched on any HW thread, because the
escalation interrupt was not handled on the host (CPU 0 should handle it).
The question is: what is CPU 0 up to?


   00000080 end:00/0008 IPB:00
   00000084 end:00/0010 IPB:00
   00000088 end:00/0018 IPB:00
   0000008c end:00/0020 IPB:00
PSIHB Source 000ffff0 .. 000ffffd
   000ffff0 LSI --
   000ffff1 LSI --
   000ffff2 LSI --
   000ffff3 LSI --
   000ffff4 LSI --
   000ffff5 LSI --
   000ffff6 LSI --
   000ffff7 LSI --
   000ffff8 LSI --
   000ffff9 LSI --
   000ffffa LSI --
   000ffffb LSI --
   000ffffc LSI --
   000ffffd LSI --
PHB4[0:0] Source 000fe000 .. 000fefff  @6030203110100
   00000ffe LSI --
   00000fff LSI --
PHB4[0:5] Source 000fb000 .. 000fb7ff  @6030203110228
   00000000 MSI --
   00000001 MSI --
   00000002 MSI --
   00000003 MSI --
   00000004 MSI --
   00000005 MSI --
   00000006 MSI --
   000007fe LSI --
   000007ff LSI --
PHB4[0:4] Source 000fb800 .. 000fbfff  @6030203110220
   000007fe LSI --
   000007ff LSI --
PHB4[0:3] Source 000fc000 .. 000fcfff  @6030203110218
   00000ffe LSI --
   00000fff LSI --
PHB4[0:2] Source 000fd000 .. 000fd7ff  @6030203110210
   00000000 MSI --
   00000001 MSI --
   00000002 MSI --
   00000003 MSI --
   00000004 MSI --
   000007fe LSI --
   000007ff LSI --
PHB4[0:1] Source 000fd800 .. 000fdfff  @6030203110208
   000007fe LSI --
   000007ff LSI --

Could you please check with powersave=off in the host kernel also ?


It still hangs with this option.

This is going to need some serious digging to solve. It might not be worth
the time :/

C.


