From: Manish Mishra
Subject: Windows 10 and 11 VMs fails to boot with SapphireRapids CPU definition
Date: Thu, 18 Jul 2024 11:44:28 +0000

Hi Everyone,

We are facing issues booting Windows VMs with the SapphireRapids CPU definition. This happens when the VM has multiple cores per vcpu set and is a UEFI guest with secure boot and Credential Guard enabled. So far we have observed the issue on Windows 10 and 11.

We did some triaging around this. The SapphireRapids CPU definition raises cpuid_level to 0x20, which brings in the V2 extended topology leaf (0x1f). QEMU returns all zeros for this leaf when !x86_has_extended_topo().
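
To make the all-zero case concrete, here is a minimal sketch of how a guest-side check might classify the two topology leaves. It is not taken from QEMU or Windows. The struct and function names are hypothetical, and the validity test (non-zero EBX[15:0] and a non-zero level type in ECX[15:8] at subleaf 0) is our reading of the Intel enumeration guidance.

```c
#include <stdint.h>

/* Register values returned by one CPUID (leaf, subleaf) query. */
struct cpuid_regs {
    uint32_t eax, ebx, ecx, edx;
};

/* The level type is reported in ECX[15:8]; 0 means "invalid". */
static unsigned level_type(const struct cpuid_regs *r)
{
    return (r->ecx >> 8) & 0xff;
}

/*
 * Assumption: subleaf 0 of a topology leaf is usable when it reports a
 * non-zero logical processor count (EBX[15:0]) and a valid level type.
 */
static int topo_leaf_usable(const struct cpuid_regs *subleaf0)
{
    return (subleaf0->ebx & 0xffff) != 0 && level_type(subleaf0) != 0;
}

/*
 * Hypothetical helper: returns 0x1f if that leaf is usable, otherwise
 * 0xb if that one is usable, otherwise 0 (no topology leaf available).
 */
static unsigned pick_topology_leaf(uint32_t max_leaf,
                                   const struct cpuid_regs *leaf_1f,
                                   const struct cpuid_regs *leaf_b)
{
    if (max_leaf >= 0x1f && topo_leaf_usable(leaf_1f)) {
        return 0x1f;
    }
    if (max_leaf >= 0xb && topo_leaf_usable(leaf_b)) {
        return 0xb;
    }
    return 0;
}
```

With max_leaf = 0x20, an all-zero 0x1f and a 0xb subleaf 0 of {eax = 0x1, ebx = 0x2, ecx = 0x0100, edx = 37}, this helper picks 0xb.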

As expected (and as described in https://cdrdv2-public.intel.com/775917/intel-64-architecture-processor-topology-enumeration.pdf), a guest that sees this should fall back to 0xb. Somehow Windows 10 and Windows 11 do not work well with this assumption and panic on boot.

We checked one of our SapphireRapids nodes with no multi-die topology; this is what the CPUID output looks like. The 0x1f output is the same as 0xb.

# cpuid -l 0xb -s 0 -1
CPU:
   x2APIC features / processor topology (0xb):
      extended APIC ID                      = 37
      --- level 0 ---
      level number                          = 0x0 (0)
      level type                            = thread (1)
      bit width of level & previous levels  = 0x1 (1)
      number of logical processors at level = 0x2 (2)

# cpuid -l 0xb -s 1 -1
CPU:
      --- level 1 ---
      level number                          = 0x1 (1)
      level type                            = core (2)
      bit width of level & previous levels  = 0x7 (7)
      number of logical processors at level = 0x28 (40)

# cpuid -l 0xb -s 2 -1
CPU:
      --- level 2 ---
      level number                          = 0x2 (2)
      level type                            = invalid (0)
      bit width of level & previous levels  = 0x0 (0)
      number of logical processors at level = 0x0 (0)

# cpuid -l 0x1f -s 0 -1
CPU:
   V2 extended topology (0x1f):
      x2APIC ID of logical processor        = 0x25 (37)
      --- level 0 ---
      level number                          = 0x0 (0)
      level type                            = thread (1)
      bit width of level & previous levels  = 0x1 (1)
      number of logical processors at level = 0x2 (2)

# cpuid -l 0x1f -s 1 -1
CPU:
      --- level 1 ---
      level number                          = 0x1 (1)
      level type                            = core (2)
      bit width of level & previous levels  = 0x7 (7)
      number of logical processors at level = 0x28 (40)

# cpuid -l 0x1f -s 2 -1
CPU:
      --- level 2 ---
      level number                          = 0x2 (2)
      level type                            = invalid (0)
      bit width of level & previous levels  = 0x0 (0)
      number of logical processors at level = 0x0 (0)

We tried a workaround that makes the 0x1f output the same as 0xb when !x86_has_extended_topo(), instead of returning all zeros. This seems to work fine. Our understanding is that the current QEMU behaviour is not incorrect, but does the workaround above still make sense? It matches what bare metal reports, so it should not be unreasonable. If so, we will be happy to send a patch for it.

Thanks
Manish Mishra
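
For reference, the idea behind the workaround can be sketched roughly as below. This is a simplified standalone model, not the actual QEMU code or the patch itself. fill_leaf_1f, the cpuid_regs struct and the mirror_leaf_b switch are hypothetical names, and has_extended_topo stands in for the result of x86_has_extended_topo().

```c
#include <stdbool.h>
#include <stdint.h>

/* Register values returned by one CPUID (leaf, subleaf) query. */
struct cpuid_regs {
    uint32_t eax, ebx, ecx, edx;
};

/*
 * Hypothetical model of how leaf 0x1f could be filled for one subleaf.
 *
 * has_extended_topo: stand-in for x86_has_extended_topo()
 * mirror_leaf_b:     false models the current all-zero behaviour,
 *                    true models the proposed workaround
 * leaf_b:            what leaf 0xb reports for the same subleaf
 * ext:               what a real extended topology would report
 */
static struct cpuid_regs fill_leaf_1f(bool has_extended_topo,
                                      bool mirror_leaf_b,
                                      const struct cpuid_regs *leaf_b,
                                      const struct cpuid_regs *ext)
{
    struct cpuid_regs zero = {0, 0, 0, 0};

    if (has_extended_topo) {
        return *ext;      /* extended levels exist: report them */
    }
    if (mirror_leaf_b) {
        return *leaf_b;   /* workaround: same output as leaf 0xb */
    }
    return zero;          /* current behaviour: all zeros */
}
```

With has_extended_topo false and mirror_leaf_b true, every subleaf of 0x1f carries the same register values as the matching subleaf of 0xb, which is what the bare-metal dump above shows.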