qemu-devel

Re: [Qemu-devel] [qemu-devel] Bug Report: VM crashed for some kinds of v


From: 李春奇
Subject: Re: [Qemu-devel] [qemu-devel] Bug Report: VM crashed for some kinds of vCPU in nested virtualization
Date: Tue, 16 Apr 2013 11:49:46 +0800

I switched to the latest version of the KVM kernel, but the bug still occurs.

When the L1 VM starts on the host, the host's kern.log shows:
Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458090] kvm [2808]: vcpu0 unhandled rdmsr: 0x345
Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458166] kvm_set_msr_common: 22 callbacks suppressed
Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458169] kvm [2808]: vcpu0 unhandled wrmsr: 0x40 data 0
Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458176] kvm [2808]: vcpu0 unhandled wrmsr: 0x60 data 0
Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458182] kvm [2808]: vcpu0 unhandled wrmsr: 0x41 data 0
Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458188] kvm [2808]: vcpu0 unhandled wrmsr: 0x61 data 0
Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458194] kvm [2808]: vcpu0 unhandled wrmsr: 0x42 data 0
Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458200] kvm [2808]: vcpu0 unhandled wrmsr: 0x62 data 0
Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458206] kvm [2808]: vcpu0 unhandled wrmsr: 0x43 data 0
Apr 16 11:28:22 Blade1-02 kernel: [ 4908.458211] kvm [2808]: vcpu0 unhandled wrmsr: 0x63 data 0
Apr 16 11:28:23 Blade1-02 kernel: [ 4908.471014] kvm [2808]: vcpu1 unhandled wrmsr: 0x40 data 0
Apr 16 11:28:23 Blade1-02 kernel: [ 4908.471024] kvm [2808]: vcpu1 unhandled wrmsr: 0x60 data 0

When the L1 VM starts and then crashes, its kern.log shows:
Apr 16 11:28:55 kvm1 kernel: [   33.590101] device tap0 entered promiscuous mode
Apr 16 11:28:55 kvm1 kernel: [   33.590140] br0: port 2(tap0) entered forwarding state
Apr 16 11:28:55 kvm1 kernel: [   33.590146] br0: port 2(tap0) entered forwarding state
Apr 16 11:29:04 kvm1 kernel: [   42.592103] br0: port 2(tap0) entered forwarding state
Apr 16 11:29:19 kvm1 kernel: [   57.752731] kvm [1673]: vcpu0 unhandled rdmsr: 0x345
Apr 16 11:29:19 kvm1 kernel: [   57.797261] kvm [1673]: vcpu0 unhandled wrmsr: 0x40 data 0
Apr 16 11:29:19 kvm1 kernel: [   57.797315] kvm [1673]: vcpu0 unhandled wrmsr: 0x60 data 0
Apr 16 11:29:19 kvm1 kernel: [   57.797366] kvm [1673]: vcpu0 unhandled wrmsr: 0x41 data 0
Apr 16 11:29:19 kvm1 kernel: [   57.797416] kvm [1673]: vcpu0 unhandled wrmsr: 0x61 data 0
Apr 16 11:29:19 kvm1 kernel: [   57.797466] kvm [1673]: vcpu0 unhandled wrmsr: 0x42 data 0
Apr 16 11:29:19 kvm1 kernel: [   57.797516] kvm [1673]: vcpu0 unhandled wrmsr: 0x62 data 0
Apr 16 11:29:19 kvm1 kernel: [   57.797566] kvm [1673]: vcpu0 unhandled wrmsr: 0x43 data 0
Apr 16 11:29:19 kvm1 kernel: [   57.797616] kvm [1673]: vcpu0 unhandled wrmsr: 0x63 data 0

At the same time, the host logs:
Apr 16 11:29:20 Blade1-02 kernel: [ 4966.314742] nested_vmx_run: VMCS MSR_{LOAD,STORE} unsupported

The trace displayed on the console is the same as in my previous mail.

In addition, the L1 and L2 guests sometimes crash without any output, and sometimes produce the output shown above.


This suggests that the MSR controls may fail for the emulated core2duo CPU.


To Jan:
I traced the QEMU and KVM code and found the code behind the error "KVM: entry failed, hardware error 0x7". It is in the kernel, in arch/x86/kvm/vmx.c, function vmx_handle_exit():

	if (exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) {
		vcpu->run->exit_reason = KVM_EXIT_FAIL_ENTRY;
		vcpu->run->fail_entry.hardware_entry_failure_reason
			= exit_reason;
		return 0;
	}

	if (unlikely(vmx->fail)) {
		vcpu->run->exit_reason = KVM_EXIT_FAIL_ENTRY;
		vcpu->run->fail_entry.hardware_entry_failure_reason
			= vmcs_read32(VM_INSTRUCTION_ERROR);
		return 0;
	}

The "entry failed" hardware error can come from either of these two branches, and both indicate a failed VM entry. The macro VMX_EXIT_REASONS_FAILED_VMENTRY is 0x80000000 and the reported code is 0x7, which does not have that bit set, so the error must come from the second branch. I'm not sure what the value returned by vmcs_read32(VM_INSTRUCTION_ERROR) refers to.
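
To make the reasoning about the two branches concrete, here is a minimal sketch in C of the bit test. The helper name reason_is_exit_reason is hypothetical and is not part of KVM; only the constant 0x80000000 and the code 0x7 come from the text above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 31 of the VMX exit reason; the value is the one quoted in this mail. */
#define VMX_EXIT_REASONS_FAILED_VMENTRY 0x80000000u

/*
 * Hypothetical helper for illustration only (not a kernel function).
 * Returns true if the reported code has bit 31 set, as a value stored by
 * the first branch would, and false otherwise, which leaves the second
 * branch (a VM_INSTRUCTION_ERROR value) as the source.
 */
static bool reason_is_exit_reason(uint32_t reason)
{
	return (reason & VMX_EXIT_REASONS_FAILED_VMENTRY) != 0;
}
```

For the reported code, reason_is_exit_reason(0x7) returns false, which is consistent with the value having been read from VM_INSTRUCTION_ERROR in the second branch.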

Thanks,
Arthur


On Mon, Apr 15, 2013 at 3:43 PM, Jan Kiszka <address@hidden> wrote:
On 2013-04-15 08:24, 李春奇 <Arthur Chunqi Li> wrote:
> Hi all,
> In a nested virtualization environment with QEMU+KVM, some emulated CPUs (such
> as core2duo) may cause the L2 guest to crash after booting for a while. Here's my
> configuration:
>
> Host:
> Linux 3.5.7

You should use the latest version from kvm.git [1], branch "next".
Otherwise, you risk re-triggering bugs that were fixed in the meantime.

> Qemu is the latest version from git repository.
> Emulated CPU : core2duo
>
> L1 guest:
> Linux 3.5.7
> Qemu is the latest version from git
> Emulated CPU : core2duo
>
> L2 guest:
> Crashes at some specific point after running for some time.
>
>
> Here's the callback trace:
>
> qemu-system-x86_64 -net nic,vlan=0,macaddr=00:26:b9:fa:fe:31 -net
> tap,vlan=0 -vnc :1 -hda vm1.1.img -m 512 -machine pc,accel=kvm -cpu
> core2duo -cdrom ubuntu-12.04.2-server-amd64.iso
> TUNSETIFF: Device or resource busy
> qemu-system-x86_64: pci_add_option_rom: failed to find romfile
> "efi-e1000.rom"
> KVM: entry failed, hardware error 0x7
                                    ^^^
As an exercise, you could try to track down what this number means.
Hint: there will be two possibilities (unfortunately).

> RAX=000000000000000f RBX=ffff88001f60c740 RCX=000000000000038f
> RDX=0000000000000007
> RSI=000000000000000f RDI=000000000000038f RBP=ffff88001e6ffaf0
> RSP=ffff88001e6ffaf0
> R8 =000000070000000f R9 =0000000000000000 R10=0000000000000000
> R11=0000000000000000
> R12=0000000000000001 R13=0000000000000001 R14=0000000000000000
> R15=ffff88001f617384
> RIP=ffffffff8103fe1a RFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0000 0000000000000000 000fffff 00000000
> CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> DS =0000 0000000000000000 000fffff 00000000
> FS =0000 0000000000000000 000fffff 00000000
> GS =0000 ffff88001f600000 000fffff 00000000
> LDT=0000 0000000000000000 000fffff 00000000
> TR =0040 ffff88001f611580 00002087 00008b00 DPL=0 TSS64-busy
> GDT=     ffff88001f604000 0000007f
> IDT=     ffffffff81dd6000 00000fff
> CR0=8005003b CR2=00000000ffffffff CR3=0000000001c0b000 CR4=000007f0
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
> DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000d01
> Code=20 89 f9 48 09 c8 5d c3 66 90 55 89 f0 89 f9 48 89 e5 0f 30 <31> c0 5d
> c3 66 90 55 89 f9 48 89 e5 0f 33 89 c7 48 89 d0 48 c1 e0 20 89 f9 48 09 c8
> 5d c3
>
>
> This bug also appears with Westmere, SandyBridge and Haswell. But Nehalem,
> Penryn and Conroe run OK.
>
> Is this problem really a bug, or a mistake in my configuration?

A bug, most probably. If you are able to reproduce using latest KVM, we
would have to look into details.

Jan

PS: KVM related error reports of QEMU should also go to the KVM list.
CC'ing it.

[1] https://git.kernel.org/cgit/virt/kvm/kvm.git/




--
Arthur Chunqi Li
Department of Computer Science
School of EECS
Peking University
Beijing, China
