[Qemu-devel] [BUG] Guest kernel divide error in kvm_unlock_kick


From: Chris Webb
Subject: [Qemu-devel] [BUG] Guest kernel divide error in kvm_unlock_kick
Date: Mon, 8 Sep 2014 14:28:07 +0100

I've reported this bug before: it reliably crashes a guest kernel shortly
after boot. I have just reconfirmed that it is still present with Linux
3.16.2 guest and host kernels and Qemu 2.1.

Running a 3.16.2 x86-64 SMP guest kernel on qemu-2.1, with kvm enabled and
-cpu host on a 3.16.2 AMD Opteron host, I'm seeing a reliable kernel panic
from the guest shortly after boot. I think it is happening in kvm_unlock_kick()
in the paravirt_ops code:

divide error: 0000 [#1] PREEMPT SMP 
Modules linked in:
CPU: 0 PID: 743 Comm: syslogd Not tainted 3.16.2-guest #2
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
task: ffff88007c972580 ti: ffff88007cb7c000 task.ti: ffff88007cb7c000
RIP: 0010:[<ffffffff81037fe2>]  [<ffffffff81037fe2>] kvm_unlock_kick+0x72/0x80
RSP: 0000:ffff88007fc03ec8  EFLAGS: 00010046
RAX: 0000000000000005 RBX: 0000000000000000 RCX: 0000000000000003
RDX: 0000000000000003 RSI: ffffffff81a466a0 RDI: 0000000000000000
RBP: ffffffff81a466a0 R08: ffffffff81b98940 R09: 0000000000000246
R10: 0000000000000400 R11: 0000000000000000 R12: 00000000000000ea
R13: 0000000000000009 R14: 0000000000000002 R15: ffff88007fc0d300
FS:  00007f2a6473e700(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000004a8240 CR3: 000000007ac75000 CR4: 00000000000406f0
Stack:
 ffffffff81a46400 0000000000000246 0000000000000001 ffffffff8168979d
 0000000000000282 ffffffff81110d97 0000000000000007 ffff88007cb7ffd8
 ffff88007c972580 000000004b0782e8 0000000000000002 ffffffff81a0b0c8
Call Trace:
 <IRQ> 
 [<ffffffff8168979d>] ? _raw_spin_unlock_irqrestore+0x5d/0x80
 [<ffffffff81110d97>] ? rcu_process_callbacks+0x337/0x4f0
 [<ffffffff810cde2d>] ? __do_softirq+0xfd/0x210
 [<ffffffff810ce06e>] ? irq_exit+0x7e/0xa0
 [<ffffffff8103063b>] ? smp_apic_timer_interrupt+0x3b/0x50
 [<ffffffff8168b04d>] ? apic_timer_interrupt+0x6d/0x80
 <EOI> 
 [<ffffffff8114180b>] ? filemap_map_pages+0x17b/0x240
 [<ffffffff811418c0>] ? filemap_map_pages+0x230/0x240
 [<ffffffff811679e2>] ? do_read_fault.isra.70+0x2a2/0x320
 [<ffffffff811696cc>] ? handle_mm_fault+0x37c/0xd00
 [<ffffffff8103bb45>] ? __do_page_fault+0x185/0x4c0
 [<ffffffff8168b958>] ? async_page_fault+0x28/0x30
 [<ffffffff813b9610>] ? __put_user_4+0x20/0x30
 [<ffffffff8168b958>] ? async_page_fault+0x28/0x30
Code: c0 ca a7 81 48 8d 04 0b 48 8b 30 48 39 ee 75 c9 0f b6 40 08 44 38 e0 75 c0 48 c7 c0 22 b0 00 00 31 db 0f b7 0c 08 b8 05 00 00 00 <0f> 01 c1 0f 1f 00 5b 5d 41 5c c3 0f 1f 00 48 c7 c0 10 cf 00 00 
RIP  [<ffffffff81037fe2>] kvm_unlock_kick+0x72/0x80
 RSP <ffff88007fc03ec8>
---[ end trace be08885ac2c94c6a ]---
Kernel panic - not syncing: Fatal exception in interrupt
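
As an aid to reading the Code: line above: the kernel wraps the byte at the faulting RIP in <..>. The following is a minimal Python sketch that extracts the bytes at RIP and compares them with the two x86 hypercall encodings (0f 01 c1 for VMCALL, 0f 01 d9 for VMMCALL). The function names and the two-entry opcode table are illustrative only and are not part of any kernel or qemu tool.

```python
import re

# Encodings of the two x86 hypercall instructions (Intel VMCALL, AMD VMMCALL).
HYPERCALL_OPCODES = {
    bytes.fromhex("0f01c1"): "vmcall",
    bytes.fromhex("0f01d9"): "vmmcall",
}

def bytes_at_rip(code_line, count=3):
    """Return `count` bytes starting at the byte the oops marks with <..>."""
    tokens = code_line.replace("Code:", "").split()
    for i, tok in enumerate(tokens):
        if re.fullmatch(r"<[0-9a-fA-F]{2}>", tok):
            window = [t.strip("<>") for t in tokens[i:i + count]]
            return bytes.fromhex("".join(window))
    raise ValueError("no <..> marker found in Code: line")

def faulting_hypercall(code_line):
    """Name the hypercall instruction at RIP, or None if it is something else."""
    return HYPERCALL_OPCODES.get(bytes_at_rip(code_line))

# The Code: line from the oops above, joined into one string.
oops_code = ("Code: c0 ca a7 81 48 8d 04 0b 48 8b 30 48 39 ee 75 c9 0f b6 40 08 "
             "44 38 e0 75 c0 48 c7 c0 22 b0 00 00 31 db 0f b7 0c 08 b8 05 00 00 "
             "00 <0f> 01 c1 0f 1f 00 5b 5d 41 5c c3 0f 1f 00 48 c7 c0 10 cf 00 00")
print(faulting_hypercall(oops_code))
```

For this trace the bytes at RIP are 0f 01 c1, and the preceding b8 05 00 00 00 loads 5 into EAX, which agrees with RAX in the register dump and with the KVM_HC_KICK_CPU hypercall number. The trace alone does not establish why this instruction would raise a divide error.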

My host kernel config is http://cdw.me.uk/tmp/host-config.txt and the guest
config is http://cdw.me.uk/tmp/guest-config.txt with qemu command line:

 qemu-system-x86 -enable-kvm -cpu host -machine q35 -m 2048 -name $1 \
   -smp sockets=1,cores=4 -pidfile /run/$1.pid -runas nobody \
   -serial stdio -vga none -vnc none -kernel /boot/vmlinuz-guest \
   -append "console=ttyS0 root=/dev/vda" \
   -drive file=/dev/guest/$1,cache=none,format=raw,if=virtio \
   -device virtio-rng-pci \
   -device virtio-net-pci,netdev=nic,mac=$(< /sys/class/net/$1/address) \
   -netdev tap,id=nic,fd=3 3<>/dev/tap$(< /sys/class/net/$1/ifindex)

I can stop this crash by disabling CONFIG_PARAVIRT_SPINLOCKS in my guest
kernel, running with -cpu qemu64 instead of -cpu host, or running with -smp 1
instead of -smp 4. (Removing/changing the -machine q35 makes no difference.)
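
To check which of these conditions applies to a given guest build, a small Python sketch along these lines can test whether an option is enabled in a kernel config file. The helper name and the sample text are made up for illustration; the sample is not the actual guest config linked above.

```python
import re

def config_enabled(config_text, option):
    """True if `option` is set to y or m in kernel .config text."""
    pattern = rf"^{re.escape(option)}=(y|m)$"
    return re.search(pattern, config_text, flags=re.MULTILINE) is not None

# Hypothetical excerpt in the usual .config format.
sample = """\
CONFIG_PARAVIRT=y
# CONFIG_PARAVIRT_DEBUG is not set
CONFIG_PARAVIRT_SPINLOCKS=y
"""
print(config_enabled(sample, "CONFIG_PARAVIRT_SPINLOCKS"))
print(config_enabled(sample, "CONFIG_PARAVIRT_DEBUG"))
```

The same function can be pointed at the text of a downloaded guest-config.txt instead of the sample string.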

/proc/cpuinfo on the host has 8 of these:

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 21
model           : 2
model name      : AMD Opteron(tm) Processor 6328
stepping        : 0
microcode       : 0x600081c
cpu MHz         : 3200.000
cache size      : 2048 KB
physical id     : 0
siblings        : 8
core id         : 0
cpu cores       : 4
apicid          : 32
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb arat cpb hw_pstate npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold bmi1
bogomips        : 6399.70
TLB size        : 1536 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro

and on the guest, it has 4 of these:

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 21
model           : 2
model name      : AMD Opteron(tm) Processor 6328
stepping        : 0
microcode       : 0x1000065
cpu MHz         : 3199.852
cache size      : 2048 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb lm rep_good nopl extd_apicid pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic popcnt aes xsave avx f16c hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw xop fma4 tbm arat npt nrip_save bmi1
bogomips        : 6399.70
TLB size        : 1536 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:
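
Since -cpu host versus -cpu qemu64 changes the outcome, it may help to see exactly which CPU flags differ between host and guest. A minimal Python sketch, assuming each flags entry is a single line as in a real /proc/cpuinfo; the function names are illustrative and the sample strings are shortened excerpts of the two flag lists above.

```python
def cpu_flags(cpuinfo_text):
    """Return the flag set of the first 'flags' entry in /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()

def flag_diff(host_text, guest_text):
    """Return (flags only on host, flags only on guest), each sorted."""
    host, guest = cpu_flags(host_text), cpu_flags(guest_text)
    return sorted(host - guest), sorted(guest - host)

# Shortened excerpts of the host and guest flag lists quoted above.
host_sample = "flags\t\t: fpu vme rdtscp lm constant_tsc monitor svm pausefilter bmi1"
guest_sample = "flags\t\t: fpu vme lm x2apic hypervisor svm bmi1"

host_only, guest_only = flag_diff(host_sample, guest_sample)
print("host only: ", " ".join(host_only))
print("guest only:", " ".join(guest_only))
```

Feeding it the full contents of the host and guest cpuinfo dumps instead of the excerpts gives the complete difference.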

Full dumps are at http://cdw.me.uk/tmp/host-cpuinfo.txt and
http://cdw.me.uk/tmp/guest-cpuinfo.txt respectively. I've also put the host
and guest dmesg output shortly after booting at

  http://cdw.me.uk/tmp/host-dmesg.txt
  http://cdw.me.uk/tmp/guest-dmesg.txt

I tried enabling CONFIG_PARAVIRT_DEBUG, but no extra information was
reported. These kernels are built with frame pointers and -O2 rather than
-Os to try to maximise useful debug info.

Any help would be extremely gratefully received: I'm at a complete loss as
to what to do next to debug this so I can start using less ancient kernel
and qemu versions!

Best wishes,

Chris.

