qemu-discuss
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Qemu-discuss] How to debug unresponsive guest?


From: Chris Dunlop
Subject: [Qemu-discuss] How to debug unresponsive guest?
Date: Fri, 4 Dec 2015 07:47:16 +1100
User-agent: Mutt/1.5.23 (2014-03-12)

Hi,

I have a guest VM which becomes unresponsive after period, sometimes a few
hours, sometimes up to a day or so.

What can I do at the qemu level to start working out where things are
going wrong?

guest: linux-4.3
host: linux-3.18.21, qemu-2.1, ceph-0.94.5

Prior to the problem the guest is running 30 or so ryncs inbound from the
virtio network, writing about 20MB/s the one multi-TB xfs fs on a virtio
disk backed by ceph rbd via librbd.

"Unresponsive" means the guest stops responding to the network (ssh,
pings) and other network traffic (i.e. the rsyncs) stops, the spice
console shows a login message but doesn't respond to keystrokes, the
serial console (virsh console vm102) also doesn't respond. The only way out
so far has been to kill the qemu process (virsh destroy). After rebooting
there's nothing in the guest's logs to indicate what might have happened.

The host is running around 20 other VMs with a mix of various windows and
linux versions on the same ceph infrastructure without any other problems.

Whilst the problem is occuring the qemu process builds up cpu time slowly,
e.g. ps shows it used 3 seconds of cpu time in 60 seconds elapsed. 'perf
top -p pid' shows:

Samples: 18K of event 'cycles', Event count (approx.): 1198214862
Overhead  Shared Object             Symbol
  16.48%  [kernel]                  [k] __lock_acquire.isra.32
   5.81%  [kernel]                  [k] lock_release
   4.05%  [kernel]                  [k] lock_acquired
   3.66%  [kernel]                  [k] native_sched_clock
   3.40%  [kernel]                  [k] kvm_arch_vcpu_ioctl_run
   3.35%  [kernel]                  [k] lock_acquire
   2.48%  [kernel]                  [k] lock_release_holdtime.part.23
   2.31%  [kernel]                  [k] vmx_vcpu_run
   2.24%  [kernel]                  [k] __fget
   1.33%  [kernel]                  [k] do_sys_poll
   1.29%  [kernel]                  [k] _raw_spin_unlock_irqrestore
   1.19%  [kernel]                  [k] generic_exec_single
   1.09%  [kernel]                  [k] _raw_spin_lock_irqsave
   0.97%  [kernel]                  [k] vmx_save_host_state
   0.90%  [kernel]                  [k] timerqueue_add
   0.89%  [kernel]                  [k] __srcu_read_lock
   0.88%  [kernel]                  [k] do_raw_spin_trylock
   0.71%  [kernel]                  [k] vmcs_writel
   0.68%  [kernel]                  [k] __hrtimer_start_range_ns
   0.64%  [kernel]                  [k] set_spte
   0.63%  [kernel]                  [k] add_atomic_switch_msr
   0.60%  [kernel]                  [k] fput
   0.58%  [kernel]                  [k] kvm_read_guest_cached
   0.57%  [kernel]                  [k] copy_user_generic_string
   0.53%  [kernel]                  [k] eventfd_poll
   0.52%  qemu-system-x86_64        [.] 0x00000000003759ac
   0.49%  [kernel]                  [k] vmx_handle_exit
   0.49%  [kernel]                  [k] __apic_accept_irq
   0.49%  [kernel]                  [k] dequeue_task_fair
   0.49%  [kernel]                  [k] kvm_arch_vcpu_runnable
   0.48%  [kernel]                  [k] kvm_set_shared_msr
   0.47%  [kernel]                  [k] local_clock
   0.46%  [kernel]                  [k] vmx_vcpu_load
   0.45%  [kernel]                  [k] find_busiest_group
   0.44%  [kernel]                  [k] __perf_event_task_sched_out
   0.44%  [kernel]                  [k] __schedule
   0.43%  [kernel]                  [k] prepare_to_wait
   0.43%  [kernel]                  [k] update_curr
   0.41%  [kernel]                  [k] do_raw_spin_unlock
   0.41%  [kernel]                  [k] tcp_poll
   0.40%  [kernel]                  [k] kvm_arch_vcpu_load
   0.39%  [kernel]                  [k] kvm_lapic_sync_to_vapic
   0.39%  [kernel]                  [k] apic_update_ppr
   0.38%  [kernel]                  [k] vmcs_load
   0.38%  [kernel]                  [k] vmx_interrupt_allowed
   0.37%  [kernel]                  [k] kvm_check_async_pf_completion
   0.37%  [kernel]                  [k] remove_wait_queue
   0.37%  [kernel]                  [k] kvm_apic_has_interrupt
   0.35%  [kernel]                  [k] pick_next_task_fair
   0.35%  [kernel]                  [k] change_protection
   0.34%  [kernel]                  [k] vmx_sched_in
   0.33%  [kernel]                  [k] vmx_inject_irq
   0.32%  [kernel]                  [k] gup_pte_range
   0.32%  [kernel]                  [k] match_held_lock
   0.31%  [kernel]                  [k] kvm_get_apic_interrupt
   0.31%  [kernel]                  [k] apic_reg_write
   0.31%  [kernel]                  [k] handle_mm_fault
   0.30%  [kernel]                  [k] apic_has_pending_timer
   0.30%  [kernel]                  [k] apic_set_eoi
   0.30%  [kernel]                  [k] kvm_write_guest_cached
   0.30%  [kernel]                  [k] cpuacct_charge
   0.30%  [kernel]                  [k] copy_page_rep
   0.29%  [kernel]                  [k] kvm_vcpu_block
   0.29%  [kernel]                  [k] __pollwait
   0.29%  [kernel]                  [k] kvm_set_msr_common
   0.28%  [kernel]                  [k] kvm_ioapic_handles_vector
   0.28%  [kernel]                  [k] mmu_set_spte
   0.28%  [kernel]                  [k] kvm_sched_in
   0.28%  [kernel]                  [k] vmx_set_msr
   0.27%  [kernel]                  [k] sock_poll
   0.27%  [kernel]                  [k] update_cr8_intercept
   0.27%  [kernel]                  [k] update_cfs_shares
   0.26%  librt-2.13.so             [.] clock_gettime
   0.26%  [kernel]                  [k] kvm_apic_accept_pic_intr
   0.25%  [kernel]                  [k] follow_page_pte
   0.25%  libpthread-2.13.so        [.] pthread_mutex_lock
   0.24%  [kernel]                  [k] account_entity_dequeue
   0.24%  libglib-2.0.so.0.3200.4   [.] g_main_context_check
   0.23%  [kernel]                  [k] ktime_get
   0.23%  [kernel]                  [k] vmx_flush_tlb
   0.22%  [kernel]                  [k] intel_pmu_disable_all
   0.22%  [kernel]                  [k] clear_atomic_switch_msr

The guest is managed by libvirt and started like:

qemu-system-x86_64 -enable-kvm -name vm102 -S -machine 
pc-i440fx-2.1,accel=kvm,usb=off -cpu Westmere -m 8192 -realtime mlock=off -smp 
2,sockets=2,cores=1,threads=1 -uuid xxxxxxxx -no-user-config -nodefaults 
-chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/vm102.monitor,server,nowait 
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew 
-global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global 
PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device 
ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 -device 
ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 
-device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 
-device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 
-device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive 
if=none,id=drive-ide0-0-0,readonly=on,format=raw -device 
ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -drive 
file=rbd:rbd-sas/vm102%root:id=u102:key=xxxxxxxxx==:auth_supported=cephx\;none,if=none,id=drive-virtio-disk0,format=raw
 -device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
 -drive 
file=rbd:rbd-sas/vm102%swap:id=u102:key=xxxxxxxxx==:auth_supported=cephx\;none,if=none,id=drive-virtio-disk1,format=raw
 -device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x9,drive=drive-virtio-disk1,id=virtio-disk1
 -drive 
file=rbd:rbd-sas/vm102%tmp:id=u102:key=xxxxxxxxx==:auth_supported=cephx\;none,if=none,id=drive-virtio-disk2,format=raw
 -device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0xa,drive=drive-virtio-disk2,id=virtio-disk2
 -drive 
file=rbd:rbd/vm102%var:id=u102:key=xxxxxxxxx==:auth_supported=cephx\;none,if=none,id=drive-virtio-disk3,format=raw
 -device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0xb,drive=drive-virtio-disk3,id=virtio-disk3
 -drive 
file=rbd:rbd-x3/vm102%data:id=u102:key=xxxxxxxxx==:auth_supported=cephx\;none,if=none,id=drive-virtio-disk4,format=raw
 -device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0xc,drive=drive-virtio-disk4,id=virtio-disk4
 -netdev tap,fd=29,id=hostnet0 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:8e:3c:37,bus=pci.0,addr=0x3 
-chardev 
socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/vm102.org.qemu.guest_agent.0,server,nowait
 -device 
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0
 -chardev spicevmc,id=charchannel1,name=vdagent -device 
virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0
 -chardev pty,id=charconsole0 -device 
virtconsole,chardev=charconsole0,id=console0 -device usb-tablet,id=input0 
-spice port=5904,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device 
qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,bus=pci.0,addr=0x2 
-chardev spicevmc,id=charredir0,name=usbredir -device 
usb-redir,chardev=charredir0,id=redir0 -chardev 
spicevmc,id=charredir1,name=usbredir -device 
usb-redir,chardev=charredir1,id=redir1 -chardev 
spicevmc,id=charredir2,name=usbredir -device 
usb-redir,chardev=charredir2,id=redir2 -chardev 
spicevmc,id=charredir3,name=usbredir -device 
usb-redir,chardev=charredir3,id=redir3 -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on

At this point I don't know if this is a guest kernel problem, a qemu
problem, a host kernel problem, or ceph problem.

What should I be looking at to find out what's actually going wrong?

Cheers,

Chris



reply via email to

[Prev in Thread] Current Thread [Next in Thread]