From: Chris Friesen
Subject: [Qemu-devel] anyone seen or heard of large delays/stalls running qemu with kvm support?
Date: Wed, 23 Aug 2017 18:46:38 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0


Hi all,

I need to apologize up front: this is going to be a long email. The executive summary is that we're seeing an issue where a VM is apparently not making forward progress in the guest while at the same time spinning in a busy loop on the host. I'm looking for any pointers/suggestions on what might be going on or how to narrow it down.

We're using qemu-kvm-ev-2.6.0 in the context of an OpenStack compute node running CentOS 7.

While running a test involving a cold migration of a VM from one host to another (basically shutting down the guest, moving all of its files to another host, then starting the guest back up), we have in a couple of cases had the VM get into a state where it sits displaying "Guest has not initialized the display (yet)." on the console. Running strace or gdb on the host against the guest's qemu process seems to allow it to make forward progress.

Here's what I've got:

1) The root disk seems to be fine. It's iSCSI-backed and accessible from the host, and starting up a new VM with the same root disk and mostly the same command line works fine. (The serial port of the new VM was directed to stdio and the vnc option was removed.)

2) Investigating the apparently-hung instance, it seems 2 of its 3 vCPUs are stuck in a busy loop:

   PID USER  PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
192577 root  20   0 3816972  38360  10916 R  99.9  0.0  75:15.17 CPU 1/KVM
192578 root  -2   0 3816972  38360  10916 R  99.9  0.0  75:15.36 CPU 2/KVM

(Each of the vCPU threads is affined to a separate physical CPU.)
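
(In case it's useful, here's roughly how that pinning can be double-checked from the host; just a sketch using the thread IDs above, assuming the util-linux taskset output format:

for tid in 192577 192578; do
    printf '%s (%s): ' "$tid" "$(cat /proc/$tid/comm)"
    taskset -pc "$tid"    # prints "pid <tid>'s current affinity list: <cpus>"
done
)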


3) An attempt to retrieve the CPU info via libvirt did not return within 10 seconds:

compute-0:/home/wrsroot# start=$SECONDS; timeout 10 virsh qemu-monitor-command --hmp instance-0000006e info cpus || echo "timeout after $((SECONDS - start)) seconds"
timeout after 10 seconds
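
A lighter-weight check that might be worth trying here (just a sketch, not something I captured): a raw QMP query, which as far as I know doesn't have to interrupt the vCPUs the way "info cpus" does, would show whether the monitor/main loop is responding at all:

timeout 10 virsh qemu-monitor-command instance-0000006e '{"execute":"query-status"}' || echo "monitor not responding"

If even that hangs, the main loop is presumably blocked, not just the per-vCPU interrogation.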


4) Running gdb on 192577 shows the following backtrace:
Thread 1 (process 192577):
#0 0x00007f8c596e4537 in ioctl () from /lib64/libc.so.6
#1 0x00007f8c70fe0234 in kvm_vcpu_ioctl ()
#2 0x00007f8c70fe02ed in kvm_cpu_exec ()
#3 0x00007f8c70fcee76 in qemu_kvm_cpu_thread_fn ()
#4 0x00007f8c599bedc5 in start_thread () from /lib64/libpthread.so.0
#5 0x00007f8c596ed76d in clone () from /lib64/libc.so.6

5) Running gdb on 192578 gives the same backtrace.
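
The backtraces only show the threads sitting in (or re-entering) the KVM_RUN ioctl; to tell whether they're actually bouncing in and out of the guest, counting the kvm tracepoints on the host should be a cheap check. A sketch, assuming perf and the kvm:kvm_entry/kvm_exit tracepoints are available on the host, using the same thread IDs:

perf stat -e 'kvm:kvm_entry,kvm:kvm_exit' -t 192577,192578 sleep 10

A huge exit count would suggest a tight exit loop; a count near zero would mean the vCPUs are stuck inside KVM_RUN. ("perf kvm stat record -p <qemu pid>" followed by "perf kvm stat report" can break the exits down by reason, if the perf build has it.)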

6) strace shows most of the time spent in ioctl(), with 39K syscalls in 30 seconds across those two PIDs:

compute-0:/home/wrsroot# timeout 30 strace -c -q -T -p 192577 -p 192578
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.86    0.297753           9     32325           ioctl
  0.12    0.000354           0      6730           writev
  0.02    0.000059           0       384        60 futex
  0.00    0.000000           0        10           write
  0.00    0.000000           0         2           rt_sigpending
  0.00    0.000000           0         2         2 rt_sigtimedwait
------ ----------- ----------- --------- --------- ----------------
100.00    0.298166                 39453        62 total
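
For completeness, the dominant ioctl can be confirmed to be KVM_RUN on the vCPU fd (rather than something else) with something like the following sketch; newer strace versions decode the KVM request names, older ones just print the raw request number:

timeout 5 strace -e trace=ioctl -p 192577 2>&1 | head -20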


7) Running strace is enough to allow the VM to make forward progress; here are the kernel logs from within the guest:

[ 2540.964492] INFO: rcu_preempt self-detected stall on CPU { 0} (t=2202996 jiffies g=-108 c=-109 q=1676)
[ 2540.984656] Task dump for CPU 0:
[ 2540.985911] systemd R running task 0 1 0 0x00000008
[ 2540.988028] ffff88003b660000 0000000084d8f1f4 ffff88003e003d90 ffffffff810a8676
[ 2540.990869] 0000000000000000 ffffffff81a44d40 ffff88003e003da8 ffffffff810abed9
[ 2540.993975] 0000000000000001 ffff88003e003dd8 ffffffff8110e1a0 ffff88003e00ec40
[ 2540.996815] Call Trace:
[ 2540.997995] <IRQ> [<ffffffff810a8676>] sched_show_task+0xb6/0x120
[ 2540.999958] [<ffffffff810abed9>] dump_cpu_task+0x39/0x70
[ 2541.001559] [<ffffffff8110e1a0>] rcu_dump_cpu_stacks+0x90/0xd0
[ 2541.002425] [<ffffffff81111e34>] rcu_check_callbacks+0x434/0x7f0
[ 2541.003154] [<ffffffff810d6590>] ? tick_sched_handle.isra.13+0x60/0x60
[ 2541.003981] [<ffffffff8107f507>] update_process_times+0x47/0x80
[ 2541.004755] [<ffffffff810d6555>] tick_sched_handle.isra.13+0x25/0x60
[ 2541.005528] [<ffffffff810d65d1>] tick_sched_timer+0x41/0x70
[ 2541.006225] [<ffffffff81099c4c>] __hrtimer_run_queues+0xfc/0x320
[ 2541.006976] [<ffffffff8109a2b0>] hrtimer_interrupt+0xb0/0x1e0
[ 2541.007751] [<ffffffff810421b7>] local_apic_timer_interrupt+0x37/0x60
[ 2541.008541] [<ffffffff816d7b3f>] smp_apic_timer_interrupt+0x3f/0x60
[ 2541.009290] [<ffffffff816d621d>] apic_timer_interrupt+0x6d/0x80
[ 2541.009996] <EOI> [<ffffffff810ccd39>] ? get_monotonic_boottime+0xb9/0x100
[ 2541.010976] [<ffffffff81093cd1>] posix_get_boottime+0x11/0x20
[ 2541.011740] [<ffffffff810952e4>] SyS_clock_gettime+0x54/0xc0
[ 2541.012437] [<ffffffff816d5649>] system_call_fastpath+0x16/0x1b

[ 2602.140152] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [systemd-udevd:232]
[ 2602.140438] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [ksoftirqd/0:3]
[ 2602.140455] Modules linked in: ata_piix(+) serio_raw virtio_blk(+) virtio_net floppy(+) libata sunrpc
[  OK  ] Started udev Coldplug all Devices.
[ 2602.140457] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 3.10.0-514.16.1.el7.24.tis.x86_64 #1
[ 2602.140458] Hardware name: Fedora Project OpenStack Nova, BIOS 1.9.1-5.el7_3.2.tis.1 04/01/2014
[ 2602.140472] task: ffff88003b661620 ti: ffff88003b6a0000 task.ti: ffff88003b6a0000
[ 2602.140478] RIP: 0010:[<ffffffff8110f765>] [<ffffffff8110f765>] rcu_process_gp_end+0x15/0x80
[ 2602.140488] RSP: 0018:ffff88003b6a3d70 EFLAGS: 00000202
[ 2602.140489] RAX: 0000000000000202 RBX: 00000000000120dd RCX: ffffffff81ae2f80
[ 2602.140490] RDX: ffff88003e000000 RSI: ffff88003e00ec40 RDI: ffffffff81a44d40
[ 2602.140490] RBP: ffff88003b6a3d90 R08: ffff88003f14f600 R09: 0000000180100005
[ 2602.140491] R10: ffffffff81a44d40 R11: 0000000000000000 R12: ffff88003b661688
[ 2602.140491] R13: ffffffff810c21c1 R14: ffff88003b6a3cf8 R15: 0000000180100005
[ 2602.140492] FS: 0000000000000000(0000) GS:ffff88003e000000(0000) knlGS:0000000000000000
[ 2602.140493] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2602.140494] CR2: 00007fea58b37628 CR3: 000000003b068000 CR4: 00000000000006f0
[ 2602.140504] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2602.140505] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 2602.140505] Stack:
[ 2602.140516] ffffffff81a44d40 0000000000000009 ffff88003cc02300 ffffffff819f7108
[ 2602.140517] ffff88003b6a3df0 ffffffff8111001e ffff88003aa52e30 0000000000000000
[ 2602.140518] ffff88003e00ef88 ffff88003b661620 ffff88003b6a3fd8 0000000000000009
[ 2602.140524] Call Trace:
[ 2602.140534] [<ffffffff8111001e>] rcu_process_callbacks+0x9e/0x5c0
[ 2602.140545] [<ffffffff81075ba0>] __do_softirq+0x100/0x300
[ 2602.140546] [<ffffffff81075dd8>] run_ksoftirqd+0x38/0x50
[ 2602.140555] [<ffffffff8109e62f>] smpboot_thread_fn+0x12f/0x240
[ 2602.140557] [<ffffffff8109e500>] ? lg_double_unlock+0xb0/0xb0
[ 2602.140559] [<ffffffff8109596f>] kthread+0xcf/0xe0
[ 2602.140569] [<ffffffff810958a0>] ? kthread_create_on_node+0x140/0x140
[ 2602.140572] [<ffffffff816d5598>] ret_from_fork+0x58/0x90
[ 2602.140574] [<ffffffff810958a0>] ? kthread_create_on_node+0x140/0x140
[ 2602.140595] Code: 00 31 ff e8 fe ed ff ff 48 89 df 57 9d 0f 1f 44 00 00 5b 5d c3 90 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 49 89 f4 53 9c 58 <0f> 1f 44 00 00 48 89 c3 fa 66 0f 1f 44 00 00 4c 8b 6e 18 49 8b
[ 2602.146398] FDC 0 is a S82078B
[ 2602.290420] Modules linked in: ata_piix(+) serio_raw virtio_blk(+) virtio_net floppy libata sunrpc
[ 2602.290422] CPU: 2 PID: 232 Comm: systemd-udevd Tainted: G L ------------ 3.10.0-514.16.1.el7.24.tis.x86_64 #1
[ 2602.290422] Hardware name: Fedora Project OpenStack Nova, BIOS 1.9.1-5.el7_3.2.tis.1 04/01/2014
[ 2602.290423] task: ffff88003faeeea0 ti: ffff88003fad4000 task.ti: ffff88003fad4000
[ 2602.290428] RIP: 0010:[<ffffffff810dcd2a>] [<ffffffff810dcd2a>] smp_call_function_many+0x20a/0x270
[ 2602.290428] RSP: 0018:ffff88003fad7740 EFLAGS: 00000202
[ 2602.290429] RAX: 0000000000000000 RBX: ffff88003e116280 RCX: ffff88003e018fd8
[ 2602.290430] RDX: 0000000000000000 RSI: 0000000000000100 RDI: 0000000000000000
[ 2602.290430] RBP: ffff88003fad7778 R08: ffff88003e116288 R09: 0000000000000004
[ 2602.290431] R10: ffff88003e116288 R11: 0000000000000000 R12: ffffffff81ae2f80
[ 2602.290431] R13: 000c00003e341958 R14: 00000000000000fc R15: 0000000000000292
[ 2602.290432] FS: 00007fb91a4648c0(0000) GS:ffff88003e100000(0000) knlGS:0000000000000000
[ 2602.290433] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2602.290434] CR2: 00007febbb741000 CR3: 000000003f921000 CR4: 00000000000006e0
[ 2602.290436] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2602.290436] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 2602.290437] Stack:
[ 2602.290438] 0000000100000001 0000000000016240 ffffffff81e1aa00 0000000000000000
[ 2602.290439] ffffffff81154e30 0000000000000002 0000000000000286 ffff88003fad77a8
[ 2602.290440] ffffffff810dcfaa 0000000000000002 0000000000000100 0000000000000100
[ 2602.290441] Call Trace:
[ 2602.290444] [<ffffffff81154e30>] ? drain_pages+0xb0/0xb0
[ 2602.290446] [<ffffffff810dcfaa>] on_each_cpu_mask+0x3a/0xa0
[ 2602.290448] [<ffffffff811521d5>] drain_all_pages+0xb5/0xc0
[ 2602.290450] [<ffffffff811b9d0f>] set_migratetype_isolate+0xaf/0xf0
[ 2602.290452] [<ffffffff811b9e90>] start_isolate_page_range+0x70/0xf0
[ 2602.290453] [<ffffffff81155632>] alloc_contig_range+0xe2/0x320
[ 2602.290458] [<ffffffff814130af>] dma_alloc_from_contiguous+0xaf/0x200
[ 2602.290461] [<ffffffff81021238>] dma_generic_alloc_coherent+0x88/0x1c0
[ 2602.290463] [<ffffffff8141bd42>] dmam_alloc_coherent+0xb2/0x140
[ 2602.290471] [<ffffffffa007d3c4>] ata_bmdma_port_start+0x44/0x60 [libata]
[ 2602.290474] [<ffffffffa00f034b>] piix_port_start+0x1b/0x20 [ata_piix]
[ 2602.290478] [<ffffffffa0069185>] ata_host_start.part.32+0xf5/0x1e0 [libata]
[ 2602.290483] [<ffffffffa007e6c0>] ? ata_bmdma_port_intr+0x120/0x120 [libata]
[ 2602.290486] [<ffffffffa0069289>] ata_host_start+0x19/0x20 [libata]
[ 2602.290491] [<ffffffffa007cfe5>] ata_pci_sff_activate_host+0x45/0x250 [libata]
[ 2602.290493] [<ffffffff8134a0ac>] ? pci_write_config_byte+0x1c/0x20
[ 2602.290496] [<ffffffff81352226>] ? pcibios_set_master+0x76/0xa0
[ 2602.290498] [<ffffffffa00f0e9e>] piix_init_one+0x57e/0x975 [ata_piix]
[ 2602.290500] [<ffffffff81353935>] local_pci_probe+0x45/0xa0
[ 2602.290501] [<ffffffff81354c75>] ? pci_match_device+0xe5/0x120
[ 2602.290503] [<ffffffff81354df1>] pci_device_probe+0x101/0x150
[ 2602.290505] [<ffffffff8140bac5>] driver_probe_device+0x145/0x3c0
[ 2602.290507] [<ffffffff8140be13>] __driver_attach+0x93/0xa0
[ 2602.290509] [<ffffffff8140bd80>] ? __device_attach+0x40/0x40
[ 2602.290510] [<ffffffff81409723>] bus_for_each_dev+0x73/0xc0
[ 2602.290511] [<ffffffff8140b40e>] driver_attach+0x1e/0x20
[ 2602.290513] [<ffffffff8140af60>] bus_add_driver+0x200/0x2d0
[ 2602.290514] [<ffffffff8140c494>] driver_register+0x64/0xf0
[ 2602.290516] [<ffffffff813548d5>] __pci_register_driver+0xa5/0xc0
[ 2602.290517] [<ffffffffa00f8000>] ? 0xffffffffa00f7fff
[ 2602.290519] [<ffffffffa00f801e>] piix_init+0x1e/0x2e [ata_piix]
[ 2602.290521] [<ffffffff810003a8>] do_one_initcall+0xb8/0x230
[ 2602.290523] [<ffffffff810e3201>] load_module+0x2171/0x2810
[ 2602.290525] [<ffffffff81337380>] ? ddebug_proc_write+0xf0/0xf0
[ 2602.290528] [<ffffffff810df3e3>] ? copy_module_from_fd.isra.46+0x53/0x150
[ 2602.290529] [<ffffffff810e3a56>] SyS_finit_module+0xa6/0xd0
[ 2602.290531] [<ffffffff816d5649>] system_call_fastpath+0x16/0x1b
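
One more data point that might be worth grabbing next time this happens: the guest-side stalls above say the vCPUs aren't making progress, while top on the host says the threads are getting essentially 100% CPU. The host scheduler stats for those threads would confirm they really are running rather than, say, piling up run-queue wait time. A sketch using the same thread IDs, assuming schedstats are enabled on the host:

for tid in 192577 192578; do
    echo -n "$tid: "; cat /proc/$tid/schedstat   # time on cpu (ns), run-queue wait (ns), timeslices
done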


8) Running strace in a loop doesn't reliably kick qemu-kvm into running some more guest code, so I found another method that works better:

cpu_kvm_busy=$(top -b -n1 -H -o+%CPU -p $(pgrep qemu-kvm) | awk '/KVM/ && $9>50 {print $1;}'); while true; do for tid in $cpu_kvm_busy; do gdb --batch --quiet -ex 'bt' -ex 'quit' /usr/libexec/qemu-kvm $tid; done; done

This basically just finds the qemu threads that are using more than 50% CPU, then in a loop attaches gdb to each of them, prints a backtrace, and exits gdb.

With this running, the instance can boot up to the login prompt on the VNC console and access the rootfs.


Thanks for reading this far...if you have any pointers/suggestions I'd love to hear them.

Chris


