qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] Debugging io deadlock


From: BALATON Zoltan
Subject: Re: [Qemu-devel] Debugging io deadlock
Date: Tue, 5 Dec 2017 19:35:36 +0100 (CET)
User-agent: Alpine 2.21 (BSF 202 2017-01-01)

On Tue, 5 Dec 2017, Stefan Hajnoczi wrote:
On Mon, Dec 04, 2017 at 08:22:48PM +0100, BALATON Zoltan wrote:
I'm seeing a possible deadlock that I don't know how to debug. Any hint on
how to find the cause or what should be checked further to identify the
reason why this is happening and how to fix it is greatly appreciated.

Here are the state of threads:

(gdb) info thr
  Id   Target Id         Frame
* 4    Thread 0x7fffba76c700 (LWP 3445) "qemu-system-ppc" 0x0000555555cbec04 in 
worker_thread (opaque=0x7fffe40c9000) at util/thread-pool.c:92
  3    Thread 0x7fffe8829700 (LWP 3443) "qemu-system-ppc" 0x00007ffff78d267f in 
pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  2    Thread 0x7ffff111b700 (LWP 3442) "qemu-system-ppc" 0x00007ffff42cad29 in 
syscall () from /lib64/libc.so.6
  1    Thread 0x7ffff7fc7b00 (LWP 3441) "qemu-system-ppc" 0x00007ffff42c4e31 in 
ppoll () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff78d4830 in sem_timedwait () from /lib64/libpthread.so.0
#1  0x0000555555cc572e in qemu_sem_timedwait (sem=0x7fffe40c9078, ms=10000) at 
util/qemu-thread-posix.c:289
#2  0x0000555555cbec04 in worker_thread (opaque=0x7fffe40c9000) at 
util/thread-pool.c:92
#3  0x00007ffff78cd5bd in start_thread () from /lib64/libpthread.so.0
#4  0x00007ffff42d062d in clone () from /lib64/libc.so.6
(gdb) thr 3
[Switching to thread 3 (Thread 0x7fffe8829700 (LWP 3443))]
#0  0x00007ffff78d267f in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
(gdb) bt
#0  0x00007ffff78d267f in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x0000555555cc5458 in qemu_cond_wait (cond=0x555556b47b90, mutex=0x5555565b5220 
<qemu_global_mutex>) at util/qemu-thread-posix.c:161
#2  0x00005555557e6690 in qemu_tcg_wait_io_event (cpu=0x7ffff7e20010) at 
cpus.c:1084
#3  0x00005555557e6f00 in qemu_tcg_rr_cpu_thread_fn (arg=0x7ffff7e20010) at 
cpus.c:1396
#4  0x00007ffff78cd5bd in start_thread () from /lib64/libpthread.so.0
#5  0x00007ffff42d062d in clone () from /lib64/libc.so.6
(gdb) thr 2
[Switching to thread 2 (Thread 0x7ffff111b700 (LWP 3442))]
#0  0x00007ffff42cad29 in syscall () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff42cad29 in syscall () from /lib64/libc.so.6
#1  0x0000555555cc58a7 in qemu_futex_wait (f=0x555556a01134 
<rcu_call_ready_event>, val=4294967295) at include/qemu/futex.h:29
#2  0x0000555555cc5a74 in qemu_event_wait (ev=0x555556a01134 
<rcu_call_ready_event>) at util/qemu-thread-posix.c:442
#3  0x0000555555cdd92c in call_rcu_thread (opaque=0x0) at util/rcu.c:249
#4  0x00007ffff78cd5bd in start_thread () from /lib64/libpthread.so.0
#5  0x00007ffff42d062d in clone () from /lib64/libc.so.6
(gdb) thr 1
[Switching to thread 1 (Thread 0x7ffff7fc7b00 (LWP 3441))]
#0  0x00007ffff42c4e31 in ppoll () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff42c4e31 in ppoll () from /lib64/libc.so.6
#1  0x0000555555cbfe86 in qemu_poll_ns (fds=0x555557c17620, nfds=5, 
timeout=29806320) at util/qemu-timer.c:334
#2  0x0000555555cc0eab in os_host_main_loop_wait (timeout=29806320) at 
util/main-loop.c:255
#3  0x0000555555cc0f7d in main_loop_wait (nonblocking=0) at util/main-loop.c:515
#4  0x000055555599e2b3 in main_loop () at vl.c:1995
#5  0x00005555559a6353 in main (argc=21, argv=0x7fffffffdef8, 
envp=0x7fffffffdfa8) at vl.c:4911

Then if I wait a little, thread 4 exits due to sem_timedwait returning -1
with errno=ETIMEDOUT leaving other threads waiting for something to happen
but this is apparently a deadlock as it will be stuck here (thread 1-3 are
still as above). Any idea why this could happen and how to debug it furhter?

Are you using the latest qemu.git/master?

Yes I'm on git master. Even 044897ef4 "target/ppc: Fix system lockups caused by interrupt_request state corruption" does not help which looked like it may be related but the hang happens with this as well the same way.

Commit ef6dada8b44e1e7c4bec5c1115903af9af415b50 ("util/async: use
atomic_mb_set in qemu_bh_cancel") fixes hangs that occur with the thread
pool (Thread 4 in your example).  I'm not sure if this applies to your
hang though...

It looks like Thread 3 isn't running guest code because the cpu wants to
sleep (is it halted?).

Looks like it's halted waiting for something (maybe io but I see no other thread that could wake this so I'm not sure what's happening):

#2  0x00005555557e66c0 in qemu_tcg_wait_io_event (cpu=0x7ffff7e20010) at 
cpus.c:1084
1084            qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
(gdb) p *cpu
$4 = {parent_obj = {parent_obj = {class = 0x555556a954f0, free = 0x7ffff500c6e0 
<g_free>, properties = 0x555556b1a860, ref = 2,
  parent = 0x555556b19f00}, id = 0x0, canonical_path = 0x555556b48e40 
"/machine/unattached/device[0]", realized = true,
  pending_deleted_event = false, opts = 0x0, hotplugged = 0, parent_bus = 0x0, 
gpios = {lh_first = 0x0}, child_bus = {
  lh_first = 0x0}, num_child_bus = 0, instance_id_alias = -1, 
alias_required_for_version = 0}, nr_cores = 1, nr_threads = 1,
  thread = 0x555556b48b70, thread_id = 3554, running = false, has_waiter = 
false, halt_cond = 0x555556b48b90,
  thread_kicked = false, created = true, stop = false, stopped = false, unplug 
= false, crash_occurred = false,
  exit_request = false, cflags_next_tb = 4294967295, interrupt_request = 4, 
singlestep_enabled = 0, icount_budget = 0,
  icount_extra = 0, jmp_env = {{__jmpbuf = {140737094260480, 
-3807285253765841529, 140737488344321, 4096, 140737094260480, 0,
  -3807285253770035833, -7026543154144068217}, __mask_was_saved = 0, __saved_mask = 
{__val = {0 <repeats 16 times>}}}},
  work_mutex = {lock = {__data = {__lock = 0, __count = 0, __owner = 0, 
__nusers = 0, __kind = 0, __spins = 0, __elision = 0,
  __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, 
__align = 0}, initialized = true},
  queued_work_first = 0x0, queued_work_last = 0x0, cpu_ases = 0x555556b48ab0, 
num_ases = 1, as = 0x555556b48a30,
  memory = 0x555556b19b70, env_ptr = 0x7ffff7e282b0, tb_jmp_cache = 8< trimmed for 
brevity >8, gdb_regs = 0x555556b48a00,
  gdb_num_regs = 104, gdb_num_g_regs = 71, node = {tqe_next = 0x0, tqe_prev = 
0x555556450020 <cpus>}, breakpoints = {
  tqh_first = 0x0, tqh_last = 0x7ffff7e28220}, watchpoints = {tqh_first = 0x0, 
tqh_last = 0x7ffff7e28230}, watchpoint_hit = 0x0,
  opaque = 0x0, mem_io_pc = 140737109809995, mem_io_vaddr = 3774989057, kvm_fd 
= 0, kvm_state = 0x0, kvm_run = 0x0,
  trace_dstate_delayed = {0}, trace_dstate = {0}, cpu_index = 0, halted = 1, 
can_do_io = 1, exception_index = -1,
  vcpu_dirty = false, throttle_thread_scheduled = false, 
ignore_memory_transaction_failures = false, icount_decr = {
  u32 = 4294964112, u16 = {low = 62352, high = 65535}}, hax_vcpu = 0x0, 
pending_tlb_flush = 0}
(gdb) p *cpu->halt_cond
$5 = {cond = {__data = {__lock = 0, __futex = 205, __total_seq = 103, 
__wakeup_seq = 102, __woken_seq = 102,
  __mutex = 0x5555565b6260 <qemu_global_mutex>, __nwaiters = 2, __broadcast_seq 
= 102},
  __size = 
"\000\000\000\000\315\000\000\000g\000\000\000\000\000\000\000f\000\000\000\000\000\000\000f\000\000\000\000\000\000\000`b[VUU\000\000\002\000\000\000f\000\000",
  __align = 880468295680}, initialized = true}

If I enable PPC_DEBUG_IRQ in hw/ppc/ppc.c I see some IRQs (such as decrementer and external when keys are pressed) are still raised and lowered while it appears to hang but somehow it's not progressing and the CPU sits waiting. So maybe it's running but waiting for some event that does not happen on QEMU like it does on real HW? I don't really know what to look for so if anyone has any more ideas then please share.

Thank you,
BALATON Zoltan



reply via email to

[Prev in Thread] Current Thread [Next in Thread]