Re: [Qemu-devel] [PATCH v2 0/2] migration: fix deadlock


From: Igor Redko
Subject: Re: [Qemu-devel] [PATCH v2 0/2] migration: fix deadlock
Date: Wed, 30 Sep 2015 17:28:05 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0

On 29.09.2015 11:47, Dr. David Alan Gilbert wrote:
* Igor Redko (address@hidden) wrote:
On Fri, 2015-09-25 at 17:46 +0800, Wen Congyang wrote:
On 09/25/2015 05:09 PM, Denis V. Lunev wrote:
Release the QEMU global mutex (QGM) before calling synchronize_rcu().
synchronize_rcu() waits for all readers to finish their critical
sections. There is at least one critical section in which we try
to take the QGM (the critical section is in address_space_rw(), and
prepare_mmio_access() tries to acquire the QGM).

Both functions (migration_end() and migration_bitmap_extend())
are called from the main thread, which is holding the QGM.

Thus there is a race condition that ends in a deadlock:
main thread     working thread
Lock QGM                |
|             Call KVM_EXIT_IO handler
|                       |
|        Open rcu reader's critical section
Migration cleanup bh    |
|                       |
synchronize_rcu() is    |
waiting for readers     |
|            prepare_mmio_access() is waiting for QGM
   \                   /
          deadlock
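
The interleaving in the diagram can be modelled with a small toy program. This is only a sketch under stated assumptions, not QEMU code: `ToyRcu`, `run` and `big_lock` are hypothetical names, the reader counter is a crude stand-in for real RCU grace periods, and a timeout stands in for the real (indefinite) hang so that the program terminates. It shows the holder of the lock either waiting for readers while still holding it, or dropping it first as the patch description proposes.

```python
import threading


class ToyRcu:
    """Toy reader tracking. Not real RCU, just enough to model the wait."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0

    def read_lock(self):
        with self._cond:
            self._readers += 1

    def read_unlock(self):
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def synchronize(self, timeout=None):
        """Wait until no reader is active. Returns False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self._readers == 0, timeout)


def run(release_lock_first, timeout=0.5):
    """Return True if the cleanup path completed, False if it got stuck."""
    big_lock = threading.Lock()   # stands in for the QEMU global mutex
    rcu = ToyRcu()
    in_reader = threading.Event()

    def worker():
        rcu.read_lock()           # reader's critical section opens
        in_reader.set()
        with big_lock:            # like prepare_mmio_access() taking the QGM
            pass
        rcu.read_unlock()

    big_lock.acquire()            # main thread holds the lock
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    in_reader.wait()              # force the racy interleaving

    if release_lock_first:
        big_lock.release()        # drop the lock before waiting for readers
        done = rcu.synchronize(timeout)
        big_lock.acquire()
    else:
        # The worker needs big_lock to leave its critical section, so this
        # wait cannot succeed; the timeout models the deadlock.
        done = rcu.synchronize(timeout)
    big_lock.release()
    t.join()
    return done
```

Calling `run(False)` models the situation in the diagram (the wait only ends because of the timeout), while `run(True)` models releasing the lock around the wait.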

Patches here are quick and dirty, compile-tested only to validate the
architectural approach.

Igor, Anna, can you please start your tests with these patches instead of your
original one? Thank you.

Can you give me the backtrace of the working thread?

I think it is very bad to wait for a lock in an RCU reader's critical section.

#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f1ef113ccfd in __GI___pthread_mutex_lock (mutex=0x7f1ef4145ce0 <qemu_global_mutex>) at ../nptl/pthread_mutex_lock.c:80
#2  0x00007f1ef3c36546 in qemu_mutex_lock (mutex=0x7f1ef4145ce0 <qemu_global_mutex>) at util/qemu-thread-posix.c:73
#3  0x00007f1ef387ff46 in qemu_mutex_lock_iothread () at /home/user/my_qemu/qemu/cpus.c:1170
#4  0x00007f1ef38514a2 in prepare_mmio_access (mr=0x7f1ef612f200) at /home/user/my_qemu/qemu/exec.c:2390
#5  0x00007f1ef385157e in address_space_rw (as=0x7f1ef40ec940 <address_space_io>, addr=49402, attrs=..., buf=0x7f1ef3f97000 "\001", len=1, is_write=true) at /home/user/my_qemu/qemu/exec.c:2425
#6  0x00007f1ef3897c53 in kvm_handle_io (port=49402, attrs=..., data=0x7f1ef3f97000, direction=1, size=1, count=1) at /home/user/my_qemu/qemu/kvm-all.c:1680
#7  0x00007f1ef3898144 in kvm_cpu_exec (cpu=0x7f1ef5010fc0) at /home/user/my_qemu/qemu/kvm-all.c:1849
#8  0x00007f1ef387fa91 in qemu_kvm_cpu_thread_fn (arg=0x7f1ef5010fc0) at /home/user/my_qemu/qemu/cpus.c:979
#9  0x00007f1ef113a6aa in start_thread (arg=0x7f1eef0b9700) at pthread_create.c:333
#10 0x00007f1ef0e6feed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Do you have a test to run in the guest that easily triggers this?

Dave

There are two ways to trigger this. Both of them need 2 hosts with qemu+libvirt (host0 and host1) configured for migration.

First way:
0. Create a VM on host0 and install centos7
1. Shut down the VM.
2. Start the VM (virsh start <VM_name>) and right after that start migration to host1 (something like 'virsh migrate --live --verbose <VM_name> "qemu+ssh://host1/system"')
3. Stop migration after ~1 sec (after the migration process has started, but before it completes, for example when you see "Migration: [ 5 %]")

Result: deadlock (reproduced 9 times out of 10). There is no response from the VM and no response from the qemu monitor (for example 'virsh qemu-monitor-command --hmp <VM_NAME> "info migrate"' will hang indefinitely).

Second way:
0. Create VM with e1000 network card on host0 and install centos7
1. Run iperf on VM (or any other load on network)
2. Start migration
3. Stop migration before it completes.

For this approach the e1000 network card is essential because it generates KVM_EXIT_MMIO exits.
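
The first way could be scripted roughly as follows. This is only a sketch: the function names are hypothetical, the `virsh start` and `virsh migrate` invocations are the ones quoted in the steps, and using `virsh domjobabort` to stop the migration is an assumption, since the steps do not say how the migration is stopped. The 1 second delay is the "~1 sec" from step 3 and may need tuning.

```python
import subprocess
import time


def repro_commands(vm_name, dest_uri="qemu+ssh://host1/system"):
    """Build the virsh command lines for the first reproduction path."""
    start = ["virsh", "start", vm_name]
    migrate = ["virsh", "migrate", "--live", "--verbose", vm_name, dest_uri]
    # Assumption: abort the running migration job with domjobabort.
    abort = ["virsh", "domjobabort", vm_name]
    return start, migrate, abort


def run_first_way(vm_name, delay_s=1.0, dest_uri="qemu+ssh://host1/system"):
    """Start the VM, begin a live migration, then abort it after delay_s."""
    start, migrate, abort = repro_commands(vm_name, dest_uri)
    subprocess.run(start, check=True)
    migration = subprocess.Popen(migrate)   # runs in the background
    time.sleep(delay_s)                     # let the migration get going
    subprocess.run(abort, check=False)      # stop it before it completes
    return migration.wait()
```

If the deadlock triggers, the abort call or the final wait may itself hang, which matches the symptom described above; running the script on host0 with a shut-down VM corresponds to steps 1 to 3.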

Igor


Signed-off-by: Denis V. Lunev <address@hidden>
CC: Igor Redko <address@hidden>
CC: Anna Melekhova <address@hidden>
CC: Juan Quintela <address@hidden>
CC: Amit Shah <address@hidden>

Denis V. Lunev (2):
   migration: bitmap_set is unnecessary as bitmap_new uses g_try_malloc0
   migration: fix deadlock

  migration/ram.c | 45 ++++++++++++++++++++++++++++-----------------
  1 file changed, 28 insertions(+), 17 deletions(-)





--
Dr. David Alan Gilbert / address@hidden / Manchester, UK




