
Re: [Qemu-devel] [PATCH 1/1] migration: fix deadlock


From: Igor Redko
Subject: Re: [Qemu-devel] [PATCH 1/1] migration: fix deadlock
Date: Tue, 29 Sep 2015 18:32:57 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0

On 25.09.2015 11:23, Wen Congyang wrote:
On 09/25/2015 04:03 PM, Denis V. Lunev wrote:
On 09/25/2015 04:21 AM, Wen Congyang wrote:
On 09/24/2015 08:53 PM, Denis V. Lunev wrote:
From: Igor Redko <address@hidden>

Release the QEMU global mutex (QGM) before calling synchronize_rcu().
synchronize_rcu() waits for all readers to finish their critical
sections. There is at least one critical section in which we try
to take the QGM: address_space_rw() opens an RCU reader's critical
section, and prepare_mmio_access() tries to acquire the QGM inside it.
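(For illustration, a condensed sketch of that reader-side path, with the
intermediate calls elided; the names follow the functions mentioned
above, but this is not the literal QEMU code:)

    rcu_read_lock();                /* address_space_rw() opens the RCU
                                     * reader's critical section        */
    /* ... the access hits an MMIO region ... */
    if (!qemu_mutex_iothread_locked()) {
        qemu_mutex_lock_iothread(); /* prepare_mmio_access(): blocks for
                                     * as long as the main thread holds
                                     * the QGM                          */
    }
    /* ... device emulation runs ... */
    qemu_mutex_unlock_iothread();
    rcu_read_unlock();              /* only here can a pending
                                     * synchronize_rcu() make progress  */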

Both functions (migration_end() and migration_bitmap_extend())
are called from the main thread, which holds the QGM.

Thus there is a race condition that ends up in a deadlock:

main thread                  working thread
lock QGM                          |
   |                    call KVM_EXIT_IO handler
   |                              |
   |              open RCU reader's critical section
migration cleanup bh              |
   |                              |
synchronize_rcu() is              |
waiting for readers               |
   |           prepare_mmio_access() is waiting for QGM
    \                            /
              deadlock

The patch just releases the QGM before calling synchronize_rcu().

Signed-off-by: Igor Redko <address@hidden>
Reviewed-by: Anna Melekhova <address@hidden>
Signed-off-by: Denis V. Lunev <address@hidden>
CC: Juan Quintela <address@hidden>
CC: Amit Shah <address@hidden>
---
   migration/ram.c | 6 ++++++
   1 file changed, 6 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 7f007e6..d01febc 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1028,12 +1028,16 @@ static void migration_end(void)
   {
       /* caller have hold iothread lock or is in a bh, so there is
        * no writing race against this migration_bitmap
+     * but RCU is used not only for migration_bitmap, so we should
+     * release the QGM here or we can deadlock.
        */
       unsigned long *bitmap = migration_bitmap;
       atomic_rcu_set(&migration_bitmap, NULL);
       if (bitmap) {
           memory_global_dirty_log_stop();
+        qemu_mutex_unlock_iothread();
           synchronize_rcu();
+        qemu_mutex_lock_iothread();
migration_end() can be called in two cases:
1. migration is completed
2. migration is cancelled

In case 1, you should not unlock the iothread; otherwise the VM's
state may be changed unexpectedly.

Sorry, but there is no very good choice here. We should either
unlock, or not call synchronize_rcu() at all, which is also an option.

In the latter case the rework would have to be much more substantial.
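(For reference, the second option would look roughly like the sketch
below. The struct and field names are made up for illustration; only
call_rcu() and struct rcu_head are real QEMU RCU API. The idea is to
give the bitmap an rcu_head so it can be reclaimed asynchronously
instead of blocking the main thread in synchronize_rcu():)

    struct MigrationBitmapRcu {      /* hypothetical name               */
        struct rcu_head rcu;         /* required by call_rcu()          */
        unsigned long *bmap;
    };

    static void migration_bitmap_free(struct MigrationBitmapRcu *b)
    {
        g_free(b->bmap);
        g_free(b);
    }

    /* ...then, instead of "synchronize_rcu(); g_free(bitmap);":        */
    call_rcu(old, migration_bitmap_free, rcu);  /* returns immediately  */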

I can't reproduce this bug. But according to your description, the bug
only exists in case 2. Is that right?

When migration completes successfully, the VM has already been stopped by the time migration_end() is called, and the VM must be running to reproduce this bug. So yes, the bug exists only in case 2.

FYI, to reproduce this bug you need two hosts with qemu+libvirt (host0 and host1) configured for migration:
0. Create a VM on host0 and install CentOS 7 in it.
1. Shut down the VM.
2. Start the VM (virsh start <VM_name>) and right after that start migration to host1 (something like 'virsh migrate --live --verbose <VM_name> "qemu+ssh://host1/system"').
3. Stop the migration after ~1 second, i.e. after the migration process has started but before it has completed (for example, when you see "Migration: [ 5 %]").

This works for me 9 times out of 10. The deadlock shows up as no response from the VM and no response from the qemu monitor (for example, 'virsh qemu-monitor-command --hmp <VM_NAME> "info migrate"' will hang indefinitely).

Another way:
0. Create a VM with an e1000 network card on host0 and install CentOS 7 in it.
1. Run iperf in the VM (or put any other load on the network).
2. Start migration.
3. Stop the migration before it completes.
For this approach the e1000 network card is essential, because it generates KVM_EXIT_MMIO exits.
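(That matters because KVM MMIO exits are serviced through exactly the
RCU reader path described in the patch. A condensed sketch of the vCPU
loop in kvm_cpu_exec(), not the literal code:)

    switch (run->exit_reason) {
    case KVM_EXIT_MMIO:
        /* Runs outside the QGM; down the call chain address_space_rw()
         * opens an RCU read section and prepare_mmio_access() takes
         * the QGM, which is where the deadlock bites.                  */
        address_space_rw(&address_space_memory, run->mmio.phys_addr,
                         MEMTXATTRS_UNSPECIFIED, run->mmio.data,
                         run->mmio.len, run->mmio.is_write);
        break;
    }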


Den

           g_free(bitmap);
       }
   @@ -1085,7 +1089,9 @@ void migration_bitmap_extend(ram_addr_t old, ram_addr_t new)
           atomic_rcu_set(&migration_bitmap, bitmap);
           qemu_mutex_unlock(&migration_bitmap_mutex);
           migration_dirty_pages += new - old;
+        qemu_mutex_unlock_iothread();
           synchronize_rcu();
+        qemu_mutex_lock_iothread();
Hmm, I think it is OK to unlock the iothread here.

           g_free(old_bitmap);
       }
   }

