Re: [Qemu-devel] Hang with migration multi-thread compression under high load


From: Li, Liang Z
Subject: Re: [Qemu-devel] Hang with migration multi-thread compression under high load
Date: Fri, 29 Apr 2016 04:51:58 +0000

> On Wed, Apr 27, 2016 at 03:20:23PM +0100, Daniel P. Berrange wrote:
> > I've been testing various features of migration and have hit a problem
> > with the multi-thread compression. It works fine when I have 2 or more
> > threads, but if I tell it to only use a single thread, then it almost
> > always hangs.
> 
> [snip]
> 
> > Now the target QEMU shows this stack trace:
> 
> [snip]
> 
> > Thread 2 (Thread 0x7f8677bed700 (LWP 4657)):
> > #0  0x00007f86eb035b10 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
> > #1  0x0000560ecd8a2709 in qemu_cond_wait (address@hidden, address@hidden) at util/qemu-thread-posix.c:123
> > #2  0x0000560ecd5b422d in do_data_decompress (opaque=0x560ed0391870) at /home/berrange/src/virt/qemu/migration/ram.c:2195
> > #3  0x00007f86eb03060a in start_thread () at /lib64/libpthread.so.0
> > #4  0x00007f86ead6aa4d in clone () at /lib64/libc.so.6
> >
> > Thread 1 (Thread 0x7f86f767dc80 (LWP 4651)):
> > #0  0x0000560ecd5b744f in ram_load (len=711, host=0x7f8677e06000, f=0x560ed03a7950) at /home/berrange/src/virt/qemu/migration/ram.c:2263
> > #1  0x0000560ecd5b744f in ram_load (f=0x560ed03a7950, opaque=<optimized out>, version_id=<optimized out>) at /home/berrange/src/virt/qemu/migration/ram.c:2513
> > #2  0x0000560ecd5b8b87 in vmstate_load (f=0x560ed03a7950, se=0x560ece731f90, version_id=4) at /home/berrange/src/virt/qemu/migration/savevm.c:643
> > #3  0x0000560ecd5b8fc3 in qemu_loadvm_state_main (mis=0x560ece75c330, f=0x560ed03a7950) at /home/berrange/src/virt/qemu/migration/savevm.c:1819
> > #4  0x0000560ecd5b8fc3 in qemu_loadvm_state_main (address@hidden, address@hidden) at /home/berrange/src/virt/qemu/migration/savevm.c:1850
> > #5  0x0000560ecd5bbd36 in qemu_loadvm_state (address@hidden) at /home/berrange/src/virt/qemu/migration/savevm.c:1911
> > #6  0x0000560ecd7b1b2f in process_incoming_migration_co (opaque=0x560ed03a7950) at migration/migration.c:384
> > #7  0x0000560ecd8b1eba in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:78
> > #8  0x00007f86eacb0f30 in __start_context () at /lib64/libc.so.6
> > #9  0x00007ffc877e49c0 in  ()
> >
> >
> > For some reason decompress_data_with_multi_threads isn't shown in the
> > stack trace for thread 1 above, but when initially connecting, GDB
> > says the main thread is at:
> >
> > decompress_data_with_multi_threads (len=702, host=0x7fd78fe06000, f=0x55901af09950) at /home/berrange/src/virt/qemu/migration/ram.c:2254
> > 2254                for (idx = 0; idx < thread_count; idx++) {
> >
> >
> > Looking at the target QEMU, we see the do_data_decompress method is
> > waiting on a condition variable:
> >
> >         while (!param->start && !quit_decomp_thread) {
> >             qemu_cond_wait(&param->cond, &param->mutex);
> >         }
> >         ...do stuff...
> >         param->start = false;
> >
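> > (For context, the surrounding worker loop looks roughly like this; a
> > paraphrase for illustration rather than the exact source:)
> >
> >     while (!quit_decomp_thread) {
> >         qemu_mutex_lock(&param->mutex);
> >         while (!param->start && !quit_decomp_thread) {
> >             qemu_cond_wait(&param->cond, &param->mutex);
> >         }
> >         if (!quit_decomp_thread) {
> >             /* ...decompress param->compbuf into param->des... */
> >         }
> >         param->start = false;
> >         qemu_mutex_unlock(&param->mutex);
> >     }
> >
> > i.e. the worker reads and writes param->start with param->mutex held.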
> >
> > Now the decompress_data_with_multi_threads() method is checking
> > param->start without holding the param->mutex lock.
> 
> Ok, so I've investigated this further, and
> decompress_data_with_multi_threads() is where I believe the flaw is.
> Its code is:
> 
> static void decompress_data_with_multi_threads(QEMUFile *f,
>                                                void *host, int len)
> {
>     int idx, thread_count;
> 
>     thread_count = migrate_decompress_threads();
>     while (true) {
>         for (idx = 0; idx < thread_count; idx++) {
>             if (!decomp_param[idx].start) {
>                 qemu_get_buffer(f, decomp_param[idx].compbuf, len);
>                 decomp_param[idx].des = host;
>                 decomp_param[idx].len = len;
>                 start_decompression(&decomp_param[idx]);
>                 break;
>             }
>         }
>         if (idx < thread_count) {
>             break;
>         }
>     }
> }
> 
> 
> Looking at the disassembly for the start of that method, we have:
> 
>         for (idx = 0; idx < thread_count; idx++) {
>     358a:       45 85 f6                test   %r14d,%r14d
>     358d:       7e fb                   jle    358a <ram_load+0x49a>
>             if (!decomp_param[idx].start) {
>     358f:       45 31 e4                xor    %r12d,%r12d
>     3592:       40 84 ff                test   %dil,%dil
>     3595:       48 8d 42 78             lea    0x78(%rdx),%rax
>     3599:       0f 84 b1 04 00 00       je     3a50 <ram_load+0x960>
>     359f:       90                      nop
> 
> 
> Now asm is not my strong suit, but IIUC, that is showing that the
> access to 'decomp_param[idx].start' is *not* reading the actual struct
> memory, but instead a copy cached in the %dil register. This is valid
> because we've not told the compiler that this struct field can be
> modified by multiple threads at once.
> 
> So when the corresponding do_data_decompress() method sets
> 'decomp_param[idx].start = false', this is never seen by the
> decompress_data_with_multi_threads() method.
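> 
> (A standalone illustration of the effect, not QEMU code: with a plain,
> non-atomic flag the compiler is entitled to hoist the load out of the
> loop, e.g.
> 
>     struct param { int start; };
> 
>     void wait_until_idle(struct param *p)
>     {
>         /* Nothing in this loop body can change p->start, so the
>          * compiler may read it once into a register and spin on
>          * that cached value forever. */
>         while (p->start) {
>             /* busy wait */
>         }
>     }
> 
> which is exactly the shape of the busy loop above.)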
> 
> If decompress_data_with_multi_threads() used a mutex lock around its
> access to 'decomp_param[idx].start', then there would be a memory barrier
> and the code would not be checking the value cached in the %dil register.
> 
> Now we don't want a mutex lock in this code, but we do still need a
> memory barrier here, so I think we need this patch:
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index be0233f..fddc136 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2260,6 +2260,7 @@ static void decompress_data_with_multi_threads(QEMUFile *f,
>      thread_count = migrate_decompress_threads();
>      while (true) {
>          for (idx = 0; idx < thread_count; idx++) {
> +            smp_mb();
>              if (!decomp_param[idx].start) {
>                  qemu_get_buffer(f, decomp_param[idx].compbuf, len);
>                  decomp_param[idx].des = host;
> 
> 
> Running my test case with that applied certainly makes migration run
> normally, without the hangs I see currently.
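> 
> (As an aside, an alternative to a bare barrier would be to make the
> flag accesses themselves explicit. In plain C11 terms, rather than
> QEMU's own atomic helpers, the idea is just:
> 
>     #include <stdatomic.h>
>     #include <stdbool.h>
> 
>     static _Atomic bool start;
> 
>     /* Reader side: an explicit atomic load forces a fresh read on
>      * every iteration instead of a value cached in a register. */
>     static bool thread_is_idle(void)
>     {
>         return !atomic_load_explicit(&start, memory_order_acquire);
>     }
> 
>     /* Writer side: the release store pairs with the acquire load. */
>     static void mark_idle(void)
>     {
>         atomic_store_explicit(&start, false, memory_order_release);
>     }
> 
> but the smp_mb() above is the minimal change.)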
> 
> 

Yes, your patch can fix the issue, but the busy loop is not good; we
should fix that too.

Attached is a patch which follows the pattern of the compression code.
I think it can fix this issue too.
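
To sketch the idea (names like decomp_done_lock/decomp_done_cond are
illustrative and may not match the attached patch exactly): instead of
spinning, the receiver parks on a condition variable that a decompress
thread signals when it becomes idle.

    /* Receiver side: wait for an idle thread instead of busy looping. */
    qemu_mutex_lock(&decomp_done_lock);
    while (true) {
        for (idx = 0; idx < thread_count; idx++) {
            if (decomp_param[idx].done) { /* protected by decomp_done_lock */
                decomp_param[idx].done = false;
                qemu_mutex_lock(&decomp_param[idx].mutex);
                qemu_get_buffer(f, decomp_param[idx].compbuf, len);
                decomp_param[idx].des = host;
                decomp_param[idx].len = len;
                qemu_cond_signal(&decomp_param[idx].cond);
                qemu_mutex_unlock(&decomp_param[idx].mutex);
                break;
            }
        }
        if (idx < thread_count) {
            break;
        }
        /* All threads busy: sleep until one signals completion. */
        qemu_cond_wait(&decomp_done_cond, &decomp_done_lock);
    }
    qemu_mutex_unlock(&decomp_done_lock);

    /* Worker side, at the end of each do_data_decompress() iteration,
     * after param->mutex has been dropped: */
    qemu_mutex_lock(&decomp_done_lock);
    param->done = true;
    qemu_cond_signal(&decomp_done_cond);
    qemu_mutex_unlock(&decomp_done_lock);

The attached patch implements this pattern.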

Could you help test whether it works in your environment?

Liang


> Regards,
> Daniel
> --
> |: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org              -o-             http://virt-manager.org :|
> |: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

Attachment: fix_multiple_thread_decompression_issue.patch
Description: fix_multiple_thread_decompression_issue.patch

