qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] linux-user crashing in multi-threaded programs


From: Alexander Graf
Subject: Re: [Qemu-devel] linux-user crashing in multi-threaded programs
Date: Mon, 21 Nov 2011 03:33:44 +0100

On 21.11.2011, at 03:05, Alexander Graf wrote:

> Howdy,
> 
> In an adventure to find out why yast2-core fails in its testsuite during 
> build on ARM for us, I dove into the code for a bit and debugged it down to 
> our old friend: multi-threaded TB invalidation.
> 
> I have a nice test case at
> 
>  http://csgraf.de/tmp/yast2-core.tbz2
> 
> which you can easily use to reproduce the race. I run it using the following 
> command line:
> 
>  qemu-arm -L yast2-core yast2-core/bin/test_thread_log.prg
> 
> It breaks pretty much every single time.
> 
> Apparently two threads try to modify jmp_next at the same time, rendering the 
> whole invalidation racy. The following patch triggers while in normal 
> single-threaded worlds it should never appear:
> 
> diff --git a/exec.c b/exec.c
> index 6b92198..0c48222 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1356,6 +1356,7 @@ static inline void 
> tb_reset_jump_recursive2(TranslationBlock *tb, int n)
>             tb1 = (TranslationBlock *)((long)tb1 & ~3);
>             if (n1 == 2)
>                 break;
> +            assert(tb1);
>             tb1 = tb1->jmp_next[n1];
>         }
>         /* we are now sure now that tb jumps to tb1 */
> 
> 
> So how do I know it's a race in the chaining code? Well, it only happens with 
> threaded code. And it only happens when actually chaining. The following 
> patch makes the test case work:
> 
> diff --git a/exec-all.h b/exec-all.h
> index c211242..c8124e9 100644
> --- a/exec-all.h
> +++ b/exec-all.h
> @@ -259,6 +259,7 @@ static inline void tb_set_jmp_target(TranslationBlock *tb,
> static inline void tb_add_jump(TranslationBlock *tb, int n,
>                                TranslationBlock *tb_next)
> {
> +    if(1) return;
>     /* NOTE: this test is only needed for thread safety */
>     if (!tb->jmp_next[n]) {
>         /* patch the native jump address */
> 
> 
> I would advice against using it as a real fix though, as it effectively 
> disables TB chaining, rendering all of QEMU pretty slow.
> 
> 
> Does anyone have pending patches, approaches or ideas on how to fix this? It 
> seems to be the last big road block towards getting linux-user actually 
> workable for us.

As a small note, the following patch at least makes the test case (and probably 
a lot of other applications) work for me. It fixes no underlying problems nor 
does it even remotely try to, but it might help someone in case he stumbles 
over the same thing.

Basically it assumes that multi-threaded applications break with TB chaining, 
so it disables it as soon as it sees the first thread creation. Thus 
multi-threaded applications will be slow, while non-multi-threaded applications 
maintain their speed.


Alex

diff --git a/exec-all.h b/exec-all.h
index c211242..5374804 100644
--- a/exec-all.h
+++ b/exec-all.h
@@ -256,9 +256,14 @@ static inline void tb_set_jmp_target(TranslationBlock *tb,
 
 #endif
 
+extern bool threaded;
 static inline void tb_add_jump(TranslationBlock *tb, int n,
                                TranslationBlock *tb_next)
 {
+    if (threaded) {
+        /* multi-threaded applications break with TB chaining */
+        return;
+    }
     /* NOTE: this test is only needed for thread safety */
     if (!tb->jmp_next[n]) {
         /* patch the native jump address */
diff --git a/exec.c b/exec.c
index 6b92198..a214097 100644
--- a/exec.c
+++ b/exec.c
@@ -82,6 +82,7 @@ TranslationBlock *tb_phys_hash[CODE_GEN_PHYS_HASH_SIZE];
 static int nb_tbs;
 /* any access to the tbs or the page table must use this lock */
 spinlock_t tb_lock = SPIN_LOCK_UNLOCKED;
+bool threaded;
 
 #if defined(__arm__) || defined(__sparc_v9__)
 /* The prologue must be reachable with a direct jump. ARM and Sparc64
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 312aec5..58ec7a8 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -4010,6 +4010,7 @@ static int do_fork(CPUState *env, unsigned int flags, 
abi_ulong newsp,
         ts->bprm = parent_ts->bprm;
         ts->info = parent_ts->info;
 #if defined(CONFIG_USE_NPTL)
+        threaded = true;
         nptl_flags = flags;
         flags &= ~CLONE_NPTL_FLAGS2;
 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]