
[Qemu-devel] [Bug 823902] Re: multithreaded ARM seg/longjmp causes unini


From: Peter Maydell
Subject: [Qemu-devel] [Bug 823902] Re: multithreaded ARM seg/longjmp causes uninitialized stack frame due to 0d10193870b5a81c3bce13a602a5403c3a55cf6c
Date: Wed, 10 Aug 2011 13:30:20 -0000

If you roll back to commit 2b41f10e186ccb4f0058815161586f8d6d006ea3 what
is the pass/fail rate?  That ought to separate out new bugs caused by
recent commits (including the 0d101 change which is definitely wrong
since it assumes cpu_single_env is only being used by one thread) from
random other multithreaded-user-mode problems like 668799.

volatile ought to work and be a conservative fix (although I'm not a fan
of volatile and compilers notoriously can't get it right). Making
cpu_single_env thread-local sounds like a reasonable idea for user-mode,
but I think that the current iothread code assumes that there is only
one running CPU and cpu_single_env is how you get at it from the
iothread. So if we go in that direction it would require more analysis
of code to figure out what it's doing with cpu_single_env.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/823902

Title:
  multithreaded ARM seg/longjmp causes uninitialized stack frame due
  to 0d10193870b5a81c3bce13a602a5403c3a55cf6c

Status in QEMU:
  New

Bug description:
  Hi,
    I've got an ARM multithreaded test program that I wrote as a gcc
  testcase (attached) that fails on QEMU; Firefox from Ubuntu ARM
  maverick also fails in the same way.  The failure is either a seg
  fault or '*** longjmp causes uninitialized stack frame ***:
  ./arm-linux-user/qemu-arm terminated' and it fails every time.

  The test works on real hardware - a dual core A9 panda board.  Firefox
  in an ARM maverick chroot also fails in the same way and is fixed in
  the same way.

  On 64bit Oneiric (i7-860 quad core) the backtrace from the seg looks like:
  #0  __sigsetjmp () at ../sysdeps/x86_64/setjmp.S:26
  #1  0x0000000060034cf4 in cpu_arm_exec (env=0x0)
      at /media/crypt/work/qemu/cpu-exec.c:233
  #2  0x0000000060006467 in cpu_loop (env=0x6226d060)
      at /media/crypt/work/qemu/linux-user/main.c:599
  #3  0x0000000060007984 in main (argc=<value optimised out>,
      argv=<value optimised out>, envp=<value optimised out>)
      at /media/crypt/work/qemu/linux-user/main.c:3567

  On 32bit lucid (core2 duo dual core) when it gives the longjmp error
  it's taken a rather more tortuous route, but it looks like it
  originally took a seg at about the same place:
  #0  pthread_cond_wait ()
      at ../nptl/sysdeps/unix/sysv/linux/i386/i486/pthread_cond_wait.S:123
  #1  0x60000344 in exclusive_idle ()
      at /home/dg/linaro/git/qemu/linux-user/main.c:134
  #2  start_exclusive () at /home/dg/linaro/git/qemu/linux-user/main.c:144
  #3  stop_all_tasks () at /home/dg/linaro/git/qemu/linux-user/main.c:2996
  #4  0x60016491 in force_sig (target_sig=6)
      at /home/dg/linaro/git/qemu/linux-user/signal.c:378
  #5  0x60016f1d in queue_signal (env=0x639ff698, sig=6, info=0xb5610280)
      at /home/dg/linaro/git/qemu/linux-user/signal.c:451
  #6  0x60017375 in host_signal_handler (host_signum=6, info=0xb561031c, 
      puc=0xb561039c) at /home/dg/linaro/git/qemu/linux-user/signal.c:504
  #7  <signal handler called>
  #8  0x600c53d1 in raise ()
  #9  0x6009a133 in abort ()
  #10 0x600a0345 in __libc_message ()
  #11 0x600b977c in __fortify_fail ()
  #12 0x600b9717 in ____longjmp_chk ()
  #13 0x600b9697 in __longjmp_chk ()
  #14 0x6002b478 in cpu_loop_exit (env=0xb5611068)
      at /home/dg/linaro/git/qemu/cpu-exec.c:37
  #15 0x6001d4ff in exception_action (host_signum=11, pinfo=0xb5610c8c, 
      puc=0xb5610d0c) at /home/dg/linaro/git/qemu/user-exec.c:46
  #16 handle_cpu_signal (host_signum=11, pinfo=0xb5610c8c, puc=0xb5610d0c)
      at /home/dg/linaro/git/qemu/user-exec.c:123
  #17 cpu_arm_signal_handler (host_signum=11, pinfo=0xb5610c8c, puc=0xb5610d0c)
      at /home/dg/linaro/git/qemu/user-exec.c:186
  #18 0x600172f6 in host_signal_handler (host_signum=11, info=0xb5610c8c, 
      puc=0xb5610d0c) at /home/dg/linaro/git/qemu/linux-user/signal.c:492
  #19 <signal handler called>
  #20 0x60099ac6 in _setjmp ()
  #21 0x6002b4eb in cpu_arm_exec (env=0x0)
      at /home/dg/linaro/git/qemu/cpu-exec.c:233
  #22 0x600005bc in cpu_loop (env=0x639ff698)
      at /home/dg/linaro/git/qemu/linux-user/main.c:739
  #23 0x60006134 in clone_func (arg=0xbfdcf95c)
      at /home/dg/linaro/git/qemu/linux-user/syscall.c:3953
  #24 0x6008a8d0 in start_thread (arg=0xb5611b70) at pthread_create.c:300
  #25 0x600b7f1e in clone ()

  Things I've tried (with suggestions from Pete Maydell):

  If I remove the 'env = cpu_single_env;'  added by
  0d10193870b5a81c3bce13a602a5403c3a55cf6c (tcg: Reload local variables
  after return from longjmp) the test works reliably (10 out of 10
  passes) on 32bit Lucid and partially (7 out of 10 passes) on 64 bit
  Oneiric (some segs, some hangs).
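
As a rough illustration of why reloading env from a shared global can go wrong with more than one thread, here is a single-threaded sketch that simulates the interleaving. DemoCPU, demo_single_env and demo_reload_from_global are hypothetical names made up for this example and are not QEMU code; the `other` argument stands in for whatever a second thread stored in the global between this thread's setjmp and its longjmp.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-CPU state structure; not QEMU's type. */
typedef struct DemoCPU { int id; } DemoCPU;

/* A plain global: every thread in the process shares this one pointer. */
static DemoCPU *demo_single_env;

/*
 * Returns the id found after the longjmp when env is reloaded from the
 * shared global, or -1 if the global was NULL at that point.
 */
int demo_reload_from_global(DemoCPU *mine, DemoCPU *other)
{
    jmp_buf jb;
    DemoCPU *env = mine;

    assert(mine != NULL);
    demo_single_env = env;
    if (setjmp(jb) == 0) {
        demo_single_env = other;   /* simulated store by another thread */
        longjmp(jb, 1);
    }
    env = demo_single_env;         /* reload from the shared global */
    return env ? env->id : -1;
}
```

Called with two different DemoCPU objects, the function returns the second one's id rather than the caller's own. Called with NULL as `other`, it ends up with a null env, which would be consistent with the env=0x0 frames in the backtraces above if that is indeed how the null pointer arises.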

  If I make cpu_single_env thread local with __thread and leave 0d101...
  in, then again it works reliably on 32bit Lucid, and is flaky on 64
  bit Oneiric (5 out of 10 passes, 2 hangs, 3 segs)
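
For reference, a minimal sketch of what __thread buys you, assuming GCC or Clang with pthreads. The names DemoCPU, demo_tls_env, demo_worker and demo_tls_is_per_thread are hypothetical and not taken from QEMU. Each thread gets its own copy of the pointer, so a store from one thread cannot change what another thread reads back.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-CPU state structure; not QEMU's type. */
typedef struct DemoCPU { int id; } DemoCPU;

/* __thread is a GCC/Clang extension: one copy of the pointer per thread. */
static __thread DemoCPU *demo_tls_env;

static void *demo_worker(void *arg)
{
    demo_tls_env = arg;      /* only the worker's own copy changes */
    return demo_tls_env;     /* report what the worker sees */
}

/* Returns 1 if the worker's store left the calling thread's copy alone. */
int demo_tls_is_per_thread(void)
{
    DemoCPU mine = { 1 }, theirs = { 2 };
    pthread_t t;
    void *seen = NULL;

    demo_tls_env = &mine;
    if (pthread_create(&t, NULL, demo_worker, &theirs) != 0) {
        return 0;
    }
    if (pthread_join(t, &seen) != 0) {
        return 0;
    }
    return demo_tls_env == &mine && seen == &theirs;
}
```

This only shows the per-thread storage itself. It says nothing about code that expects to reach the running CPU through the pointer from a different thread, which would see its own (possibly unset) copy instead.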

  I've also tried using a volatile local variable in cpu_exec to hold a
  copy of env and restore that rather than cpu_single_env.  With this
  it's solid on 32bit lucid and flaky on 64bit Oneiric; these failures on
  64bit Oneiric look like it running off the end of the code buffer (all 0
  code), jumping to non-existent code addresses and a seg in
  tb_reset_jump_recursive2.
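
The volatile-local idea can be sketched roughly as follows. This is an illustration with hypothetical names (DemoCPU, demo_single_env, demo_restore_from_volatile), not the actual change tried in cpu_exec. The copy is a volatile-qualified automatic variable that is not modified between setjmp and longjmp, so the C standard guarantees its value after the jump, and it does not depend on what other threads store in the shared global.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-CPU state structure; not QEMU's type. */
typedef struct DemoCPU { int id; } DemoCPU;

/* Shared global, still written by "other threads" in this simulation. */
static DemoCPU *demo_single_env;

/* Returns the id seen after the longjmp when env is restored from a
 * volatile local copy instead of from the shared global. */
int demo_restore_from_volatile(DemoCPU *mine, DemoCPU *other)
{
    jmp_buf jb;
    DemoCPU *env = mine;
    DemoCPU *volatile saved_env = env;   /* the pointer itself is volatile */

    assert(mine != NULL);
    demo_single_env = env;
    if (setjmp(jb) == 0) {
        demo_single_env = other;   /* simulated store by another thread */
        longjmp(jb, 1);
    }
    env = saved_env;               /* restore from this thread's own stack */
    return env->id;
}
```

Whatever is passed as `other`, including NULL, the function returns the caller's own id, because the restored value never goes through the shared pointer.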

  With both __thread and the volatile local I still get failures on
  64bit oneiric; they look mostly like they've run off the end of
  generated code (they're executing out of a buffer of all 0's).

  (I also tried some of the 64bit tests on an EC2 Xen Natty VM with
  similar results).

  My guess is I'm hitting multiple bugs here:
    1) The Lucid install is probably too old to hit the compiler bugs for
       which 0d101... is a fix - but it is in itself triggering a new bug
       on the old compiler.
    2) The 64bit Natty and Oneiric installs are new enough to hit the
       compiler bug for which 0d101 is a fix
    3) I'm probably hitting something else as well, my guess is that it
       could be bug 668799 but I'm not clear why it doesn't happen on my
       32bit lucid install

  Dave

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/823902/+subscriptions


