bug-gnu-emacs


From: Pip Cet
Subject: bug#21245: 25.0.50; [PATCH] SIGSEGV when misusing (backtrace-frame) from custom debugger
Date: Wed, 12 Aug 2015 22:45:10 +0000

With the current Git tree, I am running into reproducible segfaults
that happen when a custom debugger routine is installed with (let
((debugger #'org-elisp-debugger)) ...code...) and the right (wrong)
code is run.  I believe I have traced the segfault to a bug in eval.c
that I have been able to fix, but in the process I have discovered two
more potentially problematic scenarios for which I am not yet
including a fix.

Unfortunately, this is another bug report for which the backtrace
information isn't very helpful, but I have included it for
completeness.

In GNU Emacs 25.0.50.23 (x86_64-unknown-linux-gnu, GTK+ Version 3.16.6)
 of 2015-08-12 on ...
Repository revision: e4de91d8dd2a06125140fb42772ec84a2f7ab290
Windowing system distributor `The X.Org Foundation', version 11.0.11702000
System Description:    Debian GNU/Linux unstable (sid)

Configured using:
 `configure 'CFLAGS=-O0 -g3''

Configured features:
XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND DBUS GCONF GSETTINGS NOTIFY
LIBSELINUX GNUTLS LIBXML2 FREETYPE XFT ZLIB TOOLKIT_SCROLL_BARS GTK3 X11

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.
Quit
completing-read-default: Command attempted to use minibuffer while in minibuffer
Quit [2 times]

Load-path shadows:
None found.

Features:
(shadow sort gnus-util mail-extr emacsbug message dired format-spec
rfc822 mml mml-sec mm-decode mm-bodies mm-encode mail-parse rfc2231
mailabbrev gmm-utils mailheader sendmail rfc2047 rfc2045 ietf-drums
mm-util help-fns help-mode easymenu cl-loaddefs pcase cl-lib mail-prsvr
mail-utils time-date mule-util tooltip eldoc electric uniquify
ediff-hook vc-hooks lisp-float-type mwheel x-win term/common-win x-dnd
tool-bar dnd fontset image regexp-opt fringe tabulated-list newcomment
elisp-mode lisp-mode prog-mode register page menu-bar rfn-eshadow timer
select scroll-bar mouse jit-lock font-lock syntax facemenu font-core
frame cl-generic cham georgian utf-8-lang misc-lang vietnamese tibetan
thai tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek romanian
slovak czech european ethiopic indian cyrillic chinese charscript
case-table epa-hook jka-cmpr-hook help simple abbrev minibuffer
cl-preloaded nadvice loaddefs button faces cus-face macroexp files
text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget hashtable-print-readable backquote dbusbind gfilenotify
dynamic-setting system-font-setting font-render-setting move-toolbar gtk
x-toolkit x multi-tty make-network-process emacs)

Memory information:
((conses 16 81334 6405)
 (symbols 48 19075 0)
 (miscs 40 42 86)
 (strings 32 13247 4299)
 (string-bytes 1 376617)
 (vectors 16 11220)
 (vector-slots 8 413595 4312)
 (floats 8 129 124)
 (intervals 56 180 0)
 (buffers 976 11)
 (heap 1024 18494 1047))

-----

This is arguably a case of "don't do that, then", but I think it's a
bug. However, it's a bug that is triggered exclusively by installing a
custom debugger that is itself buggy.

I invested quite a bit of effort in finding this bug and in attempting
to reproduce it without using too much local Emacs Lisp code that
isn't public at the moment, but have succeeded only in the former.  I
do have a test case here, but it relies on a modified version of the
org-mode code and some extra code.

To continue with the analysis: the problem is an invalid "arguments"
pointer in the backtrace structure on the specpdl stack.  Various
sub-cases of eval_sub allocate the arguments array in temporary
storage and release that storage before eval_sub calls the debugger
for debug-on-exit, even though the debugger might still need the
arguments residing in the now-freed storage.

The culprit in my case was apply_lambda, which has this code:

  /* Do the debug-on-exit now, while arg_vector still exists.  */
  if (backtrace_debug_on_exit (specpdl + count))
    {
      /* Don't do it again when we return to eval.  */
      set_backtrace_debug_on_exit (specpdl + count, false);
      tem = call_debugger (list2 (Qexit, tem));
    }

I believe this should instead read:

  /* Do the debug-on-exit now, while arg_vector still exists.  */
  if (backtrace_debug_on_exit (specpdl + count))
    {
      tem = call_debugger (list2 (Qexit, tem));
      /* Don't do it again when we return to eval.  */
      set_backtrace_debug_on_exit (specpdl + count, false);
    }

In my case, the debugger (again, this is a debugger bug) called
(backtrace-debug n t), setting the debug-on-exit flag of the specpdl
entry that had just been cleared back to true.  As a result, eval_sub
invoked the debugger a second time after apply_lambda had returned and
freed arg_vector; the debugger then called (backtrace-frame n), and
random stack data was returned in the resulting list, causing the
segfault.  This happened only when garbage collection ran after the
debugger had been started but before it had called (backtrace-frame n).

I am also very suspicious of the two cases of eval_sub that read:

      else if (XSUBR (fun)->max_args == MANY)
        {
          /* Pass a vector of evaluated arguments.  */
          Lisp_Object *vals;
          ptrdiff_t argnum = 0;
          USE_SAFE_ALLOCA;

          SAFE_ALLOCA_LISP (vals, XINT (numargs));

          GCPRO3 (args_left, fun, fun);
          gcpro3.var = vals;
          gcpro3.nvars = 0;

          while (!NILP (args_left))
            {
              vals[argnum++] = eval_sub (Fcar (args_left));
              args_left = Fcdr (args_left);
              gcpro3.nvars = argnum;
            }

          set_backtrace_args (specpdl + count, vals, XINT (numargs));

          val = (XSUBR (fun)->function.aMANY) (XINT (numargs), vals);
          UNGCPRO;
          SAFE_FREE ();
        }
      else

(which SAFE_FREEs the array that might yet be used by the debugger) and:

    {
      Lisp_Object numargs;
      Lisp_Object argvals[8];

      ...

      else
        {
          GCPRO3 (args_left, fun, fun);
          gcpro3.var = argvals;
          gcpro3.nvars = 0;

          maxargs = XSUBR (fun)->max_args;
          for (i = 0; i < maxargs; args_left = Fcdr (args_left))
            {
              argvals[i] = eval_sub (Fcar (args_left));
              gcpro3.nvars = ++i;
            }

          UNGCPRO;

          set_backtrace_args (specpdl + count, argvals, XINT (numargs));

          ...
        }
    }

which uses a stack array that goes out of scope before the debugger is
called.

While I strongly believe it would be best to focus our energies on
fixing this bug rather than reproducing it (which is going to be
unreliable as it relies on the contents of the C stack being modified
at the right time), here is the buggy debugger routine and the file
emacs-bug-002.el that triggered the bug in the gdb log I've attached:

---- debugger (not doing anything useful, crippled version that
reproduces the bug)

(defun org-elisp-debugger (&rest args)
  (message "args %S %S" args (backtrace-frame 1 #'org-elisp-debugger))
  (if (eq (car args) 'error)
      (apply #'debug args)
    (let ((count 0))
      (while (and (eq (car args) 'exit)
                  (backtrace-frame count #'org-elisp-debugger))
        (setq count (1+ count)))
      ;;(message "%S frames on stack, type %S" count (car args))
      ;(when (eq (car args) 'exit)
      (dotimes (delta0 10)
        (dotimes (delta1 10)
          (let ((a (nth (- count delta0) org-elisp-frames))
                (b (backtrace-frame (+ 0 delta1) #'org-elisp-debugger)))
            (when (and (not (equal (car a) (car b)))
                       (equal (cadr a) (cadr b))
                       (equal (length a) (length b)))
              (message "Backtrace:")
              (backtrace)
              (garbage-collect)
              (message "eval %S %S %S %S -> %S" delta0 delta1 (car args) a b)
              ;;(sleep-for 1.0)
              ))))
      (dotimes (i count)
        (if (and (> i 6) (< i (- count 93)))
            (backtrace-debug i t))
        (if (and (eq (car args) 'exit) (> i 0))
            (setf (nth (- count i) org-elisp-frames)
                  (list i count
                        (length (backtrace-frame i #'org-elisp-debugger))
                        (backtrace-frame i #'org-elisp-debugger))))))
    (prog1
        (if (eq (car args) 'exit)
            (cadr args)
          t))))

----- emacs-bug-002.el

(add-to-list 'load-path "/home/pip/git/org-mode/lisp")
(require 'org)
(find-file "/home/pip/git/org-mode/lisp/org.el")
(eval-buffer)
(find-file "/home/pip/git/org-mode/lisp/org-colview.el")
(eval-buffer)
(find-file "/home/pip/emacs-bug-002.org")
(org-columns)

-----

Please contact me if it is absolutely necessary for you to reproduce
the bug, so I can work more to isolate the test case from my local
code or find a way of sharing the code with interested parties.
However, again, this bug is going to be hard to reproduce as it relies
on what intervening C code does with the stack, so it's possible a
test case would only work with my local compiler/library/org-mode
setup.

I've attached the patch for the case that I've actually seen, but
would like to repeat that I strongly suspect the other two cases to be
problematic as well.

Thanks!

Attachment: emacs-bug-005.diff
Description: Text document

Attachment: emacs-bug-info-005.txt
Description: Text document

Attachment: emacs-bug-002.el
Description: Text Data

