emacs-devel

The emacs_backtrace "feature"


From: Eli Zaretskii
Subject: The emacs_backtrace "feature"
Date: Fri, 21 Sep 2012 12:49:17 +0300

Based on my experience, I expect this "feature" to be hated, by users
and Emacs maintainers alike.

My experience is based on years of working with the DJGPP development
environment.  DJGPP (www.delorie.com/djgpp/) is a Posix-compliant
development environment, based on ported GNU tools and an
independently written standard C library, for developing 32-bit
protected-mode programs that run on MS-DOS and compatible systems.  In
particular, the MS-DOS build of Emacs uses DJGPP.

In DJGPP, displaying the backtrace on fatal errors is the default,
because core files are not supported.  So, when a DJGPP-compiled
program crashes, it displays a register dump and a backtrace.  Here's
a typical example (I deliberately truncated the backtrace at the end,
which was much longer in reality):

  Exiting due to signal SIGABRT
  Raised at eip=0012f2a6
  eax=002ee7fc ebx=00000120 ecx=00000000 edx=00000000 esi=003a533d edi=002f4cc0
  ebp=002ee8a8 esp=002ee7f8 program=H:\test\emacs-djgpp\emacs\src\temacs.exe
  cs: sel=0257  base=02c30000  limit=0104ffff
  ds: sel=025f  base=02c30000  limit=0104ffff
  es: sel=025f  base=02c30000  limit=0104ffff
  fs: sel=022f  base=0001d580  limit=0000ffff
  gs: sel=027f  base=00000000  limit=0010ffff
  ss: sel=025f  base=02c30000  limit=0104ffff
  App stack: [002eed94..002d5d94]  Exceptn stack: [002d5c68..002d3d28]

  Call frame traceback EIPs:
    0x0012f1c4
    0x0012f2a6
    0x00118377
    0x0011191d
    0x00068cff
    0x00068c41

A companion utility program captures the addresses and the executable
file name from the screen, and adds the corresponding function name
plus offset to each line (if the executable was not stripped), and
also the source file/line information, if that info is found.
Example:

  Call frame traceback EIPs:
    0x0001039f execute_builtin+191, file c:/djgpp/gnu/bash-2.03/execute_cmd.c, line 2878
    0x00010840 execute_builtin_or_function+176, file c:/djgpp/gnu/bash-2.03/execute_cmd.c, line 3173
    0x0001011b execute_simple_command+659, file c:/djgpp/gnu/bash-2.03/execute_cmd.c, line 2745
    0x0000de00 execute_command_internal+1876, file c:/djgpp/gnu/bash-2.03/execute_cmd.c, line 824
    0x0000d459 execute_command+69, file c:/djgpp/gnu/bash-2.03/execute_cmd.c, line 314
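
To illustrate the kind of lookup such a utility has to perform, here is
a minimal sketch in C.  It is not the DJGPP tool's code: the struct,
the function names, and the idea of a pre-sorted in-memory symbol table
are all assumptions made for the example.  It maps an EIP to the
nearest preceding symbol and prints "name+offset":

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical symbol-table entry; a real tool would read these from
   the executable's symbol or debug information.  */
struct sym
{
  uint32_t addr;      /* start address of the function */
  const char *name;   /* function name */
};

/* Return the entry with the largest start address <= EIP, or NULL if
   EIP lies below the first symbol.  TABLE must be sorted by address.  */
static const struct sym *
find_symbol (const struct sym *table, size_t n, uint32_t eip)
{
  assert (table != NULL || n == 0);
  size_t lo = 0, hi = n;      /* binary search over [lo, hi) */
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (table[mid].addr <= eip)
        lo = mid + 1;
      else
        hi = mid;
    }
  return lo == 0 ? NULL : &table[lo - 1];
}

/* Format one traceback line as "0xADDR name+offset" into BUF.
   Returns what snprintf returns.  */
static int
format_frame (char *buf, size_t size, const struct sym *table, size_t n,
              uint32_t eip)
{
  const struct sym *s = find_symbol (table, n, eip);
  if (s == NULL)
    return snprintf (buf, size, "0x%08lx", (unsigned long) eip);
  return snprintf (buf, size, "0x%08lx %s+%lu", (unsigned long) eip,
                   s->name, (unsigned long) (eip - s->addr));
}
```

Note that this only works if the symbol table matches the exact binary
that crashed, which is why the executable must not be stripped.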

As nice as this looks, it has several disadvantages:

 . Many real-life backtraces are long and quickly scroll off the
   screen.  Unless you made a point of setting up very large screen
   buffers for your shell windows, or redirected standard error to a
   file, you'll lose precious information.  Since these precautions
   are only taken when one expects a crash, guess how often they are
   in place when they are needed.

 . Many calls to emacs_backtrace in the current sources limit the
   number of backtrace frames to 10, but that is an arbitrary
   limitation which will be too small in most, if not all, situations.
   Check out the crash backtraces posted to the bug tracker.  As an
   extreme (but quite frequent) data point, crashes in GC tend to have
   many hundreds, and sometimes many thousands, of frames in them.  In
   reality, there's no way of knowing how many frames will be there,
   and how many of them will be needed to get enough useful
   information for finding the problem.  I predict that more often
   than not we will be looking at useless backtraces, while users who
   reported those backtraces will rightfully expect us to find the bug
   and fix it.

 . The backtrace is written to the standard error file handle.  Is
   that handle always guaranteed to be available and connected to a
   screen or a disk file that the user can find afterwards?  E.g., if
   Emacs is invoked from an environment which redirects that handle to
   the null device, the information will be lost.  (On MS-Windows, GUI
   applications launched by clicking a desktop icon have this handle
   closed, so anything written to it disappears without a trace; I
   don't know if Posix desktops have something similar.)

 . Last, but not least, even if the drawbacks described above are not
   an issue in some particular crash report, using the limited
   information it provides can be quite difficult, especially if the
   crash happened in a binary compiled by a different compiler version
   than yours, let alone on an architecture different from the one
   used by the person who tries to get some sense out of it.  Here's
   an example of what emacs_backtrace will produce (slightly edited
   from what you see on
   http://linux.die.net/man/3/backtrace_symbols_fd):

    Backtrace:
    ./emacs(myfunc4+0x5c) [0x80487f0]
    ./emacs [0x8048871]
    ./emacs(myfunc3+0x21) [0x8048894]
    ./emacs(myfunc2+0x1a) [0x804888d]
    ./emacs(myfunc1+0x1a) [0x804888d]
    ./emacs(main+0x65) [0x80488fb]
    /lib/libc.so.6(__libc_start_main+0xdc) [0xb7e38f9c]
    ./emacs [0x8048711]

   It doesn't even show the source line info, like DJGPP did.
   Translating myfunc1+0x1a etc. into source-level info is not an easy
   task, unless you are lucky and there's only one place where it
   calls myfunc2.  If not, you are left with guesswork.  Making sense
   of the backtrace without being able to get at the corresponding
   source lines is not for the faint of heart.  More often than not,
   the Emacs maintainers will be tempted to ignore such a report, and
   ask for a GDB backtrace instead.
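
For readers who haven't used these functions, here is a minimal sketch
of a capped backtrace written to a file descriptor.  It assumes glibc's
<execinfo.h> (backtrace and backtrace_symbols_fd, as documented on the
man page cited above); dump_backtrace, emit, and SKETCH_MAX_FRAMES are
hypothetical names for this example, and this is not the code of
emacs_backtrace itself:

```c
#include <assert.h>
#include <execinfo.h>   /* backtrace, backtrace_symbols_fd (glibc) */
#include <stddef.h>
#include <unistd.h>     /* write, ssize_t */

enum { SKETCH_MAX_FRAMES = 10 };  /* hypothetical cap of 10 frames */

/* Write LEN bytes of S to FD, ignoring errors: if FD is closed or
   points at the null device, the text is silently lost.  */
static void
emit (int fd, const char *s, size_t len)
{
  ssize_t r = write (fd, s, len);
  (void) r;
}

/* Write up to LIMIT frames of the current call stack to FD and return
   the number of frames captured.  A return value equal to the
   effective limit means the stack may have been deeper, i.e. the
   trace may be truncated.  */
static int
dump_backtrace (int fd, int limit)
{
  void *frames[SKETCH_MAX_FRAMES];
  static const char header[] = "Backtrace:\n";
  static const char note[] = "... (possibly truncated)\n";

  assert (limit > 0);
  if (limit > SKETCH_MAX_FRAMES)
    limit = SKETCH_MAX_FRAMES;

  int n = backtrace (frames, limit);
  emit (fd, header, sizeof header - 1);
  backtrace_symbols_fd (frames, n, fd);
  if (n == limit)
    emit (fd, note, sizeof note - 1);
  return n;
}
```

The sketch makes two of the problems above visible: the caller cannot
tell how many frames were dropped once the cap is hit, and nothing
reports a failed write, so the output vanishes if the descriptor is
not connected to anything useful.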

So given all of the above, I'm asking: why do we want this feature?
Why not use the good old core dump files?  They have all the
information that is needed for debugging the crash, while the above
falls short of that mark by a large measure.  It seems like a step
backward.  I always thought that the lack of core files in DJGPP was a
serious limitation, so I'm amazed to see modern environments actually
_wanting_ that limited debug feature in preference to core dumps and
real debuggability.  Until now, the only uses I saw for the 'backtrace'
function were when a debugger couldn't be used at all, or the core
file couldn't be produced due to system-level requirements, such as
limited disk space or some stringent time constraints.  But here we do
that voluntarily and by default.  Why?
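
For comparison, on Posix systems asking for a core file is a short
piece of code.  The sketch below assumes getrlimit and setrlimit with
RLIMIT_CORE from <sys/resource.h>; the function name
enable_core_dumps is hypothetical, and nothing here is taken from the
Emacs sources:

```c
#include <sys/resource.h>  /* getrlimit, setrlimit, RLIMIT_CORE */

/* Raise the soft core-file size limit to the hard limit, so that a
   crash is allowed to leave a core file as large as the system
   permits.  Returns 0 on success, -1 on failure.  */
static int
enable_core_dumps (void)
{
  struct rlimit rl;

  if (getrlimit (RLIMIT_CORE, &rl) != 0)
    return -1;
  rl.rlim_cur = rl.rlim_max;   /* cannot exceed the hard limit */
  return setrlimit (RLIMIT_CORE, &rl) == 0 ? 0 : -1;
}
```

If the hard limit is itself zero, this succeeds but still produces no
core file; where the file ends up, and whether it is written at all,
also depends on system configuration.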

Having said all that, I'm not really interested in disputing these
points.  I wanted to communicate my own, mostly negative, experience
of many years using a similar feature.  If more information is
required, in particular about DJGPP and how it created and used the
backtraces, I will gladly provide answers to any questions.
Otherwise, I guess we will find out soon enough whether this is a great
feature or not.


