bug-gnu-emacs

Re: core dump triggered by garbage collection (?)


From: Mark McAuliffe
Subject: Re: core dump triggered by garbage collection (?)
Date: Fri, 5 Sep 2003 00:32:48 -0700

Richard Stallman writes:
>     #19 0x0810f00e in lisp_free (block=0x8be41b0) at alloc.c:630
>     #20 0x081130dc in gc_sweep () at alloc.c:5270
> 
> To learn something about this crash, it is necessary to analyze the data
> being operated on in those two frames, and try to figure out what was
> inconsistent in the data (and what the data were being used for).
> Knowing that, we might be able to figure out the code that created
> the invalid data.
> 
> This is not easy, but I don't know of any substitute for it.


I've spent some time looking into this.  I don't know that I have found
anything of value, but here is what I've got so far...

For starters, I have had 2 more crashes since I reported the bug
originally, so I now have 4 core files worth of info.  There appear to be 2
types of crash -- presumably the same underlying problem, but 2 different
manifestations.  In one type, it appears that corrupt data are being found
in compact_small_strings.  3 of the 4 core files are like this.  The other
type finds the corrupt data in gc_sweep.  This latter type is the one you
specifically asked about above, but it is also the one I have had less luck
analyzing (no luck at all, in fact).  I'm hoping that what I have been able
to learn about the former type will be helpful.  If it's not, perhaps you
could help steer me in the right direction for the latter one.

For "type 1" core files, I wrote a gdb user-defined procedure that steps
through the list of sdata entries that compact_small_strings walks in its
inner loop (the one that starts with
"for (from = &b->first_data; from < end; from = from_end)").
FWIW, it looks like this:

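# Print the Lisp_String that owns the sdata entry at $from (if any), then
# advance $from by that entry's size and print the next entry.  `end' is
# the local variable of compact_small_strings in the selected frame, and
# ( $n + 8 ) & ~3 is the entry size on this 32-bit build.
define a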
  if ( $from < end )
    if ( $from->string == 0 )
      set $n = $from->u.nbytes
    else
      set $s = $from->string
      p $s
      p *$s
      p ((char*)($s->data)) - ((char*)&($from->u.data))
      if ( $s->size_byte < 0 )
        set $n = $s->size
      else
        set $n = $s->size_byte
      end
    end
    p $nb = ( $n + 8 ) & ~3
    p $from = (struct sdata *)((char*)$from + $nb)
    p *$from
  end
end


I initialized $from to &b->first_data, as in the for-loop, and ran
procedure "a" repeatedly to traverse the list of sdata structures until it
ran into corruption.  I did this for the 3 core files that have the problem
in compact_small_strings, and I found that the data that appeared right
before the corruption were similar.  Below are the last few iterations from
each core file:

core.17451:

$104 = (struct Lisp_String *) 0x8b6d9c4
$105 = {size = 3, size_byte = -1, intervals = 0x0, data = 0x908bc80 "  5"}
$106 = 2875744
$107 = 8
$108 = (struct sdata *) 0x8dcdb24
$109 = {string = 0x8b6d994, u = {data = "6", nbytes = 1667432502}}
(gdb) 
$110 = (struct Lisp_String *) 0x8b6d994
$111 = {size = 1, size_byte = -1, intervals = 0x0, data = 0x908bc88 "6"}
$112 = 2875744
$113 = 8
$114 = (struct sdata *) 0x8dcdb2c
$115 = {string = 0x8b6d934, u = {data = " ", nbytes = 3547168}}
(gdb) 
$116 = (struct Lisp_String *) 0x8b6d934
$117 = {size = 3, size_byte = -1, intervals = 0x0, data = 0x908bc90 "  6"}
$118 = 2875744
$119 = 8
$120 = (struct sdata *) 0x8dcdb34
$121 = {string = 0x8b6d924, u = {data = "7", nbytes = 538968119}}
(gdb) 
$122 = (struct Lisp_String *) 0x8b6d924
$123 = {size = 1, size_byte = -1, intervals = 0x0, data = 0x908bc98 "7"}
$124 = 2875744
$125 = 8
$126 = (struct sdata *) 0x8dcdb3c
$127 = {string = 0x8b6d914, u = {data = "", nbytes = 538976256}}
(gdb) 
$128 = (struct Lisp_String *) 0x8b6d914
$129 = {size = 3, size_byte = -1, intervals = 0x0, data = 0x908bca0 ""}
$130 = 2875744
$131 = 8
$132 = (struct sdata *) 0x8dcdb44
$133 = {string = 0x20202020, u = {data = "m", nbytes = 1919115629}}
(gdb) 
$134 = (struct Lisp_String *) 0x20202020
Cannot access memory at address 0x20202020
(gdb) 


core.24594

$269 = (struct Lisp_String *) 0x9c1c50c
$270 = {size = 2, size_byte = -1, intervals = 0x0, data = 0x9c1ed40 "18"}
$271 = -4418724
$272 = 8
$273 = (struct sdata *) 0xa0559e8
$274 = {string = 0x9c1c4ec, u = {data = " ", nbytes = 3682592}}
(gdb) 
$275 = (struct Lisp_String *) 0x9c1c4ec
$276 = {size = 3, size_byte = -1, intervals = 0x0, data = 0x9c1ed48 " 18"}
$277 = -4418724
$278 = 8
$279 = (struct sdata *) 0xa0559f0
$280 = {string = 0x9c1c4dc, u = {data = "1", nbytes = 14641}}
(gdb) 
$281 = (struct Lisp_String *) 0x9c1c4dc
$282 = {size = 2, size_byte = -1, intervals = 0x0, data = 0x9c1ed50 "19"}
$283 = -4418724
$284 = 8
$285 = (struct sdata *) 0xa0559f8
$286 = {string = 0x9c1c4bc, u = {data = " ", nbytes = 3748128}}
(gdb) 
$287 = (struct Lisp_String *) 0x9c1c4bc
$288 = {size = 3, size_byte = -1, intervals = 0x0, data = 0x9c1ed58 " 19"}
$289 = -4418724
$290 = 8
$291 = (struct sdata *) 0xa055a00
$292 = {string = 0x9c1a494, u = {data = "", nbytes = 0}}
(gdb) 
$293 = (struct Lisp_String *) 0x9c1a494
$294 = {size = 2, size_byte = -1, intervals = 0x0, data = 0x9c1ed60 ""}
$295 = -4418724
$296 = 8
$297 = (struct sdata *) 0xa055a08
$298 = {string = 0x43c143, u = {data = "8", nbytes = 1240629304}}
(gdb) 
$299 = (struct Lisp_String *) 0x43c143
Cannot access memory at address 0x43c143


core.25897

$1007 = (struct Lisp_String *) 0x9fb29c4
$1008 = {size = 1, size_byte = -1, intervals = 0x0, data = 0xa36cc7c "7"}
$1009 = 74632
$1010 = 8
$1011 = (struct sdata *) 0xa35a8f8
$1012 = {string = 0x9fb2964, u = {data = " ", nbytes = 3612704}}
(gdb) 
$1013 = (struct Lisp_String *) 0x9fb2964
$1014 = {size = 3, size_byte = -1, intervals = 0x0, data = 0xa36cc84 "  7"}
$1015 = 74632
$1016 = 8
$1017 = (struct sdata *) 0xa35a900
$1018 = {string = 0x9fb2944, u = {data = "8", nbytes = 56}}
(gdb) 
$1019 = (struct Lisp_String *) 0x9fb2944
$1020 = {size = 1, size_byte = -1, intervals = 0x0, data = 0xa36cc8c "8"}
$1021 = 74632
$1022 = 8
$1023 = (struct sdata *) 0xa35a908
$1024 = {string = 0x9fb2924, u = {data = " ", nbytes = 3678240}}
(gdb) 
$1025 = (struct Lisp_String *) 0x9fb2924
$1026 = {size = 3, size_byte = -1, intervals = 0x0, data = 0xa36cc94 "  8"}
$1027 = 74632
$1028 = 8
$1029 = (struct sdata *) 0xa35a910
$1030 = {string = 0xa388b24, u = {data = "9", nbytes = 57}}
(gdb) 
$1031 = (struct Lisp_String *) 0xa388b24
$1032 = {size = 1, size_byte = -1, intervals = 0x0, data = 0xa36cc9c "9"}
$1033 = 74632
$1034 = 8
$1035 = (struct sdata *) 0xa35a918
$1036 = {string = 0xa388ae4, u = {data = "", nbytes = 0}}
(gdb) 
$1037 = (struct Lisp_String *) 0xa388ae4
$1038 = {size = 3, size_byte = -1, intervals = 0x0, data = 0xa36cca4 ""}
$1039 = 74632
$1040 = 8
$1041 = (struct sdata *) 0xa35a920
$1042 = {string = 0x24, u = {data = "$", nbytes = 36}}
(gdb) 
$1043 = (struct Lisp_String *) 0x24
Cannot access memory at address 0x24
(gdb) 


In all three cases, the strings that appear before the corruption are
numbers.  Since the crash always seems to happen when I try to read mail
with VM, I assume those numbers are the message numbers in the VM summary
buffer.  Significant?  Helpful??  I dunno...


I also tried to figure out what data had overwritten the list in each of
the 3 core files:


core.17451

The gdb snippet below picks up right after the above snippet for
core.17451.  The overwriting data appears to be basically text (a compiled
lisp macro?):

(gdb) p $x = $126
$135 = (struct sdata *) 0x8dcdb3c
(gdb) p *$x
$136 = {string = 0x8b6d914, u = {data = "", nbytes = 538976256}}
(gdb) set print null-stop off
(gdb) p $x->u.data
$137 = ""
(gdb) p $x->u.data[0]@20
$138 = "\0       macro %\b%_\b_"
(gdb) p $x->u.data[0]@100
$139 = "\0       macro 
%\b%_\b_r\bre\bep\bpa\bac\bck\bka\bag\bge\be_\b_n\bna\bam\bme\be_\b_f\bfm\bmt\bt
 and will be  created  in\n"
(gdb) p $x->u.data[0]@200
$140 = "\0       macro 
%\b%_\b_r\bre\bep\bpa\bac\bck\bka\bag\bge\be_\b_n\bna\bam\bme\be_\b_f\bfm\bmt\bt
 and will be  created  in\n", ' ' <repeats 14 times>, "the  directory  named  
by the macro %\b%_\b_\0 fr\0\0\0\0\0\004\0\0\n\n       
-\b--\b-p\bpr\bre\bef\bfi\bix\b"
(gdb) p $x->u.data[0]@400
$141 = "\0       macro 
%\b%_\b_r\bre\bep\bpa\bac\bck\bka\bag\bge\be_\b_n\bna\bam\bme\be_\b_f\bfm\bmt\bt
 and will be  created  in\n", ' ' <repeats 14 times>, "the  directory  named  
by the macro %\b%_\b_\0 fr\0\0\0\0\0\004\0\0\n\n       
-\b--\b-p\bpr\bre\bef\bfi\bix\b\0 _\b"...

(I hope that stuff survives being emailed...).


core.24594

This gdb snippet picks up more or less where the core.24594 snippet above
left off, with some editing:

(gdb) p $x = $267
$306 = (struct sdata *) 0xa0559e0
(gdb) x/100 $x->u.data
0xa0559e4:      0x49003831      0x09c1c4ec      0x00383120      0x09c1c4dc
0xa0559f4:      0x00003931      0x09c1c4bc      0x00393120      0x09c1a494
0xa055a04:      0x00000000      0x0043c143      0x49f28038      0x00000006
0xa055a14:      0x40000000      0x00000032      0x0043c144      0x49f28038
0xa055a24:      0x00000006      0x40000000      0x00000032      0x0043c145
0xa055a34:      0x49f28038      0x00000006      0x40000000      0x0000002e
0xa055a44:      0x0043c146      0x49f28038      0x00000006      0x40000000
0xa055a54:      0x0000002e      0x00000000      0x00000000      0x00000006
0xa055a64:      0x40000000      0x00000020      0x00005480      0x489f3ce0
0xa055a74:      0x00000006      0x40000004      0x0000002f      0x00005481
0xa055a84:      0x489f3ce0      0x00000006      0x40000004      0x00000077
0xa055a94:      0x00005482      0x489f3ce0      0x00000006      0x40000004
0xa055aa4:      0x0000006f      0x00005483      0x489f3ce0      0x00000006
0xa055ab4:      0x40000004      0x00000072      0x00005484      0x489f3ce0
0xa055ac4:      0x00000006      0x40000004      0x00000000      0x09c1a494
0xa055ad4:      0x48003032      0x09c1a454      0x00303220      0x09c1a424
0xa055ae4:      0x00003132      0x09c1a414      0x00313220      0x09c1a404
0xa055af4:      0x00003232      0x09c1a3f4      0x00323220      0x09c1a3e4
0xa055b04:      0x40003332      0x09c1a3d4      0x00333220      0x09c1a3c4
0xa055b14:      0x00003432      0x09c1a3b4      0x00343220      0x09c1a3a4
0xa055b24:      0x48003532      0x09c1a394      0x00353220      0x09c1a384
0xa055b34:      0x00003632      0x09c1a374      0x00363220      0x09c1a354
0xa055b44:      0x00003732      0x09c1a344      0x00373220      0x09c1a334
0xa055b54:      0x40003832      0x09c1a324      0x00383220      0x09c1a314
0xa055b64:      0x00003932      0x09c1a304      0x00393220      0x09c1a2f4

The first two lines are the tail end of the good data.  The third line is
where things get messed up: its second word, 0x0043c143, is the bogus
back-pointer that stopped the walk.  The corrupt data seem to have some
pattern to them, but I have no idea what it might be.


core.25897

This gdb snippet picks up more or less where the core.25897 snippet above
left off (with some editing).  The corrupt data in this core file seem to
have some regularity too:

(gdb) p $x = $1005
$1049 = (struct sdata *) 0xa35a8f0
(gdb) x/100 $x->u.data
0xa35a8f4:      0x00000037      0x09fb2964      0x00372020      0x09fb2944
0xa35a904:      0x00000038      0x09fb2924      0x00382020      0x0a388b24
0xa35a914:      0x00000039      0x0a388ae4      0x00000000      0x00000024
0xa35a924:      0x00000024      0x00000000      0x00000000      0x00000000
0xa35a934:      0x00000919      0x0a44ab38      0x4212e280      0x00000000
0xa35a944:      0x00000000      0x6877202c      0x20686369      0x73207369
0xa35a954:      0x20746e65      0x74206f74      0x73206568      0x00000000
0xa35a964:      0x00000000      0xffffffff      0x00000001      0x00000000
0xa35a974:      0x00000000      0x00000000      0x65736e6f      0x1826d17c
0xa35a984:      0x1826d17c      0x1826d17c      0x394b1aec      0x1826d17c
0xa35a994:      0x00000000      0x1826d17c      0x1826d17c      0x286e23dc
0xa35a9a4:      0x1826d17c      0x1826d26c      0x38273a14      0x582cd6ac
0xa35a9b4:      0x1826d17c      0x1826d17c      0x4828bf50      0x48277028
0xa35a9c4:      0x48277668      0x1826d1ac      0x00000008      0x00000046
0xa35a9d4:      0x00000000      0x1826d17c      0x1826d17c      0x48277e98
0xa35a9e4:      0x48365800      0x0a388ae4      0x00392020      0x0a388aa4
0xa35a9f4:      0x18003031      0x0a388a74      0x00303120      0x0a388a24
0xa35aa04:      0x18003131      0x0a388a04      0x00313120      0x0a3889f4
0xa35aa14:      0x18003231      0x0a3889c4      0x00323120      0x0a3889b4
0xa35aa24:      0x18003331      0x0a388994      0x00333120      0x0a388984
0xa35aa34:      0x18003431      0x0a388964      0x00343120      0x0a3888e4
0xa35aa44:      0x18003531      0x0a3888c4      0x00353120      0x0a3888b4
0xa35aa54:      0x00003631      0x0a388894      0x00363120      0x0a388834
0xa35aa64:      0x18003731      0x0a388824      0x00373120      0x0a3887f4
0xa35aa74:      0x18003831      0x0a3887d4      0x00383120      0x0a3887c4

Also, part of the same region dumped as characters:

(gdb) p $x = $1035
$1050 = (struct sdata *) 0xa35a918
(gdb) x/100c $x->u.data
0xa35a91c:      0 '\0'  0 '\0'  0 '\0'  0 '\0'  36 '$'  0 '\0'  0 '\0'  0 '\0'
0xa35a924:      36 '$'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'
0xa35a92c:      0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'
0xa35a934:      25 '\031'       9 '\t'  0 '\0'  0 '\0'  56 '8'  -85 '\253'      68 'D'  10 '\n'
0xa35a93c:      -128 '\200'     -30 '\342'      18 '\022'       66 'B'  0 '\0'  0 '\0'  0 '\0'  0 '\0'
0xa35a944:      0 '\0'  0 '\0'  0 '\0'  0 '\0'  44 ','  32 ' '  119 'w' 104 'h'
0xa35a94c:      105 'i' 99 'c'  104 'h' 32 ' '  105 'i' 115 's' 32 ' '  115 's'
0xa35a954:      101 'e' 110 'n' 116 't' 32 ' '  116 't' 111 'o' 32 ' '  116 't'
0xa35a95c:      104 'h' 101 'e' 32 ' '  115 's' 0 '\0'  0 '\0'  0 '\0'  0 '\0'
0xa35a964:      0 '\0'  0 '\0'  0 '\0'  0 '\0'  -1 '\377'       -1 '\377'       -1 '\377'       -1 '\377'
0xa35a96c:      1 '\001'        0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'
0xa35a974:      0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'
0xa35a97c:      111 'o' 110 'n' 115 's' 101 'e'

In the middle of all this is the string "which is sent to the s", which
probably isn't helpful for debugging, but it does sound kind of like an
important clue from some bad mystery novel.



Anyway... a lot of data here.  I don't know if any of it is at all helpful.
Please advise on where I might go from here.  One question: I see in
alloc.c that there is code ifdefed with GC_CHECK_STRING_BYTES.  Presumably
defining this symbol enables additional checks during garbage collection
(how *did* I figure that out?? :-).  Would it be helpful for me to compile
a version with this flag set, given that the crash does happen with some
regularity?  Is an emacs compiled with this symbol defined practical to
use?

One last bit: I'm afraid that I don't have any netnews access at the moment,
so I cannot read the emacs bug newsgroup.  Please respond by email to
mlm@timesten.com.

Thanks,
- Mark




