emacs-devel

Crash in titdic-convert with DOS line ends


From: Jason Rumney
Subject: Crash in titdic-convert with DOS line ends
Date: Tue, 05 Feb 2008 01:31:34 +0000
User-agent: Thunderbird 2.0.0.9 (Windows/20071031)

Jason Rumney wrote:
Some of the Big5 encoded files cannot be processed if they have DOS line ends. I haven't yet figured out why. ETZY.tit, PY-b5.tit, TONEPY.tit and ZOZY.tit have this problem, others do not.

Now that I am debugging this, ETZY.tit does not crash Emacs, while 4Corner.tit does. The problem appears to affect any Big5 file with DOS line ends that is inserted into a unibyte buffer, but some other condition must also be present to trigger the crash. In any case, the following shows that there is definitely a problem with DOS line ends in unibyte buffers:

;; Evaluate the following 2 forms in *scratch*. The first converts a .tit file to DOS line ends, the second reads
;; it into a unibyte buffer as raw-text in the same way that titdic-convert does.

(with-temp-buffer
  (let ((coding-system-for-read 'cn-big5)
        (coding-system-for-write 'cn-big5-dos))
    (insert-file-contents
     (expand-file-name "CXTERM-DIC/4Corner.tit"
                       (file-name-directory (locate-library "leim-list"))))
    (write-file "/tmp/test.tit")))

(set-buffer-multibyte nil)
(let ((coding-system-for-read 'raw-text))
 (insert-file-contents "/tmp/test.tit"))


;; If Emacs does not crash, note the ^M on the ends of some lines.


When Emacs crashes, it always happens in decode_eol (several levels deep from insert-file-contents), on this line:


>          if (*p == '\r' && p[1] == '\n')


p appears to have overrun the buffer.
(gdb) print p
$35 = (unsigned char *) 0x2707000 <Address 0x2707000 out of bounds>

(gdb) print pbeg
$39 = (unsigned char *) 0x26f9f30 "# HANZI input table for cxterm\n# Generated from ETZY.cit by cit2tit\n# To be used by cxterm, convert me to .cit format first\n# .cit version 1\nENCODE:\tBIG5\nMULTICHOICE:\tYES\nPROMPT:\t\244\244\244\345\277\351\244J\241i\255\312\244\321\252`\255\265\241j\n"...

(gdb) print pend
$40 = (unsigned char *) 0x27043bb "a\264\303\254\341\305`\272\372\255\276\262\360\346\262\311`\370\332\r\nvx83\t\272\336\300]\262\360\265_\337F\327E\336\307\353\335\r\nvx84\t\272D\263e\304\351\305\370\341\350\277d\306|\253a\306[\311c\366\355\366\360\336\363\367\353\371u\325\341\325V\330\371\361q\371\312\r\nvx93\t\272u\263
address@hidden
250\355\275\275\276h\251K\265\301\357~\321\353\323\354\363\274\320g\337\242\332\341\337\262\341A\342\336\346\352\357\317\340a\355\356\r\nvxa3\t\271\350\324l\r\nvxa4\t\261\276\250\366\273o\337h\326"...

Some of this looks suspicious, but I don't know enough to say for sure if it is corrupt...
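
For reference, the pointer values above give pend - pbeg = 0x27043bb - 0x26f9f30 = 42123 bytes, which matches the produced count in the coding dump, while p = 0x2707000 is 11333 bytes past pend. Since the quoted test also reads p[1], any scan of this kind has to stop before that read can leave the buffer. The following is only a minimal sketch of a bounds-checked CRLF scan, not the actual decode_eol code; the function name crlf_to_lf is hypothetical and the in-place conversion is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from the Emacs sources: collapse each
   "\r\n" in [pbeg, pend) to "\n" in place and return the new length.  */
static size_t
crlf_to_lf (unsigned char *pbeg, unsigned char *pend)
{
  unsigned char *src = pbeg, *dst = pbeg;

  assert (pbeg <= pend);

  while (src < pend)
    {
      /* Look at src[1] only while it is still inside the buffer.  */
      if (*src == '\r' && src + 1 < pend && src[1] == '\n')
        src++;                  /* skip the CR; the LF is copied below */
      *dst++ = *src++;
    }
  return (size_t) (dst - pbeg);
}
```

A lone trailing '\r' is copied unchanged, because the src + 1 < pend test fails before src[1] is read.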

(gdb) print *coding
$41 = {
 id = 10,
 common_flags = 5376,
 mode = 2,
 spec = {
   iso_2022 = {
     flags = 106,
     current_invocation = {112, 51},
     current_designation = {34, 32, 34, 31248},
     single_shifting = 34,
     bol = 41
   },
   ccl = 0x6a,
   utf_16 = {
     bom = 106,
     endian = 112,
     surrogate = 51
   },
   emacs_mule_full_support = 106
 },
 max_charset_id = 0,
 safe_charsets = 0x170f4e4 "\303\277",
 src_multibyte = 0,
 dst_multibyte = 0,
 head_ascii = -1,
 produced = 42123,
 produced_char = 42123,
 consumed = 42123,
 consumed_char = 42123,
 errors = 0,
 error_positions = 0x22,
 result = CODING_RESULT_SUCCESS,
 src_pos = -42123,
 src_pos_byte = -42123,
 src_chars = 42123,
 src_bytes = 42123,
 src_object = 26925060,
 source = 0x26fa700 "---+----+----+----+----+----+----+----+\nCOMMENT | (SPACE BAR)", ' ' <repeats 22 times>, "|\nCOMMENT |", ' ' <repeats 22 times>, "\263\261\245\255", ' ' <repeats 16 times>, "|\nCOMMENT +", '-' <repeats 21 times>...,
 dst_pos = 1,
 dst_pos_byte = 1,
 dst_bytes = 2000,
 dst_object = 26925060,
 destination = 0x26f9f30 "# HANZI input table for cxterm\n# Generated from ETZY.cit by cit2tit\n# To be used by cxterm, convert me to .cit format first\n# .cit version 1\nENCODE:\tBIG5\nMULTICHOICE:\tYES\nPROMPT:\t\244\244\244\345\277\351\244J\241i\255\312\244\321\252`\255\265\241j\n"...,
 chars_at_source = 1,
 charbuf = 0x80ab40,
 charbuf_size = 16384,
 charbuf_used = 0,
 annotated = 0,
 carryover = "\352m\000\000\031]\000\000\226O\000\000\270}\000\000\204c\000\000\aW\000\000\226x\000\000\000\223\000\000\300`\000\000o\226\000\000\325\203\000\000\032\216\000\000\306h\000\000&\207\000\000\"\000\000\000)\000\000",
 carryover_bytes = 0,
 default_char = 32,
 detector = 0,
 decoder = 0x116d3ba <decode_coding_raw_text>,
 encoder = 0x116d3f6 <encode_coding_raw_text>
}




