eight-bit char handling in emacs-unicode

emacs-devel

From:	Kenichi Handa
Subject:	eight-bit char handling in emacs-unicode
Date:	Fri, 14 Nov 2003 09:47:51 +0900 (JST)
User-agent:	SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.3 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)

In article <address@hidden>, Simon Josefsson <address@hidden> writes:
> rfc2104.el now works, thanks.  But does the fix really have to
> explicitly mention charsets like iso-latin-1?  Is there no way to
> handle binary octet strings in emacs-unicode?  Preferably in a
> portable way, that works on old Emacs versions and on XEmacs.

>>  This is a typical problem of emacs-unicode in which
>>  characters 128..255 are valid Unicode characters, thus, for
>>  instance, (concat '(?a ?\300)) returns a multibyte string of
>>  `a' and `À'.  But in the current Emacs, it returns a unibyte
>>  string.
>> 
>>  I suspect a similar fix is necessary in several other
>>  places.

> Having a way to deal with data that is a pure single byte, without
> involving coding systems, seems like a rather important thing to me.

I agree with you.  Currently, I can think of these methods:

(1) Perhaps the easiest way.

Check `default-enable-multibyte-characters' or a newly
introduced variable `byte-as-byte' to decide whether an
integer 128..255 must be treated as a Latin-1 char or a
byte.   So,
(concat '(?a ?\300)) => "aÀ" (multibyte string)
(let ((byte-as-byte t))
  (concat '(?a ?\300))) => "a\300" (unibyte string)

(2) Introduce a new function `eight-bit-char'.

It converts an argument to ascii or eight-bit-char.
(eight-bit-char ?a) => 97
(eight-bit-char ?\300) => 4194240
Then,
(concat (list ?a (eight-bit-char ?\300))) => "a\300"
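
As a rough sketch of method (2), the conversion can be written by
hand.  The name `my-eight-bit-char' is hypothetical, and the offset
#x3FFF00 is an assumption inferred from the value 4194240 shown
above for ?\300.  The sketch also assumes an Emacs in which such
codes are valid characters and in which `string-to-unibyte' is
available (later Emacs releases have it).

```elisp
;; Hypothetical helper, not part of Emacs.
;; BYTE is assumed to be an integer in the range 0..255.
(defun my-eight-bit-char (byte)
  "Return BYTE unchanged if it is ASCII, else the matching eight-bit char code."
  (if (< byte 128)
      byte
    (+ byte #x3FFF00)))   ; assumed offset, see the note above

(my-eight-bit-char ?a)      ; => 97
(my-eight-bit-char ?\300)   ; => 4194240

;; A quoted list is not evaluated, so the argument is built with `list'.
;; `concat' yields a multibyte string holding an eight-bit char;
;; `string-to-unibyte' then turns it into a two-byte unibyte string.
(string-to-unibyte
 (concat (list ?a (my-eight-bit-char ?\300))))
```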

(3) Make a series of new functions (I think it's not good)

concat vs concat-unibyte
string vs string-unibyte
aset vs aset-unibyte
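
For comparison, a `concat-unibyte'-like function could look roughly
as follows.  This is only a sketch: `my-concat-unibyte' is a
hypothetical name, and it relies on `unibyte-string', which, as far
as I know, is a built-in only in later Emacs releases (23 and newer).

```elisp
;; Hypothetical helper; BYTES is a list of integers in the range 0..255.
(defun my-concat-unibyte (bytes)
  "Return a unibyte string made of BYTES, with no coding system involved."
  (apply #'unibyte-string bytes))

(let ((s (my-concat-unibyte '(?a #o300))))
  (list (multibyte-string-p s)   ; nil: the result is unibyte
        (length s)               ; 2
        (aref s 1)))             ; 192, i.e. the byte \300
```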

(4) Most drastic way (the cleanest but requires lots of work)

The basic problem is that we don't distinguish between a
character (code) and a number.  So, we introduce a character
object (like XEmacs).  The function `character' converts a
character code into the corresponding character object.  The
Lisp reader always generates a character object for ?a,
?\300, etc.   So:
 (concat '(?a ?\300)) => "aÀ"
 (concat '(?a #o300)) => "a\300"
 (concat '(?a (character #o300))) => "aÀ"
 (concat '(?a #o300 (character #o300))) => "a\300À"

Note: (character X) == (decode-char 'ucs X)

> It started now, but when I enter a summary buffer it crashed:

> Program received signal SIGSEGV, Segmentation fault.
> 0x081a3c81 in skip_chars (forwardp=1, string=160, lim=36) at syntax.c:1591
> 1591                      char_ranges[n_char_ranges++] = c;
> (gdb) bt
> #0  0x081a3c81 in skip_chars (forwardp=1, string=160, lim=36) at syntax.c:1591

I just tried gnus but I couldn't reproduce it.  So, I need
more help.  Could you show me the results of the following?

(gdb) p n_char_ranges
(gdb) p c
(gdb) p string
(gdb) xstring
(gdb) p *$

---
Ken'ichi HANDA
address@hidden
