Re: eight-bit char handling in emacs-unicode
From: Kenichi Handa
Subject: Re: eight-bit char handling in emacs-unicode
Date: Fri, 21 Nov 2003 09:41:47 +0900 (JST)
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.3 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)
In article <jwvptfp139w.fsf-monnier+emacs/address@hidden>, Stefan Monnier
<address@hidden> writes:
>> I see. Apart from the design itself, I agree that it's difficult to
>> introduce a new type. But when I discussed the Character type
>> object with Richard a few years ago, he was not that negative,
>> provided that it gives a sure improvement.
> Sounds about right to me: we have one free tag that we could use for chars
Yes, and as that is the last free tag, I still hesitate to
consume it for the Character object.
>> Then we can't use string-make-unibyte for the current case
>> because, in emacs-unicode, (concat '(?a 192)) returns a
>> multibyte string whose second element is A-grave, not an
>> eight-bit-char. Am I missing something?
> Well, obviously we need to make it accept this case (i.e. accept both the
> latin-1 192 and the eight-bit-char 192).
I see your intention now. But aren't the semantics of
such a function very weird?
>>> To do what your string-make-unibyte does you should use
>>> `encode-coding-string' where the coding system is passed explicitly.
>> Those are conceptually different things (I remember a
>> similar discussion we had a while ago).
>> encode-coding-string does:
>>   char-sequence --CCS-set--> (CCS/codepoint-pair)-sequence
>>                 --CES--> encoded-byte-sequence
>> string-make-unibyte does:
>>   char-sequence --CCS--> code-point-sequence
>>                 --concat--> code-point-sequence
>> These two yield the same result only when the CCS supports all
>> chars in "char-sequence" and the CES is stateless
>> (e.g. iso-latin-1).
> You lost me here (I'm a poor soul who doesn't know much outside of the
> latin-1 world).
CCS: Coded Character Set
CES: Character Encoding Scheme
Emacs coding system: a set of CCSs plus a CES.
iso-latin-1: CCSs are ascii and latin-iso8859-1;
             CES is the 8-bit version of ISO-2022.
iso-2022-jp: CCSs are ascii, japanese-jisx0208, ...;
             CES is the 7-bit version of ISO-2022.
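To make the glossary above concrete, here is a small Python sketch that models a coding system as a set of CCSs plus one CES. It is an illustration under assumptions, not Emacs code: the `CodingSystem` class is hypothetical, its `python_codec` field maps each coding system to the closest Python stdlib codec (`latin-1`, `iso2022_jp`) only for the demo, and the CCS list for iso-2022-jp stops where the post says "...".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodingSystem:
    """Hypothetical model of an Emacs coding system: some CCSs plus one CES."""
    name: str
    ccs: tuple          # coded character sets, named as in the post
    ces: str            # character encoding scheme
    python_codec: str   # assumption: closest Python codec, used only for the demo

    def encode(self, text: str) -> bytes:
        return text.encode(self.python_codec)

    def output_is_7bit(self, text: str) -> bool:
        """True if every encoded byte is below 0x80."""
        return all(b < 0x80 for b in self.encode(text))


ISO_LATIN_1 = CodingSystem("iso-latin-1", ("ascii", "latin-iso8859-1"),
                           "8-bit ISO-2022", "latin-1")
# The post lists more CCSs for iso-2022-jp ("..."); only the two it names appear here.
ISO_2022_JP = CodingSystem("iso-2022-jp", ("ascii", "japanese-jisx0208"),
                           "7-bit ISO-2022", "iso2022_jp")

print(ISO_LATIN_1.output_is_7bit("caf\u00e9"))  # False: e-acute becomes byte 0xE9
print(ISO_2022_JP.output_is_7bit("a\u3042"))     # True: only 7-bit bytes, incl. ESC sequences
```

The 7-bit/8-bit distinction in the glossary shows up directly in the encoded bytes: the 8-bit CES uses the high half of the byte range, while the 7-bit CES switches character sets with escape sequences instead.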
> I thought that string-make-unibyte only behaves meaningfully for
> "normal 8bit coding-systems" such as latin-1.
Yes, but that doesn't mean it is conceptually the same as
encode-coding-string. The result of string-make-unibyte
should still be regarded as a sequence of characters, but the
result of encode-coding-string is a sequence of bytes.
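A rough Python sketch of the two pipelines quoted earlier may help. Assumptions: `make_unibyte_sketch` is a hypothetical stand-in for string-make-unibyte that uses a single CCS (ASCII plus latin-1, where the code point equals the Unicode value below 0x100); `jisx0208_code_point` is a hypothetical helper that derives a JIS X 0208 code point from EUC-JP by clearing the high bit; and Python's codecs stand in for encode-coding-string, doing the CCS and CES steps in one call. Python forces both results into `bytes`, but following the post, the first should be read as code points and only the second as an encoded byte sequence.

```python
def latin1_code_point(ch: str) -> int:
    """CCS step for a single CCS (ASCII + latin-1): code point == ord(ch) below 0x100."""
    cp = ord(ch)
    if cp > 0xFF:
        raise ValueError(f"{ch!r} is not covered by this CCS")
    return cp


def jisx0208_code_point(ch: str) -> bytes:
    """Hypothetical CCS step for japanese-jisx0208: the two-byte row/cell
    code point, derived here from EUC-JP by clearing the high bit of each byte."""
    euc = ch.encode("euc_jp")
    if len(euc) != 2 or euc[0] < 0xA1:
        raise ValueError(f"{ch!r} is not in JIS X 0208")
    return bytes(b & 0x7F for b in euc)


def make_unibyte_sketch(text: str) -> bytes:
    # char-sequence --CCS--> code-point-sequence --concat--> code-point-sequence
    return bytes(latin1_code_point(ch) for ch in text)


def encode_sketch(text: str, coding: str) -> bytes:
    # char-sequence --CCS-set--> (CCS, code point) pairs --CES--> byte sequence;
    # a Python codec performs both steps in one call.
    return text.encode(coding)


# One CCS covers every character and the CES is stateless: the results coincide.
assert make_unibyte_sketch("caf\u00e9") == encode_sketch("caf\u00e9", "latin-1")

# Stateful CES: ISO-2022-JP wraps the JIS X 0208 code point in escape sequences,
# so the encoded bytes differ from the bare code point.
hiragana_a = "\u3042"
point = jisx0208_code_point(hiragana_a)
encoded = encode_sketch(hiragana_a, "iso2022_jp")
assert point in encoded and point != encoded
```

The final assertions mirror the condition in the quoted text: with a stateless CES and full CCS coverage the two pipelines agree, and once the CES carries state they no longer do.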
Here lies an ambiguity in unibyte strings.
The number 192 can be regarded as:
  (1) just a number, i.e. a byte;
  (2) a code point of some character set;
  (3) a character code.
A unibyte string can contain (1) and (2) without
distinguishing them, but a multibyte string can contain (1)
and (3) while still distinguishing them.
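The three readings of 192 can be illustrated, by analogy only, with Python's `bytes` and `str` types. This is not Emacs behaviour: Python's `surrogateescape` error handler is used as a loose stand-in for eight-bit chars, and ISO 8859-5 is an arbitrary second character set chosen to show that a code point alone does not identify a character.

```python
b = bytes([192])                      # (1) just a number, a byte
as_latin1 = b.decode("latin-1")       # (2) a code point of ISO 8859-1 ...
as_cyrillic = b.decode("iso8859_5")   # ... or of ISO 8859-5: a different character
char = chr(192)                       # (3) a character code: U+00C0, A-grave

# A bytes object looks the same whichever of (1) or (2) is intended.
assert as_latin1 == char == "\u00c0"
assert as_cyrillic == "\u0420"        # CYRILLIC CAPITAL LETTER ER

# Building a str from integers treats 192 as a character code, similar to the
# (concat '(?a 192)) case in the post yielding A-grave.
assert "".join(map(chr, [ord("a"), 192])) == "a\u00c0"

# surrogateescape keeps an undecodable raw byte distinct from the character
# U+00C0 inside a str, loosely like an eight-bit char in a multibyte string.
raw = b.decode("ascii", "surrogateescape")
assert raw == "\udcc0" and raw != char
assert raw.encode("ascii", "surrogateescape") == b
```

In the analogy, a `str` can hold both a raw byte (1) and a character (3) and still tell them apart, while `bytes` cannot say whether 192 means (1) or (2).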
---
Ken'ichi HANDA
address@hidden