Re: eight-bit char handling in emacs-unicode
From: Kenichi Handa
Subject: Re: eight-bit char handling in emacs-unicode
Date: Fri, 21 Nov 2003 15:27:37 +0900 (JST)
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.3 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)
In article <jwvzneqwbo3.fsf-monnier+emacs/address@hidden>, Stefan Monnier
<address@hidden> writes:
>> Yes, but that doesn't mean it is conceptually the same as
>> encode-coding-string. The result of string-make-unibyte
>> should still be regarded as a sequence of characters, but the
>> result of encode-coding-string is a sequence of bytes.
> Why/when is the distinction meaningful (given the fact that it
> can only be used meaningfully with 8bit coding-systems where the
> distinction seems more philosophical than anything else) ?
It is perfectly possible to work in an environment where
iso-8859-1 is the only charset in use while utf-8 is the
only coding system in use. In such an environment, the
results of encode-coding-string and string-make-unibyte are
of course not the same, but both operations are still
meaningful.
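
To make the difference concrete, here is a minimal sketch in Python. It is only an analogy, not Emacs code: Python's `str` and `bytes` stand in for multibyte and unibyte strings, `encode("latin-1")` stands in for string-make-unibyte in an iso-8859-1 environment, and `encode("utf-8")` stands in for encode-coding-string with utf-8. The variable names are made up for illustration.

```python
# Analogy only: Python str/bytes stand in for Emacs multibyte/unibyte strings.
text = "\u00c0"  # LATIN CAPITAL LETTER A WITH GRAVE, code point 192 in iso-8859-1

# Stand-in for string-make-unibyte in an iso-8859-1 environment:
# each character is replaced by its code point in the charset.
as_code_points = text.encode("latin-1")

# Stand-in for encode-coding-string with utf-8:
# each character is replaced by the bytes of its utf-8 encoding.
as_utf8_bytes = text.encode("utf-8")

print(list(as_code_points))  # [192]
print(list(as_utf8_bytes))   # [195, 128]
```

Both results are sequences of small integers, but the first still reads as "one character per element" while the second is an encoded byte stream.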
>> Here lies an ambiguity in unibyte strings.
>> The number 192 can be regarded as:
>> (1) just a number, a byte
>> (2) a code point of some character set.
>> (3) a character code
> But the second case is only possible for 8bit character sets, right?
Yes. But, as I wrote above, it doesn't mean that we are
restricted to simple 8bit-oriented coding-systems.
> Until now, I always thought that Emacs only dealt with
> - byte streams representing encoded sequences of code points: case 1.
> - sequences of internal character codes (internally encoded in emacs-mule
> or unicode depending on the branch you use): case 3.
> Is there any place where we really deal with sequences of code points of
> external charsets (other than, maybe, in the degenerate case where such a
> sequence is indistinguishable from case 1)?
I'd like to repeat that although we don't have such an
environment now, that doesn't mean it is impossible to
assume one.
>> A unibyte string can contain (1) and (2) without
>> distinguishing them, but a multibyte string can contain (1)
>> and (3) while distinguishing them.
> Can multibyte strings distinguish the cases (1) and (3) for integer 97 and
> character `a' ?
Good point. Of course not. I deliberately left that out to
keep the discussion simple.
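
The asymmetry between 192 and 97 can be sketched with a loose Python analogy. This is not how Emacs represents eight-bit characters internally; it assumes only that Python's `surrogateescape` error handler behaves similarly in spirit: a raw byte above 127 is kept as a special code distinct from any real character, while an ASCII byte is simply the character.

```python
# Analogy only: surrogateescape keeps undecodable raw bytes (128..255)
# as special codes U+DC80..U+DCFF, distinct from ordinary characters.
raw_192 = bytes([192]).decode("ascii", errors="surrogateescape")
char_192 = chr(192)  # the character whose code is 192

# Byte 192 and character 192 remain distinguishable.
print(raw_192 == char_192)  # False

# Byte 97 and the character `a' do not.
raw_97 = bytes([97]).decode("ascii", errors="surrogateescape")
char_97 = chr(97)
print(raw_97 == char_97)  # True

# The raw byte survives a round trip back to bytes.
round_trip = raw_192.encode("ascii", errors="surrogateescape")
print(round_trip == bytes([192]))  # True
```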
---
Ken'ichi HANDA
address@hidden