Re: eight-bit char handling in emacs-unicode
From: Kenichi Handa
Subject: Re: eight-bit char handling in emacs-unicode
Date: Tue, 18 Nov 2003 16:33:15 +0900 (JST)
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.3 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)
In article <jwvhe12emr3.fsf-monnier+emacs/address@hidden>, Stefan Monnier
<address@hidden> writes:
>> The basic problem is that we don't distinguish a character
>> (code) and a number. So, we introduce a character object
> That's one way to look at the problem.
> Another is to say that the problem is instead that we do not distinguish
> between arrays of chars and arrays of bytes.
I agree that it's possible to grasp the problem in that way,
but I'm not sure which is the better way. Could you explain
WHY yours is better?
[...]
> In Emacs-21 we worked around the problem by arranging for "the
> eight-bit-char that encodes to 192" to be represented by the integer 192, so
> as to avoid having to choose. But with unicode, the 128-255 zone cannot be
> dedicated to eight-bit-char since it's already used up for latin-1, so we
> have to face the problem more directly.
> In the places where Emacs-21 still had to choose, we just used heuristics,
> so `concat' will sometimes return a unibyte string, and sometimes a
> multibyte string.
> So I think your options 1-3 are better than 4. BTW, your function
> `eight-bit-char' should be named `byte-to-char' instead.
> Which of 1 to 3 is the best is not clear, and maybe we can just live with
> `make-string-unibyte' and `make-string-multibyte'.
I think you mean string-make-unibyte/multibyte, but for the
current problem we can't use them, because string-make-unibyte
may behave differently in different language environments.
A language environment that gives iso-8859-1 or Unicode the
highest priority for the character `À' is fine:
(string-make-unibyte (concat '(?a 192))) = "a\300"
But if some language environment prefers a charset that
encodes `À' to something other than 192 (e.g. Vietnamese
VSCII), we fail.
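To make the dependence concrete, here is a rough sketch in Python
rather than Emacs Lisp, only because its codecs are easy to run.
The names char_to_byte and A_GRAVE are made up for this example,
and cp850 merely stands in for "a charset that does not put `À'
at 192" (Python's standard library has no VSCII codec, as far as I
know). It is not how Emacs implements string-make-unibyte.

```python
# Illustration only: Python codecs stand in for the charset that a
# language environment would prefer. Nothing here is Emacs code.

A_GRAVE = "\u00c0"  # the character `À'


def char_to_byte(ch: str, charset: str) -> int:
    """Return the single byte that `charset` uses for `ch` (hypothetical helper)."""
    encoded = ch.encode(charset)
    if len(encoded) != 1:
        raise ValueError(f"{charset} does not encode {ch!r} as one byte")
    return encoded[0]


# With Latin-1 preferred, `À' maps to the byte 192, as in the example above.
latin1_byte = char_to_byte(A_GRAVE, "latin-1")

# With a different preferred charset (cp850 as a stand-in), the same
# character maps to a different byte, so the result is not "a\300".
other_byte = char_to_byte(A_GRAVE, "cp850")

print(latin1_byte, other_byte)
```

The same character yields two different bytes depending on which
charset is consulted, which is why a conversion that consults the
language environment cannot be relied on to produce byte 192.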
> Note that 1-3 are not mutually exclusive so we can use
> them all.
Yes, but at the very least I want to avoid "(3) Make a
series of new functions".
---
Ken'ichi HANDA
address@hidden