Re: eight-bit char handling in emacs-unicode
From: Kenichi Handa
Subject: Re: eight-bit char handling in emacs-unicode
Date: Wed, 26 Nov 2003 09:07:47 +0900 (JST)
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.3 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)

In article <jwvfzgcsbuv.fsf-monnier+emacs/address@hidden>, Stefan Monnier
<address@hidden> writes:
>> It seems that you keep saying that "A does B, thus it's
>> nonsense". But, I'm arguing that "A does C".
> Well, the thing is: I still don't understand what is C.
> From what I understand, you say that C is "a conversion from multibyte
> to a sequence of code-points",
Yes, that's what I said.
> but since the output is a unibyte string,
> that restricts it to cases where the code-points can be encoded in 8 bits,
> thus it doesn't sound very generic
Yes. But I thought whether it is generic or not is not the point here.
> and I don't see any application for it
> (nor do I see any practical difference with using encode-coding-string
> since the output AFAIK would be the same).
My examples show that we can't use encode-coding-string.
How can we use encode-coding-string without knowing what
coding system to use? I haven't heard your answer yet.
>> It doesn't make sense because you treat the result as "a
>> unibyte string encoded in Latin-1".
>> It makes sense if you treat the result as "a unibyte string
>> in which each byte represents a sequence of Unicode
>> code-points", doesn't it?
> But each byte can only represent the 0-255 subset of unicode code-points, in
> which case this is equivalent (practically speaking) to latin-1, isn't it ?
Yes. And that covers all characters the user uses in this
case.
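The equivalence discussed here can be sketched outside Emacs. The
following is a minimal illustration in Python, not Emacs Lisp; the
helper name code_points_to_bytes is hypothetical and is not
string-make-unibyte itself. It only shows that storing each code point
in the range 0..255 as one byte yields the same bytes as a Latin-1
encoding, without naming a coding system.

```python
def code_points_to_bytes(text: str) -> bytes:
    """Store each character's code point in one byte.

    Hypothetical helper for illustration only. bytes() raises
    ValueError if any code point is above 255.
    """
    return bytes(ord(ch) for ch in text)


sample = "caf\u00e9"  # every code point is within U+0000..U+00FF
raw = code_points_to_bytes(sample)

# Same bytes as an explicit Latin-1 encoding.
assert raw == sample.encode("latin-1")
# One byte per character, so the length is unchanged.
assert len(raw) == len(sample)
```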
>>> It'd make sense if the environment said "latin-1 when you can,
>>> utf-8 otherwise" or something like that, but then we would use
>>> encode-coding-string anyway.
>> It's itself nonsense to have such a coding system.
> I was not thinking of a coding-system, but just some encoding job,
> such as what is done when saving a buffer (where my .emacs does exactly
> that: try latin-1 first and utf-8 if that fails).
Ah, I see. But my understanding is that
string-make-unibyte/multibyte are designed not to change the
number of characters, so that the difference between
unibyte and multibyte stays transparent in Lisp. That
restriction leads to cases where unsupported characters are
handled incorrectly. But I think Richard's design policy was
that incorrect handling of unsupported characters is better
than a possibly more disastrous error caused by a change in
the number of characters.
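The trade-off about the number of characters can be sketched roughly
in Python. Both function names are hypothetical, and the "?"
placeholder is an assumption standing in for whatever incorrect result
an unsupported character gets; it is not what Emacs actually produces.
The first function follows the "latin-1 first, utf-8 if that fails"
idea and may change the length; the second never changes it.

```python
def encode_latin1_else_utf8(text: str) -> bytes:
    """Try Latin-1 first and fall back to UTF-8 (length may change)."""
    try:
        return text.encode("latin-1")
    except UnicodeEncodeError:
        return text.encode("utf-8")


def to_bytes_same_length(text: str, placeholder: int = ord("?")) -> bytes:
    """Always emit exactly one byte per character.

    Characters above U+00FF are replaced by a placeholder byte
    (an assumption for this sketch), so they are handled incorrectly
    but the element count is preserved.
    """
    return bytes(ord(ch) if ord(ch) < 256 else placeholder for ch in text)


text = "a\u3042b"  # three characters, one of them outside U+0000..U+00FF

# The fallback encoding yields five bytes for three characters.
assert len(encode_latin1_else_utf8(text)) == 5
# The length-preserving conversion yields three bytes.
assert len(to_bytes_same_length(text)) == len(text)
```

Code that indexes into the result by character position keeps working
with the second function, at the price of a wrong byte for the
unsupported character.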
>> Do you agree with having string-make-unibyte if it signals an error on
>> non-Latin-1 characters?
> Of course: that's pretty much what I suggested: string-make-unibyte only
> accepts multibyte chars that correspond to "bytes".
I agree with that. But it only changes the behaviour of
the function in the error case. It doesn't change the concept
of what it does.
>>> I just don't know of a concrete case where it makes sense to use
>>> string-make-unibyte.
>> I'll paraphrase my previous example as this:
>> It is perfectly possible to live in such an environment
>> where only the characters U+0000..U+00FF of Unicode are
>> used but only the coding system utf-8 is used.
>> But, I don't claim that the above is a realistic case.
>> Another non-realistic but concrete case is:
>> Use only the charset iso-8859-5 and the encoding CTEXT.
> I don't see any use of string-make-unibyte in your two examples.
Again, I'd like to ask how to use encode-coding-string
without knowing the proper coding-system in each case.
> And "having string-make-unibyte if it signals an error on non-Latin-1
> characters" means that the second example can't be used any more.
In the second case, of course, the "supported characters" are
those included in the charset iso-8859-5, and
string-make-unibyte should accept them. Again, the result
is the same as encoding with the coding system iso-8859-5, but
we only know about the coding system CTEXT here.
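
A small Python sketch of the second case, assuming Python's
"iso-8859-5" codec as a stand-in for the charset. The helper name
to_unibyte is hypothetical, and CTEXT is not modelled at all. It shows
one byte per supported character and an error for anything outside
the charset.

```python
def to_unibyte(text: str, charset: str = "iso-8859-5") -> bytes:
    """Convert text to one byte per character within a single-byte charset.

    Hypothetical helper for illustration. Characters outside the
    charset raise UnicodeEncodeError instead of being converted.
    """
    return text.encode(charset)


cyrillic = "\u0416\u0443\u043a"  # three Cyrillic letters in iso-8859-5
raw = to_unibyte(cyrillic)

# Each supported character becomes exactly one byte.
assert len(raw) == len(cyrillic)

# A character outside the charset is rejected.
try:
    to_unibyte("\u00e9")
except UnicodeEncodeError:
    rejected = True
else:
    rejected = False
assert rejected
```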
---
Ken'ichi HANDA
address@hidden