Re: eight-bit char handling in emacs-unicode

emacs-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: eight-bit char handling in emacs-unicode

From:	Kenichi Handa
Subject:	Re: eight-bit char handling in emacs-unicode
Date:	Wed, 26 Nov 2003 09:07:47 +0900 (JST)
User-agent:	SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.3 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)

In article <jwvfzgcsbuv.fsf-monnier+emacs/address@hidden>, Stefan Monnier 
<address@hidden> writes:

>>  It seems that you keep of saying that "A does B, thus it's
>>  nonsense".  But, I'm arguing that "A does C".

> Well, the thing is: I still don't understand what is C.
> From what I understand, you say that C is "a conversion from multibyte
> to a sequence of code-points",

Yes, that what I said.

> but since the output is a unibyte string,
> that restrict it to cases where the code-points can be encoded in 8 bits,
> thus it doesn't sound very generic

Yes.  But I thought generic or not is not a point here.

> and I don't see any application for it
> (nor do I see any practical difference with using encode-coding-string
> since the output AFAIK would be the same).

My examples shows that we can't use encode-coding-string.
How can we use encode-coding-string without knowing what
coding system to use?  I haven't heard your answer yet.

>>  It doesn't make sense because you treat the result as "a
>>  unibyte string encoded in Latin-1".

>>  It makes sense if you treat the result as "a unibyte string
>>  in which each byte represents a sequence of Unicode
>>  code-points", doesn't it?

> But each byte can only represent the 0-255 subset of unicode code-points, in
> which case this is equivalent (practically speaking) to latin-1, isn't it ?

Yes.  And that covers all characters the user uses in this
case.

>>>  It'd make sense if the environment said "latin-1 when you can,
>>>  utf-8 otherwise" or something like that, but then we would use
>>>  encode-coding-string anyway.

>>  It's itself nonsense to have such a coding system.

> I was not thinking of a coding-system, but just some encoding job,
> such as what is done when saving a buffer (where my .emacs does exactly
> that: try latin-1 first and utf-8 if that fails).

Ah, I see.  But, my understanding is that
string-make-unibyte/multibyte are designed not to change the
number of characters to make the difference of
unibyte/multibyte transparent in Lisp.  That restriction
leads to a case that non-supported characters are handled
incorrectly.  But, I think Richard's design policy was that
incorrect handling of non-supported characters is better
than a possibly more disastrous error caused by the change
of number of characters.

>>  Do you agree with having string-make-unibyte if it signals an error on
>>  non-Latin-1 characters?

> Of course: that's pretty much what I suggested: make-string-unibyte only
> accepts multibyte chars that correspond to "bytes".

I agree with that.  But, it just changes the behaviour of
the function on error case.  It doesn't change the concept
of what it does.

>>>  I just don't know of a concrete case where it makes sense to use
>>>  string-make-unibyte.

>>  I'll paraphrase my previous example as this:

>>    It is perfectly possible to live in such an environment
>>    where only the characters U+0000..U+00FF of Unicode is
>>    used but only the coding system utf-8 is used.

>>  But, I don't claim that the above is a realistic case.

>>  Another non-realistic but concrete case is:

>>    Use only the charset iso-8859-5 and the encoding CTEXT.

> I don't see any use of string-make-unibyte in your two examples.

Again, I'd like to ask how to use encode-coding-string
without knowing the proper coding-system in each case.

> And "having string-make-unibyte if it signals an error on non-Latin-1
> characters" means that the second example can't be used any more.

In the second case, of course "supported characters" are
what included in the charset iso-8859-5, and
string-make-unibyte should accept them.  Again, the result
is the same as encoding by the coding system iso-8859-5, but
we only know about the coding system CTEXT here.

---
Ken'ichi HANDA
address@hidden

[Prev in Thread]

Current Thread

[Next in Thread]

Re: eight-bit char handling in emacs-unicode, (continued)

Prev by Date: Re: Changes to Texinfo DTD
Next by Date: Re: A new online publishing tool for Texinfo documents.
Previous by thread: Re: eight-bit char handling in emacs-unicode
Next by thread: Re: eight-bit char handling in emacs-unicode
Index(es):
- Date
- Thread