Re: Emacs and UTF-8 locale
From: Kenichi Handa
Subject: Re: Emacs and UTF-8 locale
Date: Mon, 7 Jan 2002 10:01:17 +0900 (JST)
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.1.30 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)
Dave Love <address@hidden> writes:
>>>>>> Kenichi Handa writes:
>> It seems that it doesn't have a major problem,
> I hope not, because I've basically used your customization hooks or
> similar ones and done the sort of things you'd talked about at some
> time!
I may have written down various ideas, but please don't assume
that they all work fine. :-p
>> but I found one problem related to handling unibyte case.
> I didn't expect it to do anything sensible with unibyte, but if
> there's an easy way to improve it, that would be fine.
>> If unify-8859-on-decoding-mode is on, for instance, in
>> latin-2 lang. env., 8859-2 characters files are decoded into
>> latin-iso8859-1 and mule-unicode-0100-24ff. But, C-q XXX
>> still inserts latin-iso8859-2 characters.
> Yes. I'm not sure that should change, but the relevant primitives
> could now use `translation-table-for-input'. It wasn't the sort of
> thing I could control in user-level customization anyway, without
> kludging it with a post-command hook.
C-q (quoted-insert) assumes that a code in the range
0240..0377 (octal) is a code in a single-byte charset, and
converts it to a multibyte code by calling
unibyte-char-to-multibyte. This function does the
conversion using nonascii-translation-table or
nonascii-insert-offset. So, once they are set correctly,
C-q XXX can insert the appropriate character.
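
To illustrate the lookup order described above, here is a rough
model written in Python rather than Emacs Lisp. It is only a sketch:
quoted_insert_char, LATIN2_TABLE, and the parameter names are
hypothetical stand-ins for the C-q conversion,
nonascii-translation-table, and nonascii-insert-offset, and it uses
Unicode code points instead of Emacs's internal character codes.

```python
# Illustrative model only; none of these names are Emacs functions.

# Hypothetical stand-in for a correctly set nonascii-translation-table
# in a Latin-2 language environment: octal codes 0240..0377 mapped to
# the ISO 8859-2 characters they denote.
LATIN2_TABLE = {
    code: bytes([code]).decode("iso8859_2") for code in range(0o240, 0o400)
}


def quoted_insert_char(code, translation_table=None, insert_offset=0):
    """Model how a code typed after C-q could become a character.

    Codes outside 0240..0377 are taken as they are. Inside that range
    the translation table is consulted first; if there is no table (or
    no entry), the offset is added to the code instead.
    """
    if not 0o240 <= code <= 0o377:
        return chr(code)
    if translation_table is not None and code in translation_table:
        return translation_table[code]
    return chr(code + insert_offset)


if __name__ == "__main__":
    # With the Latin-2 table, octal 241 gives A-ogonek (U+0104).
    print(hex(ord(quoted_insert_char(0o241, LATIN2_TABLE))))
    # Without a table and with a zero offset, this model yields the
    # Latin-1 character at the same code (U+00A1).
    print(hex(ord(quoted_insert_char(0o241))))
```

In this model, the behaviour reported in the thread corresponds to
calling the function with a table or offset that still points at the
old charset: the input code is the same, only the table decides which
character comes out.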
>> And, when we paste mule-unicode-0100-24ff characters into
>> unibyte buffer, or paste unibyte string into a multibyte
>> buffer, they are not correctly converted.
> What would be correct?
As long as the characters are in the range that the current
language environment supports in unibyte mode, they should be
converted correctly. For instance, in the Latin-2 lang. env.,
all characters in the latin-2 charset should be handled
correctly, i.e. A-ogonek in mule-unicode-0100-24ff <=> 0xA1.
And that kind of conversion can be done only by setting
nonascii-translation-table correctly in each language
environment.
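
The A-ogonek <=> 0xA1 round trip can be checked outside Emacs with
a small Python sketch. This is only an analogy, under the assumption
that Python's iso8859_2 codec plays the role of a correctly set
nonascii-translation-table for the Latin-2 lang. env.; the two helper
names are hypothetical and are not Emacs primitives.

```python
# Illustrative analogy only; these helpers are not Emacs primitives.

def multibyte_to_unibyte(text, charset="iso8859_2"):
    """Convert characters to single bytes of the given charset."""
    return text.encode(charset)


def unibyte_to_multibyte(data, charset="iso8859_2"):
    """Convert single bytes of the given charset back to characters."""
    return data.decode(charset)


A_OGONEK = "\u0104"  # LATIN CAPITAL LETTER A WITH OGONEK

if __name__ == "__main__":
    # Pasting into a "unibyte buffer": the character becomes byte 0xA1.
    print(multibyte_to_unibyte(A_OGONEK))
    # Pasting into a "multibyte buffer": byte 0xA1 becomes A-ogonek again.
    print(unibyte_to_multibyte(b"\xa1") == A_OGONEK)
```

A character outside the charset, such as a Japanese kana, has no
single-byte code here, so the first helper raises UnicodeEncodeError
for it instead of producing a byte.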
> Is general Unicode text any different to, say, JISX-based
> Japanese in that respect?
Of course, in the Japanese lang. env. or in a UTF-8 lang. env.,
such a conversion between unibyte and multibyte is
meaningless and doesn't work. And anyway, in such a
lang. env., people don't expect it to work well.
What we need is to make it work well only in
single-byte-charset-based lang. envs.
---
Ken'ichi HANDA
address@hidden