Re: Emacs and UTF-8 locale

emacs-devel

From:	Kenichi Handa
Subject:	Re: Emacs and UTF-8 locale
Date:	Mon, 7 Jan 2002 10:01:17 +0900 (JST)
User-agent:	SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.1.30 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)

Dave Love <address@hidden> writes:
>>>>>>  Kenichi Handa writes:
>>  It seems that it doesn't have a major problem,

> I hope not, because I've basically used your customization hooks or
> similar ones and done the sort of things you'd talked about at some
> time!

I may have written various ideas, but please don't assume
that they all work fine.  :-p

>>  but I found one problem related to handling unibyte case.

> I didn't expect it to do anything sensible with unibyte, but if
> there's an easy way to improve it, that would be fine.

>>  If unify-8859-on-decoding-mode is on, for instance, in
>>  latin-2 lang. env., 8859-2 characters files are decoded into
>>  latin-iso8859-1 and mule-unicode-0100-24ff.  But, C-q XXX
>>  still inserts latin-iso8859-2 characters.

> Yes.  I'm not sure that should change, but the relevant primitives
> could now use `translation-table-for-input'.  It wasn't the sort of
> thing I could control in user-level customization anyway, without
> kludging it with a post-command hook.

C-q (quoted-insert) assumes that a code in the range
0240..0377 (octal) is a code in a single-byte charset, and
converts it to a multibyte code by calling
unibyte-char-to-multibyte.  That function does the conversion
using nonascii-translation-table or nonascii-insert-offset.

So, once they are set correctly, C-q XXX can insert an
appropriate character.
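
As an illustration only, the table-or-offset conversion described
above can be sketched outside Emacs.  The following Python is not
Emacs code: the names build_table and unibyte_to_multibyte are
hypothetical, Unicode code points stand in for Emacs's internal
character codes, and the precedence of the table over the offset is
an assumption of the sketch.

```python
# Illustrative sketch only; this is not how Emacs implements it.

def build_table(codec):
    """Map each byte in 0o240..0o377 to the code point `codec` assigns it.

    Plays the role of a translation table for one single-byte charset.
    """
    table = {}
    for byte in range(0o240, 0o400):
        try:
            table[byte] = ord(bytes([byte]).decode(codec))
        except UnicodeDecodeError:
            # Some charsets leave positions undefined; skip those.
            pass
    return table


def unibyte_to_multibyte(code, table=None, offset=0):
    """Convert a single-byte code to a character code.

    Codes outside 0o240..0o377 are returned unchanged.  A table entry,
    when present, is used (assumption of this sketch); otherwise the
    plain offset is added.
    """
    if code < 0o240 or code > 0o377:
        return code
    if table is not None and code in table:
        return table[code]
    return code + offset


latin2 = build_table("iso8859_2")
print(hex(unibyte_to_multibyte(0o241, latin2)))  # table lookup for byte 0xA1
print(hex(unibyte_to_multibyte(0o241)))          # no table, offset 0
```

With the Latin-2 table, byte 0xA1 maps to A-ogonek (U+0104); with no
table and a zero offset the code is left as 0xA1.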

>>  And, when we paste mule-unicode-0100-24ff characters into
>>  unibyte buffer, or paste unibyte string into a multibyte
>>  buffer, they are not correctly converted.

> What would be correct?

As long as the characters are in the range that the current
language environment supports in unibyte mode, they should be
correctly converted.  For instance, in Latin-2 lang. env.,
all characters in the latin-2 charset should be handled
correctly, i.e. A-ogonek in mule-unicode-0100-24ff <=> 0xA1.

And, that kind of conversion can be done only by setting
nonascii-translation-table correctly in each language
environment.
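
The expected round trip can be checked outside Emacs with a small
Python sketch.  It is an analogy, not Emacs code: the "iso8859_2"
codec stands in for the Latin-2 language environment, and both
function names are hypothetical.

```python
# Analogy only: a Python codec plays the part of the language
# environment's single-byte charset.

CODEC = "iso8859_2"  # assumption: stands in for the Latin-2 lang. env.


def multibyte_to_unibyte(char, codec=CODEC):
    """Return the single byte for `char`, or None if the charset lacks it."""
    try:
        encoded = char.encode(codec)
    except UnicodeEncodeError:
        return None
    return encoded[0]


def unibyte_to_multibyte(byte, codec=CODEC):
    """Return the character that `codec` assigns to `byte`."""
    return bytes([byte]).decode(codec)


a_ogonek = "\u0104"  # LATIN CAPITAL LETTER A WITH OGONEK

# Both directions agree for a character inside the charset.
assert multibyte_to_unibyte(a_ogonek) == 0xA1
assert unibyte_to_multibyte(0xA1) == a_ogonek

# A character outside the charset has no single-byte form at all.
assert multibyte_to_unibyte("\u3042") is None  # HIRAGANA LETTER A
```

The last assertion shows the limit of the approach: a character with
no position in the single-byte charset cannot be converted.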

> Is general Unicode text any different to, say, JISX-based
> Japanese in that respect?

Of course, in Japanese lang. env. or in UTF-8 lang. env.,
such a conversion between unibyte and multibyte is
meaningless and doesn't work.  And anyway, in such a
lang. env., people don't expect it to work well.

What we need is to make it work well only in
single-byte-charset-based lang. env.

---
Ken'ichi HANDA
address@hidden
