Re: [mew-int 01585] Re: windows 1252
From: Kenichi Handa
Subject: Re: [mew-int 01585] Re: windows 1252
Date: Fri, 7 Nov 2003 16:13:45 +0900 (JST)
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.3 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)
I'm sorry for the late response on this thread.
First, I want to clarify these things:
(1) windows-1252
This is actually not a charset but a coding system in
Emacs. When Emacs reads a file with this coding system, it
decodes each byte into a character of one of these character sets:
ascii, latin-iso8859-1, mule-unicode-0100-24ff
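
To illustrate the byte-to-charset mapping described in (1), here is a minimal Python sketch. It uses Python's own `cp1252` codec, not Emacs, and the code point ranges assigned to each Emacs charset name are an assumption based on the names above (in particular, that mule-unicode-0100-24ff covers U+0100 through U+24FF). The function name `classify` is hypothetical.

```python
def classify(byte_value: int) -> str:
    """Name the Emacs charset a windows-1252 byte would plausibly land in.

    Assumption: the ranges below approximate the three charsets named in
    the message; they are not taken from the Emacs sources.
    """
    # Python leaves a few bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined
    # in cp1252; decoding them raises UnicodeDecodeError.
    code_point = ord(bytes([byte_value]).decode("cp1252"))
    if code_point < 0x80:
        return "ascii"
    if 0xA0 <= code_point <= 0xFF:
        return "latin-iso8859-1"
    if 0x100 <= code_point <= 0x24FF:
        return "mule-unicode-0100-24ff"
    return "other"


for b in (0x41, 0xE9, 0x80):
    print(hex(b), classify(b))
```

The byte 0x80 is the interesting case: in windows-1252 it is the euro sign (U+20AC), which lies outside Latin-1 and therefore falls into the third charset.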
(2) ctext (alias of compound-text)
On conversion, it is not fully compatible with the
specification of X Compound Text, because it encodes any
Emacs character, using a designation sequence even for
private character sets (please note that all Emacs charsets
have an iso-final-char). So, Big5 characters are preceded by
ESC $ ( 0 or ESC $ ( 1, and mule-unicode-0100-24ff characters
are preceded by ESC - 1.
(3) ctext-with-extensions (alias of compound-text-with-extensions)
It can handle several kinds of "extended segment". On
decoding, it handles ESC % / N M L ... ^b for the encodings
listed in ctext-non-standard-encoding-alist, and ESC % G ...
ESC % @ for UTF-8. On encoding, it works in two passes: it
first encodes by `compound-text', then re-encodes the parts
that were encoded with a designation sequence listed in
ctext-non-standard-designations-alist, using the "extended
segment". Currently only ESC $ ( 0 and ESC $ ( 1 are
listed. Thus only Big5 is encoded using the "extended
segment".
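
The "extended segment" layout mentioned in (3) can be sketched in Python as follows. This is not the Emacs decoder. It assumes the framing as I understand it from the X Compound Text specification: after ESC % / comes one byte giving the octets per character, then two length bytes M and L with length = (M - 128) * 128 + (L - 128), counting the encoding name, the STX (^b) separator, and the data. The function name and the sample encoding name `big5-0` are illustrative.

```python
def parse_extended_segment(data: bytes):
    """Split one extended segment off the front of `data`.

    Returns (encoding_name, payload, remaining_bytes).
    Assumed layout: ESC % / F M L <name> STX <payload>.
    """
    if len(data) < 6 or data[:3] != b"\x1b%/":
        raise ValueError("not an extended segment")
    m, l = data[4], data[5]
    if m < 0x80 or l < 0x80:
        raise ValueError("length bytes must have the high bit set")
    # The length covers everything after L: name, STX, and payload.
    length = (m - 0x80) * 128 + (l - 0x80)
    body = data[6:6 + length]
    if len(body) != length:
        raise ValueError("truncated segment")
    name, stx, payload = body.partition(b"\x02")
    if not stx:
        raise ValueError("missing STX after encoding name")
    return name.decode("ascii"), payload, data[6 + length:]


# Build a sample segment by hand: name, STX, and two payload bytes.
body = b"big5-0" + b"\x02" + b"\xa4\x40"
segment = b"\x1b%/2" + bytes([0x80, 0x80 + len(body)]) + body
print(parse_extended_segment(segment + b"rest"))
```

A decoder would look the returned name up in a table such as ctext-non-standard-encoding-alist and decode the payload accordingly; that lookup is omitted here.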
As to the Mew case, I think the following is good.
When it runs under the current Emacs, keep using ctext but
add a coding tag to the file. Emacs should be able to
encode/decode all Emacs characters.
When it runs under the emacs-unicode version, on writing the
file, if all the characters can be encoded by ctext, keep
using it. If not (because, in emacs-unicode, some characters
don't belong to any charset that has an iso-final-char), use
utf-8. In both cases, add a coding tag. On reading,
check the coding tag first. If there is no coding tag, read by
ctext; otherwise, read by the coding system specified in the
tag.
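
The write/read policy proposed for emacs-unicode can be sketched in Python as below. Python has no ctext codec, so `latin-1` stands in for "the limited coding system" purely as an assumption; the function names, the `;;` comment prefix, and the first-line placement of the tag are likewise illustrative, not what Mew does.

```python
import re

# Matches an Emacs-style tag such as: -*- coding: utf-8 -*-
CODING_TAG = re.compile(r"-\*-\s*coding:\s*([A-Za-z0-9_.-]+)\s*-\*-")


def choose_coding(text: str, limited_codec: str = "latin-1") -> str:
    """Keep the limited codec if it can encode everything, else use utf-8."""
    try:
        text.encode(limited_codec)
        return limited_codec
    except UnicodeEncodeError:
        return "utf-8"


def write_with_tag(text: str, limited_codec: str = "latin-1") -> bytes:
    """Encode `text` and prepend a coding tag line in both cases."""
    codec = choose_coding(text, limited_codec)
    header = f";; -*- coding: {codec} -*-\n".encode("ascii")
    return header + text.encode(codec)


def read_with_tag(data: bytes, default_codec: str = "latin-1") -> str:
    """Check the tag first; without one, fall back to the default codec."""
    first_line, _, rest = data.partition(b"\n")
    match = CODING_TAG.search(first_line.decode("ascii", errors="replace"))
    if match is None:
        return data.decode(default_codec)  # untagged legacy file
    return rest.decode(match.group(1))


print(read_with_tag(write_with_tag("日本語")))
```

The untagged fallback is what keeps files written before the tag was introduced readable.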
By the way,
> The one-and-only coding-system which, I found, meets the requirements
> above is 'ctext.
I think iso-latin-1-with-esc also meets your requirements.
---
Ken'ichi HANDA
address@hidden