bug#4051: Character Soup
From: Juri Linkov
Subject: bug#4051: Character Soup
Date: Thu, 06 Aug 2009 00:09:20 +0300
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1.50 (x86_64-pc-linux-gnu)
In the Cyrillic KOI8 language environment, the coding system chosen for
a buffer containing the Latin-1 character á is Chinese gb2312.
How funny!
I noticed this while reporting bug#4037, which message.el sent with
charset=gb2312. Mail readers display that message incorrectly because
of the ugly fonts associated with gb2312 (this is a separate problem).
I think it would be more natural to encode this as Latin-1 (in this
particular case) or, more generally, as UTF-8, the universal encoding
designed specifically for mixing different scripts.
The easiest way to reproduce this problem:
1. emacs -Q
2. C-x RET l Cyrillic-KOI8
3. C-x 8 ' a
4. C-x C-s
5. File to save in: /tmp/file
After that the prompt says:
Select coding system (default chinese-iso-8bit):
and the buffer `*Warning*' contains:
These default coding systems were tried to encode text
in the buffer `file':
(cyrillic-koi8-unix (192 . 225))
However, each of them encountered characters it couldn't encode:
cyrillic-koi8-unix cannot encode these: á
Click on a character (or switch to this window by `C-x o'
and select the characters by RET) to jump to the place it appears,
where `C-u C-x =' will give information about it.
Select one of the safe coding systems listed below,
or cancel the writing with C-g and edit the buffer
to remove or modify the problematic characters,
or specify any other coding system (and risk losing
the problematic characters).
gb2312 utf-8 euc-jis-2004 euc-jp windows-1258 viscii
iso-2022-jp-2004 cp862 iso-8859-16 hp-roman8 next mac-roman cp437
cp865 cp861 cp860 cp858 cp857 cp852 cp850 windows-1254 windows-1252
windows-1250 iso-8859-15 iso-8859-14 iso-8859-10 iso-8859-9
iso-8859-4 iso-8859-3 iso-8859-2 gb18030 gbk hz-gb-2312 utf-7
iso-8859-1 utf-16 utf-16be-with-signature utf-16le-with-signature
utf-16be utf-16le iso-2022-7bit utf-8-auto utf-8-with-signature
eucjp-ms vietnamese-tcvn vietnamese-viqr vietnamese-vscii
japanese-shift-jis-2004 japanese-iso-7bit-1978-irv ibm1047
utf-7-imap utf-8-emacs
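
The warning can be cross-checked from Lisp. This is a minimal sketch,
assuming an Emacs 23 or later session: `encode-coding-char' returns nil
when a coding system cannot safely encode the character, so the koi8-r
entry should come back nil, in line with the warning above. The four
coding systems listed are only an illustrative selection.

```elisp
;; For each coding system, pair its name with the encoded form of á,
;; or with nil if that coding system cannot safely encode the character.
(mapcar (lambda (cs)
          (cons cs (encode-coding-char ?á cs)))
        '(koi8-r iso-8859-1 utf-8 gb2312))
```

Evaluate it with M-: or in the *scratch* buffer.
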
I have already figured out how to fix this problem for message.el, using:
(setq mm-coding-system-priorities (cons 'utf-8 mm-coding-system-priorities))
But as the test case above shows, this is a general problem.
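
One possible general workaround, which is not part of the original
report and is untested against the recipe above, is to put UTF-8 at the
front of the coding-system priority list. `prefer-coding-system' is a
standard Emacs function; that it avoids the prompt in this exact case is
an assumption.

```elisp
;; Assumption: evaluate this after selecting the language environment,
;; because choosing a language environment may reset the priority order.
(prefer-coding-system 'utf-8)
```
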
--
Juri Linkov
http://www.jurta.org/emacs/