Re: Reporting UTF-8 related problems?
From: Karl Eichwalder
Subject: Re: Reporting UTF-8 related problems?
Date: Tue, 30 Jul 2002 20:58:32 +0200
User-agent: Gnus/5.090006 (Oort Gnus v0.06) Emacs/21.3.50 (i686-pc-linux-gnu)
Kenichi Handa <address@hidden> writes:
> &#132;Die Familie Schroffenstein&#147;
>
> I thought that the notation &#NUMBER is for transmitting
> Unicode character of code NUMBER. But, 132 and 147 are
> control codes in Unicode, not any kind of quotings.
&#NUMBERs are so-called "character references"; the SGML declaration
defines which ones are allowed. For HTML you must consult the
html.d[e]?cl file. The crucial section is (HTML 2):
BASESET "ISO Registration Number 100//CHARSET
ECMA-94 Right Part of
Latin Alphabet Nr. 1//ESC 2/13 4/1"
DESCSET 128 32 UNUSED
160 96 32
This basically means: &#128; to &#159; are unused. The same applies to
HTML 4 (and later to XML and XHTML):
BASESET "ISO Registration Number 177//CHARSET
ISO/IEC 10646-1:1993 UCS-4 with
implementation level 3//ESC 2/5 2/15 4/6"
DESCSET 0 9 UNUSED
9 2 9
11 2 UNUSED
13 1 13
14 18 UNUSED
32 95 32
127 1 UNUSED
128 32 UNUSED
[...]
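A rough sketch of how the DESCSET rows can be read, assuming each row is
"first code, number of codes, target" and that UNUSED means a character
reference to that number is not allowed. The table below copies only the
rows quoted above (the excerpt is truncated at [...]), and the names
DESCSET and lookup are made up for illustration; this is not how an SGML
parser is actually implemented.

```python
# Rows copied from the HTML 4 DESCSET excerpt: (first code, count, target).
# A target of None stands for UNUSED. Codes from 160 on are not covered
# because the quoted excerpt stops at "[...]".
DESCSET = [
    (0, 9, None),
    (9, 2, 9),
    (11, 2, None),
    (13, 1, 13),
    (14, 18, None),
    (32, 95, 32),
    (127, 1, None),
    (128, 32, None),
]


def lookup(code):
    """Return the mapped code, "UNUSED", or None if the excerpt has no row."""
    for first, count, target in DESCSET:
        if first <= code < first + count:
            if target is None:
                return "UNUSED"
            # Codes inside a mapped range keep their offset from the base.
            return target + (code - first)
    return None


for n in (65, 132, 147):
    print(n, lookup(n))
```

With these rows, 65 maps to itself, while 132 and 147 both fall into the
"128 32 UNUSED" range.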
To make the SGML parser happy you can provide a changed declaration:
BASESET "ISO Registration Number 177//CHARSET
ISO/IEC 10646-1:1993 UCS-4 with
implementation level 3//ESC 2/5 2/15 4/6"
DESCSET 0 9 UNUSED
9 2 9
11 2 UNUSED
13 1 13
14 18 UNUSED
32 95 32
127 1 UNUSED
128 4 UNUSED
132 1 "My rising double quote left (low)"
133 14 UNUSED
147 1 "My rising double quote right (high)"
148 16 UNUSED
[...]
Untested, and the result is invalid HTML. If they announced a proper
HTTP header, it could be okay:
Content-Type: text/html; charset=windows-1252
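To illustrate what reading the numbers as windows-1252 would give, here
is a small sketch using Python's "cp1252" codec. The helper name
charref_as_cp1252 is invented for this example, and treating the range
128..159 as windows-1252 bytes is an assumption about what lenient
browsers do, not something the SGML declaration permits.

```python
def charref_as_cp1252(number):
    """Interpret a numeric character reference the lenient way.

    Numbers 128..159 are treated as windows-1252 bytes (an assumption);
    everything else is taken as a Unicode code point.
    """
    if 128 <= number <= 159:
        try:
            return bytes([number]).decode("cp1252")
        except UnicodeDecodeError:
            # A few bytes (e.g. 129) are undefined in windows-1252;
            # fall back to the plain code point.
            return chr(number)
    return chr(number)


for n in (132, 147):
    ch = charref_as_cp1252(n)
    print(n, "U+%04X" % ord(ch))
```

Under that assumption 132 comes out as U+201E and 147 as U+201C, i.e.
the low and high double quotes.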
Andreas Schwab <address@hidden> writes:
> The numbers are supposed to be ISO 8859-1 character codes. I'd guess the
> page has been written with some broken (a.k.a. W*nd*ws) software (the use
> of *.htm makes this apparent).
Yes, they have "interesting" guidelines online...
Kenichi Handa <address@hidden> writes:
> Ah, I see. I found that windows-125X maps 132 and 147 to
> U+201E and U+201C. So, perhaps those systems (galeon and
> lynx) parse them as U+201E and U+201C. Anyway, how to
> encode them in X selection is their problem and Emacs can't
> do anything about it.
Yes, but once in the X selection I'd like to see Emacs honor them.
The spacing problem also occurs when I try to cut and paste from Markus
Kuhn's demo file
(http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-demo.txt):
• ‚deutsche
‘
„Anf
ührungszeichen
“
When I insert (C-x RET c utf-8 RET C-x C-f UTF-8-demo.txt RET), things
are correctly displayed (the characters are different):
• ‚deutsche‘ „Anf
ührungszeichen
“
Cutting and pasting both of these examples from Emacs (this mail buffer)
to a UTF-8 xterm doesn't work either; instead of the quotes I see "-1"
and garbage.
I hope the examples will go through.
--
address@hidden (work) / address@hidden (home): |
http://www.suse.de/~ke/ | ,__o
Free Translation Project: | _-\_<,
http://www.iro.umontreal.ca/contrib/po/HTML/ | (*)/'(*)