emacs-devel

Re: X11 Compound Text vs ISO 2022


From: James Cloos
Subject: Re: X11 Compound Text vs ISO 2022
Date: Tue, 06 Jul 2010 18:30:36 -0400
User-agent: Gnus/5.110011 (No Gnus v0.11) Emacs/24.0.50 (gnu/linux)

>>>>> "DDLHG" == David De La Harpe Golden <address@hidden> writes:

DDLHG> But anyway, if emacs isn't using one of the character sets listed in
DDLHG> the table in sect. 4/5 of "the" spec [1] or utf-8 as per sect.7,
DDLHG> presumably it's an emacs bug unless emacs has successfully "registered
DDLHG> the encoding with the X consortium" as per sect. 6 (and I don't see
DDLHG> that happening...).

Exactly.  Xorg libX11 supports what is in that spec (including the utf8
which was first added by XFree86, but was not added to the upstream spec),
a couple of other charsets "for compatability with Xfree86 3.1" and two sets
which are "used by Emacs, but not backed by ISO-IR".

Xorg's luit app has its own 2022 encoder/decoder which supports a couple
of additional charsets, such as "DEC Special", "DEC Technical", four
KOI8 variations, cp125[012], cp437, cp850 and cp866.  But it does not
use those for COMPOUND_TEXT, only as its internal encoding, much like
Emacs used to do.

DDLHG> Conversely, if emacs is sending a charset that IS listed in the table
DDLHG> in sect. 4/5 or utf-8 as per sect. 7, then libX11 and other apps are
DDLHG> "at fault" if they don't recognise them.

Emacs sends as COMPOUND_TEXT a 2022 encoding which appears to be exactly
what it used to use internally, rather than keeping to the ctext spec.

DDLHG> But err... the spec on freedesktop.org seems a lot older, not even
DDLHG> mentioning utf-8 ???

I think utf8 is the only significant difference between the upstream
Xorg spec and the XFree86 modification.  I vaguely recall the
discussions on the xfree86 list(s) when it was introduced (too many
years ago, [SIGH]).  The EWMH spec and the UTF8_STRING format came
about, in part, out of that discussion, IIRC.

Emacs does need to limit what it is willing to encode in COMPOUND_TEXT,
and to use utf8-in-ctext for everything which is not in the 8859, GB,
JISX, KSC, CNS or BIG5 variants libX11 supports.  I'd go a bit further
and prefer utf8 over the CJK encodings for characters which are not
part of a CJK string.  (As an example, Emacs uses japanese-jisx0213-1
for U+2022 BULLET; it would be better to use utf-8 unless the
BULLET is in a string which was entered via the Japanese input
method, or LANG is ja_JP, or something of that sort.)

The question, then, is how best to do that?
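
As a rough illustration only, the simplest form of that policy might
look like the Python sketch below.  It assumes the default ctext state
(ISO 8859-1 in GL and GR), so Latin-1 text goes out as raw bytes, and
everything else is wrapped in the XFree86 utf8 extension (ESC % G to
enter, ESC % @ to return).  The names encode_ctext and is_latin1_text
are made up for the example; this is not what Emacs or libX11 does, and
it ignores the other 8859 parts, the CJK sets and any per-string
language context.

```python
# Sketch only: Latin-1 bytes directly, everything else as utf8-in-ctext.
ESC = b"\x1b"
UTF8_START = ESC + b"%G"  # ESC 02/05 04/07: switch to UTF-8 (XFree86 extension)
UTF8_END = ESC + b"%@"    # ESC 02/05 04/00: return to ISO 2022


def is_latin1_text(ch):
    """True if ch can be sent in the default ctext state (hypothetical helper)."""
    cp = ord(ch)
    # TAB and NEWLINE, plus the printable halves of ISO 8859-1.
    return ch in "\t\n" or 0x20 <= cp <= 0x7E or 0xA0 <= cp <= 0xFF


def encode_ctext(text):
    """Encode text, falling back to a UTF-8 segment outside Latin-1."""
    out = bytearray()
    in_utf8 = False
    for ch in text:
        if is_latin1_text(ch):
            if in_utf8:
                out += UTF8_END
                in_utf8 = False
            out += ch.encode("latin-1")
        else:
            if not in_utf8:
                out += UTF8_START
                in_utf8 = True
            out += ch.encode("utf-8")
    if in_utf8:
        out += UTF8_END  # always leave the stream back in ISO 2022 mode
    return bytes(out)
```

With this, encode_ctext("a\u2022b") gives b"a\x1b%G\xe2\x80\xa2\x1b%@b",
while plain Latin-1 text such as "caf\xe9" passes through with no
escape sequences at all.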

-JimC
-- 
James Cloos <address@hidden>         OpenPGP: 1024D/ED7DAEA6


