emacs-devel

Re: X11 Compound Text vs ISO 2022


From: Stephen J. Turnbull
Subject: Re: X11 Compound Text vs ISO 2022
Date: Wed, 07 Jul 2010 09:36:41 +0900

James Cloos writes:

 > I think utf8 is the only significant difference between the upstream
 > Xorg spec and the Xfree86 modification.  I vaguely recall the
 > discussions on the xfree86 list(s) when it was introduced (too many
 > years ago, [SIGH]).  The EWMH spec and the UTF8_STRING format came
 > about, in part, out of that discussion, IIRC.

As of about 2004, the XFree86 spec was totally bogus (internally
contradictory on the subject of encoding some ISO 8859 coded character
sets), and the XFree86 implementation ignored it anyway in many cases.

 > Emacs does need to limit what it is willing to encode in COMPOUND_TEXT,
 > and to use utf8-in-ctext for everything which is not in the 8859, GB,
 > JISX, KSC, CNS or BIG5 variants libX11 supports.  I'd go a bit further
 > and prefer utf8 over the CJK encodings for characters which are not
 > part of a CJK string.

But that goes against the spec, which AFAIK still provides that in
COMPOUND_TEXT the escape to non-ISO-2022 should only be used for
characters not in the repertoires of the registered charsets:

    Extended segments are not to be used for any character set
    encoding that can be constructed from a GL/GR pair of approved
    standard encodings. For example, it is incorrect to use an
    extended segment for any of the ISO 8859 family of encodings.

I would argue that you have two choices here: consider the whole
string to be Unicode, and use an extended segment for the whole
thing; or consider the string to be pieced together from segments in
approved standard encodings, in which case a character that can be
represented in those encodings should be.
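
To make the second choice concrete, here is a rough sketch of how a string
might be split into runs by the first approved charset that can represent
each character.  Everything in it is an assumption for illustration: Python
codecs stand in for the registered charsets, the APPROVED tuple is a
hypothetical short list and not the spec's list, and no escape sequences
are emitted.  A run tagged None is one that no approved charset covers.

```python
from itertools import groupby

# Assumption: Python codec names stand in for registered charsets.
# This short list is hypothetical, not the list from the spec.
APPROVED = ("ascii", "iso8859-1", "iso8859-2")


def charset_for(ch):
    """Return the first approved codec that can encode ch, else None."""
    for codec in APPROVED:
        try:
            ch.encode(codec)
            return codec
        except UnicodeEncodeError:
            continue
    return None


def plan_segments(text):
    """Split text into (charset, run) pairs of consecutive characters.

    A character representable in an approved charset is always assigned
    to it; only the leftovers (charset None) would need another mechanism.
    """
    return [(cs, "".join(run)) for cs, run in groupby(text, key=charset_for)]


if __name__ == "__main__":
    for charset, run in plan_segments("caf\u00e9 \u2603"):
        print(charset, repr(run))
```

Under the first choice, by contrast, no such splitting happens at all: the
whole string is handed over as Unicode.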

BTW, for the case of MIDDLE DOT using JIS X 0213, the most recent spec
I could find on the web doesn't admit JIS X 0213 (or JIS X 0212 for
that matter).

 > The question, then, is how best to do that?

Wouldn't it be better to avoid use of COMPOUND_TEXT targets?  How many
apps prefer it to UTF8_STRING?  So, for example, when asked for
supported targets Emacs could list UTF8_STRING first.
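
As a minimal sketch of that ordering (not Emacs code; the function name and
the PREFERRED tuple are made up for illustration), the reply to a request
for supported targets could be sorted so that UTF8_STRING comes ahead of
COMPOUND_TEXT.  Whether a given requestor actually honors the order of the
list is a separate question that this does not answer.

```python
# Hypothetical preference order; only the target names are real atoms.
PREFERRED = ("UTF8_STRING", "COMPOUND_TEXT", "STRING")


def order_targets(supported):
    """Return supported targets with preferred text targets first.

    Targets not named in PREFERRED keep their relative order and follow
    the preferred ones (sorted() is stable).
    """
    rank = {name: i for i, name in enumerate(PREFERRED)}
    return sorted(supported, key=lambda t: rank.get(t, len(PREFERRED)))


if __name__ == "__main__":
    print(order_targets(["STRING", "COMPOUND_TEXT", "TIMESTAMP", "UTF8_STRING"]))
```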


